<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>PostgreSQL案例 on Last DBA</title><link>https://lastdba.com/en/categories/postgresql%E6%A1%88%E4%BE%8B/</link><description>Recent content in PostgreSQL案例 on Last DBA</description><generator>Hugo -- gohugo.io</generator><language>en-US</language><copyright>© 2026 liuzhilong62</copyright><lastBuildDate>Mon, 09 Mar 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://lastdba.com/en/categories/postgresql%E6%A1%88%E4%BE%8B/index.xml" rel="self" type="application/rss+xml"/><item><title>Case Study: Startup Failure and SysV Shared Memory</title><link>https://lastdba.com/en/2026/03/09/case-study-startup-failure-and-sysv-shared-memory/</link><pubDate>Mon, 09 Mar 2026 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2026/03/09/case-study-startup-failure-and-sysv-shared-memory/</guid><description>&lt;h2 class="relative group"&gt;Problem Symptoms
 &lt;div id="problem-symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The database instance&amp;rsquo;s RSS memory was maxed out, OOM messages appeared in the logs, and the instance died. We won&amp;rsquo;t analyze the OOM cause here.&lt;/p&gt;
&lt;p&gt;The restart then kept failing; the logs show four or five attempts:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:15:21 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;578272&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: FATAL: pre-existing shared memory block &lt;span style="color:#f92672"&gt;(&lt;/span&gt;key 2048, ID 1328250881&lt;span style="color:#f92672"&gt;)&lt;/span&gt; is still in use
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:15:21 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;578272&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Terminate any old server processes associated with data directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:15:21 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;578272&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: LOG: database system is shut down
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:21:03 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;658824&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: FATAL: pre-existing shared memory block &lt;span style="color:#f92672"&gt;(&lt;/span&gt;key 2048, ID 1328250881&lt;span style="color:#f92672"&gt;)&lt;/span&gt; is still in use
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:21:03 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;658824&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Terminate any old server processes associated with data directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:21:03 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;658824&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: LOG: database system is shut down
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:31:12 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;794791&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: LOG: redirecting log output to logging collector process
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:31:12 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;794791&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Future log output will appear in directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data/pg_log&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:31:37 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;801049&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: FATAL: lock file &lt;span style="color:#e6db74"&gt;&amp;#34;postmaster.pid&amp;#34;&lt;/span&gt; already exists
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:31:37 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;801049&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Is another postmaster &lt;span style="color:#f92672"&gt;(&lt;/span&gt;PID 794791&lt;span style="color:#f92672"&gt;)&lt;/span&gt; running in data directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data&amp;#34;&lt;/span&gt;?
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:32:34 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;814396&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: FATAL: lock file &lt;span style="color:#e6db74"&gt;&amp;#34;postmaster.pid&amp;#34;&lt;/span&gt; already exists
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:32:34 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;814396&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Is another postmaster &lt;span style="color:#f92672"&gt;(&lt;/span&gt;PID 794791&lt;span style="color:#f92672"&gt;)&lt;/span&gt; running in data directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data&amp;#34;&lt;/span&gt;?&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The instance finally started once the DBA removed the leftover segment with &lt;code&gt;ipcrm -m xxx&lt;/code&gt; and tried again.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Problem Symptoms
 &lt;div id="problem-symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The database instance&amp;rsquo;s RSS memory was maxed out, OOM messages appeared in the logs, and the instance died. We won&amp;rsquo;t analyze the OOM cause here.&lt;/p&gt;
&lt;p&gt;The restart then kept failing; the logs show four or five attempts:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:15:21 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;578272&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: FATAL: pre-existing shared memory block &lt;span style="color:#f92672"&gt;(&lt;/span&gt;key 2048, ID 1328250881&lt;span style="color:#f92672"&gt;)&lt;/span&gt; is still in use
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:15:21 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;578272&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Terminate any old server processes associated with data directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:15:21 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;578272&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: LOG: database system is shut down
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:21:03 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;658824&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: FATAL: pre-existing shared memory block &lt;span style="color:#f92672"&gt;(&lt;/span&gt;key 2048, ID 1328250881&lt;span style="color:#f92672"&gt;)&lt;/span&gt; is still in use
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:21:03 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;658824&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Terminate any old server processes associated with data directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:21:03 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;658824&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: LOG: database system is shut down
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:31:12 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;794791&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: LOG: redirecting log output to logging collector process
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:31:12 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;794791&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Future log output will appear in directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data/pg_log&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:31:37 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;801049&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: FATAL: lock file &lt;span style="color:#e6db74"&gt;&amp;#34;postmaster.pid&amp;#34;&lt;/span&gt; already exists
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:31:37 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;801049&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Is another postmaster &lt;span style="color:#f92672"&gt;(&lt;/span&gt;PID 794791&lt;span style="color:#f92672"&gt;)&lt;/span&gt; running in data directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data&amp;#34;&lt;/span&gt;?
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:32:34 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;814396&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: FATAL: lock file &lt;span style="color:#e6db74"&gt;&amp;#34;postmaster.pid&amp;#34;&lt;/span&gt; already exists
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:32:34 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;814396&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Is another postmaster &lt;span style="color:#f92672"&gt;(&lt;/span&gt;PID 794791&lt;span style="color:#f92672"&gt;)&lt;/span&gt; running in data directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data&amp;#34;&lt;/span&gt;?&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The instance finally started once the DBA removed the leftover segment with &lt;code&gt;ipcrm -m xxx&lt;/code&gt; and tried again.&lt;/p&gt;
&lt;p&gt;Although the issue was quickly resolved, many questions remained:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Why isn&amp;rsquo;t this scenario more common in practice?&lt;/li&gt;
&lt;li&gt;The start.log shows two different error types — what operations and logic do they correspond to?&lt;/li&gt;
&lt;li&gt;Can shared memory still exist even if the postmaster is gone?&lt;/li&gt;
&lt;li&gt;How do you locate and clean up this shared memory segment?&lt;/li&gt;
&lt;li&gt;PG has multiple shared memory segments — which one is this?&lt;/li&gt;
&lt;li&gt;Besides &lt;code&gt;ipcrm -m&lt;/code&gt;, are there other ways to get the instance started?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Error Analysis: &lt;code&gt;pre-existing shared memory block&lt;/code&gt;
 &lt;div id="error-analysis-pre-existing-shared-memory-block" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#error-analysis-pre-existing-shared-memory-block" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Three Types of Shared Memory
 &lt;div id="three-types-of-shared-memory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#three-types-of-shared-memory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Normally, after PG starts, there are three shared memory segments.&lt;/p&gt;
&lt;p&gt;Using the default &lt;code&gt;shared_memory_type='mmap'&lt;/code&gt; without huge pages as an example:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## View PG&amp;#39;s actual shared memory usage from its virtual memory map&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/&lt;span style="color:#e6db74"&gt;`&lt;/span&gt;head -1 $PGDATA/postmaster.pid&lt;span style="color:#e6db74"&gt;`&lt;/span&gt;/smaps | grep -E &lt;span style="color:#e6db74"&gt;&amp;#34;\-s&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2b61b0563000-2b61b0564000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;116293664&lt;/span&gt; /SYSV00001000 &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2b61b057f000-2b61b05b3000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:12 &lt;span style="color:#ae81ff"&gt;1501001168&lt;/span&gt; /dev/shm/PostgreSQL.1193490778
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2b61bbac2000-2b61fa67a000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;1500999610&lt;/span&gt; /dev/zero &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;From top to bottom, these are: &lt;strong&gt;the SysV shared memory used at startup&lt;/strong&gt;, &lt;strong&gt;dynamic shared memory (used by parallel queries, among others)&lt;/strong&gt;, and &lt;strong&gt;the main shared memory region that holds shared_buffers&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;If shared_buffers uses huge pages, or if the shared_memory_type is SysV instead of mmap, the output differs slightly.&lt;/p&gt;
&lt;p&gt;Huge pages:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2aaaaac00000-2aba9ca00000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:0e &lt;span style="color:#ae81ff"&gt;48453452&lt;/span&gt; /anon_hugepage &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2b08f2eea000-2b08f2eeb000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;50692152&lt;/span&gt; /SYSV00001000 &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2b08f2f05000-2b08f302d000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:12 &lt;span style="color:#ae81ff"&gt;48436142&lt;/span&gt; /dev/shm/PostgreSQL.1345689218&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;shared_memory_type = &amp;lsquo;sysv&amp;rsquo;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2b03b3ceb000-2b03b3d1f000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:12 &lt;span style="color:#ae81ff"&gt;1572332304&lt;/span&gt; /dev/shm/PostgreSQL.2883611352
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2b03bf0c2000-2b03fdc7a000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;143917075&lt;/span&gt; /SYSV00001000 &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Summary:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Shared memory configuration&lt;/th&gt;
 &lt;th&gt;Shared segments in smaps&lt;/th&gt;
 &lt;th&gt;Main region (shared_buffers) mapping&lt;/th&gt;
 &lt;th&gt;SysV mapping&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;shared_memory_type=mmap, no huge pages&lt;/td&gt;
 &lt;td&gt;3 segments&lt;/td&gt;
 &lt;td&gt;/dev/zero&lt;/td&gt;
 &lt;td&gt;/SYSV00001000&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;shared_memory_type=sysv, no huge pages&lt;/td&gt;
 &lt;td&gt;2 segments&lt;/td&gt;
 &lt;td&gt;/SYSV00001000&lt;/td&gt;
 &lt;td&gt;/SYSV00001000&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;shared_memory_type=mmap, with huge pages&lt;/td&gt;
 &lt;td&gt;3 segments&lt;/td&gt;
 &lt;td&gt;/anon_hugepage&lt;/td&gt;
 &lt;td&gt;/SYSV00001000&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;shared_memory_type=sysv, with huge pages&lt;/td&gt;
 &lt;td&gt;not supported&lt;/td&gt;
 &lt;td&gt;not supported&lt;/td&gt;
 &lt;td&gt;not supported&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
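&lt;p&gt;The table can be turned into a small lookup. The following is only a sketch: the helper name &lt;code&gt;classify_mapping&lt;/code&gt; is made up for illustration, and the labels simply restate the table above.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: label a shared mapping by the path shown in smaps.
# Note: under shared_memory_type=sysv the /SYSV mapping is also the main region.
classify_mapping() {
    case "$1" in
        /SYSV*)                echo "SysV segment" ;;
        /dev/shm/PostgreSQL.*) echo "dynamic shared memory" ;;
        /dev/zero*)            echo "main region (anonymous mmap)" ;;
        /anon_hugepage*)       echo "main region (huge pages)" ;;
        *)                     echo "other" ;;
    esac
}

classify_mapping /SYSV00001000
classify_mapping /dev/shm/PostgreSQL.1193490778
classify_mapping /dev/zero
```

&lt;p&gt;Feeding it the last column of each &lt;code&gt;rw-s&lt;/code&gt; line from smaps gives a quick view of which configuration an instance is running with.&lt;/p&gt;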
&lt;p&gt;Now the key question: when the error says &lt;code&gt;pre-existing shared memory block&lt;/code&gt;, which shared memory segment is it talking about?&lt;/p&gt;

&lt;h3 class="relative group"&gt;Source Code Analysis
 &lt;div id="source-code-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#source-code-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Searching for the error message in the source quickly leads to the key location: &lt;code&gt;src/backend/port/sysv_shmem.c&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;First, understand what the SysV shmem is for. Comments scattered through the source explain it:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;We still require a SysV shmem block to
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; * exist, though, because mmap&amp;#39;d shmem provides no way to find out how
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; * many processes are attached, which we need for interlocking purposes.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; * As of PostgreSQL 9.3, we normally allocate only a very small amount of
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; * System V shared memory, and only for the purposes of providing an
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; * interlock to protect the data directory. The real shared memory block
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; * is allocated using mmap(). This works around the problem that many
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; * systems have very low limits on the amount of System V shared memory
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; * that can be allocated. Even a limit of a few megabytes will be enough
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; * to run many copies of PostgreSQL without needing to adjust system settings.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;SysV shmem reports how many processes are attached to a segment (&lt;code&gt;shm_nattch&lt;/code&gt;); mmap&amp;rsquo;d memory offers no such count&lt;/li&gt;
&lt;li&gt;This &lt;strong&gt;SysV shmem is used to protect the data directory&lt;/strong&gt;; shared_buffers uses mmap (by default), not SysV&lt;/li&gt;
&lt;li&gt;This SysV shmem segment is tiny: its address range in the smaps output, 2b61b0563000-2b61b0564000, spans 0x1000 bytes, i.e. 4 KiB&lt;/li&gt;
&lt;/ul&gt;
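&lt;p&gt;The size in the last point is just the difference between the two hexadecimal addresses. A minimal sketch of that arithmetic, where &lt;code&gt;range_size&lt;/code&gt; is a hypothetical helper name and the inputs are the address ranges from the smaps output shown earlier:&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: size in bytes of a mapping, given its smaps
# address range in the form "start-end" (both hexadecimal).
range_size() {
    start=${1%-*}   # text before the dash
    end=${1#*-}     # text after the dash
    echo $(( 0x$end - 0x$start ))
}

range_size 2b61b0563000-2b61b0564000   # the SysV segment: prints 4096
range_size 2b61bbac2000-2b61fa67a000   # the /dev/zero mapping
```

&lt;p&gt;The same subtraction on the &lt;code&gt;/dev/zero&lt;/code&gt; range gives the size of the main shared memory region, which is several orders of magnitude larger.&lt;/p&gt;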
&lt;p&gt;Now look at the shm state enum:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;enum&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SHMSTATE_ANALYSIS_FAILURE,	&lt;span style="color:#75715e"&gt;/* unexpected failure to analyze the ID */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SHMSTATE_ATTACHED,			&lt;span style="color:#75715e"&gt;/* pertinent to DataDir, has attached PIDs */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SHMSTATE_ENOENT,			&lt;span style="color:#75715e"&gt;/* no segment of that ID */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SHMSTATE_FOREIGN,			&lt;span style="color:#75715e"&gt;/* exists, but not pertinent to DataDir */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SHMSTATE_UNATTACHED			&lt;span style="color:#75715e"&gt;/* pertinent to DataDir, no attached PIDs */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} IpcMemoryState;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The key states are ATTACHED, FOREIGN, and UNATTACHED.&lt;/p&gt;
&lt;p&gt;The SysV shmem protects the data directory: its usual job is to make sure two instances never run on the same directory. Because SysV keys are system-wide, a segment found under the expected key may belong to a different data directory or to an unrelated program; that is the FOREIGN state. If the segment belongs to this data directory but no process is attached to it, the state is UNATTACHED. If at least one process is still attached, it is ATTACHED.&lt;/p&gt;
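&lt;p&gt;As a rough mental model, the three key states reduce to two questions: does the segment belong to this data directory, and is anything still attached? The sketch below is a simplified decision table, not PostgreSQL&amp;rsquo;s actual code; the function name and its arguments are made up for illustration.&lt;/p&gt;

```shell
#!/bin/sh
# Simplified decision table for the three key states (illustration only).
# $1 = "yes" if the segment belongs to this data directory, otherwise "no"
# $2 = number of processes currently attached to the segment
shm_state() {
    if [ "$1" != "yes" ]; then
        echo FOREIGN
    elif [ "$2" -eq 0 ]; then
        echo UNATTACHED
    else
        echo ATTACHED
    fi
}

shm_state yes 1   # an old process is still attached
shm_state yes 0   # leftover segment with nothing attached
shm_state no 3    # segment that belongs to something else
```

&lt;p&gt;The remaining two enum values, ANALYSIS_FAILURE and ENOENT, cover the cases where the segment cannot be inspected or does not exist, and are left out of the sketch.&lt;/p&gt;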
&lt;p&gt;Now look at the error thrown by &lt;code&gt;PGSharedMemoryCreate&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PGShmemHeader &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;PGSharedMemoryCreate&lt;/span&gt;(Size size,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					 PGShmemHeader &lt;span style="color:#f92672"&gt;**&lt;/span&gt;shim)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (;;) &lt;span style="color:#75715e"&gt;// loop until a segment is created or a FATAL error is raised
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{..
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; shmid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;shmget&lt;/span&gt;(NextShmemSegID, &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(PGShmemHeader), &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);&lt;span style="color:#75715e"&gt;// shmget to fetch the SysV shmem and return its shmid
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (shmid &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			oldhdr &lt;span style="color:#f92672"&gt;=&lt;/span&gt; NULL;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			state &lt;span style="color:#f92672"&gt;=&lt;/span&gt; SHMSTATE_FOREIGN;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			state &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;PGSharedMemoryAttach&lt;/span&gt;(shmid, NULL, &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;oldhdr);&lt;span style="color:#75715e"&gt;// determine this shmem segment&amp;#39;s state
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;switch&lt;/span&gt; (state)&lt;span style="color:#75715e"&gt;// take different actions based on the shared memory state
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...&lt;span style="color:#75715e"&gt;// only showing 2 states here: attached and unattached
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; SHMSTATE_ATTACHED: &lt;span style="color:#75715e"&gt;// shm is attached — throw the error (this is the fault symptom we saw)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(FATAL,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						(&lt;span style="color:#a6e22e"&gt;errcode&lt;/span&gt;(ERRCODE_LOCK_FILE_EXISTS),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 &lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pre-existing shared memory block (key %lu, ID %lu) is still in use&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								(&lt;span style="color:#66d9ef"&gt;unsigned&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;long&lt;/span&gt;) NextShmemSegID,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								(&lt;span style="color:#66d9ef"&gt;unsigned&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;long&lt;/span&gt;) shmid),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 &lt;span style="color:#a6e22e"&gt;errhint&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Terminate any old server processes associated with data directory &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;.&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 DataDir)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; SHMSTATE_UNATTACHED:&lt;span style="color:#75715e"&gt;// shm is unattached
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * The segment pertains to DataDir, and every process that had
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * used it has died or detached. Zap it, if possible, and any
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * associated dynamic shared memory segments, as well. This
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * shouldn&amp;#39;t fail, but if it does, assume the segment belongs
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * to someone else after all, and try the next candidate.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * Otherwise, try again to create the segment. That may fail
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * if some other process creates the same shmem key before we
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * do, in which case we&amp;#39;ll try the next key.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;// The segment belongs to the data directory, and no process still holds it
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (oldhdr&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;dsm_control &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#a6e22e"&gt;dsm_cleanup_using_control_segment&lt;/span&gt;(oldhdr&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;dsm_control);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;shmctl&lt;/span&gt;(shmid, IPC_RMID, NULL) &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					NextShmemSegID&lt;span style="color:#f92672"&gt;++&lt;/span&gt;; &lt;span style="color:#75715e"&gt;// Note: NextShmemSegID is incremented only if the removal fails; the loop then retries
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
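&lt;p&gt;When the segment is ATTACHED (it belongs to this data directory and some process still holds it), PG raises the FATAL error seen in the logs. When it is UNATTACHED, PG removes the stale segment and goes around the loop again to create a new one; NextShmemSegID is incremented only if the removal itself fails.&lt;/p&gt;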
&lt;ul&gt;
&lt;li&gt;The first case corresponds to this fault&lt;/li&gt;
&lt;li&gt;The second case corresponds to normal crash recovery (instance can still start after a crash)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 class="relative group"&gt;SysV shmem
 &lt;div id="sysv-shmem" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sysv-shmem" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;From PG10 onwards, the postmaster.pid and SysV shmem logic was significantly reworked and has been largely stable since. This article only covers the PG10+ logic.&lt;/p&gt;
&lt;p&gt;pidfile.h:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define LOCK_FILE_LINE_SHMEM_KEY	7&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;sysv_shmem.c, InternalIpcMemoryCreate():&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt;		line[&lt;span style="color:#ae81ff"&gt;64&lt;/span&gt;];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;sprintf&lt;/span&gt;(line, &lt;span style="color:#e6db74"&gt;&amp;#34;%9lu %9lu&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				(&lt;span style="color:#66d9ef"&gt;unsigned&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;long&lt;/span&gt;) memKey, (&lt;span style="color:#66d9ef"&gt;unsigned&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;long&lt;/span&gt;) shmid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;AddToDataDirLockFile&lt;/span&gt;(LOCK_FILE_LINE_SHMEM_KEY, line);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;From the source code, shmem info is saved on line 7 of postmaster.pid, containing the shmkey and shmid.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; cat postmaster.pid
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;242712&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1772698474&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;8531&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;/tmp
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0.0.0.0
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;143917078&lt;/span&gt; &lt;span style="color:#75715e"&gt;# &amp;lt;----here&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ready&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
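&lt;p&gt;A minimal sketch for pulling those two values out of the file. The function name &lt;code&gt;shmem_from_pidfile&lt;/code&gt; is made up for illustration; it only assumes the PG10+ layout shown above, where line 7 holds the shmkey followed by the shmid.&lt;/p&gt;

```shell
# Hypothetical helper: print the shmkey and shmid stored on line 7
# of a postmaster.pid file.
# Usage: shmem_from_pidfile /path/to/postmaster.pid
shmem_from_pidfile() {
    # sed selects line 7; awk drops the padding and labels the two fields
    sed -n '7p' "$1" | awk '{ printf "shmkey=%s shmid=%s\n", $1, $2 }'
}
```

&lt;p&gt;For the file above, &lt;code&gt;shmem_from_pidfile postmaster.pid&lt;/code&gt; prints &lt;code&gt;shmkey=4096 shmid=143917078&lt;/code&gt;.&lt;/p&gt;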

&lt;h4 class="relative group"&gt;What Are shmkey and shmid?
 &lt;div id="what-are-shmkey-and-shmid" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-are-shmkey-and-shmid" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;In PG&amp;rsquo;s source, the call path is: InternalIpcMemoryCreate():&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			shmid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;shmget&lt;/span&gt;(memKey, size, IPC_CREAT &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IPC_EXCL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IPCProtection);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;PG uses shmkey/memkey as a seed key to request shared memory from the kernel, which returns a unique identifier, shmid.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The shmid is allocated by the kernel, so it depends on the server&amp;rsquo;s current IPC state rather than on anything PG controls. When an instance is restarted quickly, the new shmid may be the same as before or incremented by 1, depending on Linux kernel internals. After a full server reboot, it will almost certainly be different.&lt;/p&gt;
&lt;p&gt;To aid understanding: &lt;strong&gt;whether the server reboots or not, shmkey/memkey can remain constant (since it&amp;rsquo;s user/PG input). But across a server reboot, even with the same shmkey, the returned shmid is very unlikely to be the same value.&lt;/strong&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;How PG Obtains the shmkey
 &lt;div id="how-pg-obtains-the-shmkey" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#how-pg-obtains-the-shmkey" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;PGSharedMemoryCreate():&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * We use the data directory&amp;#39;s ID info (inode and device numbers) to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * positively identify shmem segments associated with this data dir, and
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * also as seeds for searching for a free shmem key.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;stat&lt;/span&gt;(DataDir, &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;statbuf) &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(FATAL,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				(&lt;span style="color:#a6e22e"&gt;errcode_for_file_access&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 &lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;could not stat data directory &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;: %m&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						DataDir)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Loop till we find a free IPC key. Trust CreateDataDirLockFile() to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * ensure no more than one postmaster per data directory can enter this
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * loop simultaneously. (CreateDataDirLockFile() does not entirely ensure
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * that, but prefer fixing it over coping here.)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	NextShmemSegID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; statbuf.st_ino;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (;;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		IpcMemoryId shmid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		PGShmemHeader &lt;span style="color:#f92672"&gt;*&lt;/span&gt;oldhdr;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		IpcMemoryState state;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Try to create new segment */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		memAddress &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;InternalIpcMemoryCreate&lt;/span&gt;(NextShmemSegID, sysvsize);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (memAddress)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;				&lt;span style="color:#75715e"&gt;/* successful create and attach */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Check shared memory and possibly remove and recreate */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * shmget() failure is typically EACCES, hence SHMSTATE_FOREIGN.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * ENOENT, a narrow possibility, implies SHMSTATE_ENOENT, but one can
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * safely treat SHMSTATE_ENOENT like SHMSTATE_FOREIGN.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		shmid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;shmget&lt;/span&gt;(NextShmemSegID, &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(PGShmemHeader), &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;PG calls &lt;code&gt;stat()&lt;/code&gt; on the data directory to get the directory&amp;rsquo;s inode number (&lt;code&gt;st_ino&lt;/code&gt;), and uses that value as the first shmkey to try.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;In PG, the shmem key is tightly coupled to the data directory&amp;rsquo;s inode. Under normal circumstances, shmem key = datadir inode.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Verification example:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; ls -id $PGDATA
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; /lzlcloud/pg8574/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; cat postmaster.pid |head -7|tail -1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;143917090&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;We can see datadir.inode = shmkey = 4096.&lt;/p&gt;

&lt;h4 class="relative group"&gt;PG shmkey in Cloud Environments
 &lt;div id="pg-shmkey-in-cloud-environments" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg-shmkey-in-cloud-environments" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Above I said generally shmkey = datadir.inode, but in cloud environments this is typically not the case.&lt;/p&gt;
&lt;p&gt;Our cloud environment:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; ls -id /lzlcloud/pg8298/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; /lzlcloud/pg8298/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; ls -id /lzlcloud/pg8388/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; /lzlcloud/pg8388/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; ls -id /lzlcloud/pg8095/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; /lzlcloud/pg8095/data&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; cat /lzlcloud/pg8298/data/postmaster.pid|head -7|tail -1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;971833391&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; cat /lzlcloud/pg8388/data/postmaster.pid|head -7|tail -1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4097&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;62128161&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; cat /lzlcloud/pg8095/data/postmaster.pid|head -7|tail -1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4098&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;143163441&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The data disk directories all have inode 4096, but the shmkeys are 4096, 4097, 4098.&lt;/p&gt;
&lt;p&gt;Why?&lt;/p&gt;
&lt;p&gt;The inode issue relates to the filesystem:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Each filesystem has independent inodes&lt;/li&gt;
&lt;li&gt;The filesystem reserves some inodes — the first few are unusable. Depending on mount options, our data disk&amp;rsquo;s real inodes start at 4096&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So &lt;code&gt;datadir.inode = 4096&lt;/code&gt; is the default behavior of our cloud environment&amp;rsquo;s disk mounts. Other environments may differ; I haven&amp;rsquo;t analyzed those in depth. But whenever PG data directories sit on filesystems created and mounted the same way, inode collisions between instances can easily occur.&lt;/p&gt;
&lt;p&gt;The shmkey issue relates to PG&amp;rsquo;s source code, PGSharedMemoryCreate():&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	NextShmemSegID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; statbuf.st_ino;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (;;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		shmid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;shmget&lt;/span&gt;(NextShmemSegID, &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(PGShmemHeader), &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;switch&lt;/span&gt; (state)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; SHMSTATE_FOREIGN:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				NextShmemSegID&lt;span style="color:#f92672"&gt;++&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The initial shmkey is datadir.inode. If a segment already exists under that key and turns out to be FOREIGN (it does not belong to this data directory, for example because another instance created it), PG increments the shmkey by 1 and tries again.&lt;/p&gt;
&lt;p&gt;For example, take the instance with shmkey=4097 in postmaster.pid: at startup it first tried shmkey=4096, but the segment under that key already belonged to another instance (the one with shmkey=4096). So it moved on to shmkey+1 and obtained a different segment with its own shmid.&lt;/p&gt;
&lt;p&gt;Similarly, the instance with shmkey=4098 had to increment twice to find a free shmkey-shmid pair.&lt;/p&gt;
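&lt;p&gt;One way to see how many keys an instance had to skip is to compare the data directory inode with the key recorded in postmaster.pid. This is only a sketch: &lt;code&gt;shmkey_offset&lt;/code&gt; is a hypothetical helper, it assumes GNU &lt;code&gt;stat&lt;/code&gt; (Linux), and it assumes the inode is small enough to be used as the key without truncation.&lt;/p&gt;

```shell
# Hypothetical helper: compare a data directory inode with the shmkey
# on line 7 of its postmaster.pid and report the difference.
# Usage: shmkey_offset /path/to/datadir
shmkey_offset() {
    datadir=$1
    # Inode number of the data directory (GNU stat syntax)
    inode=$(stat -c %i "$datadir")
    # First field of line 7 is the shmkey PG ended up using
    key=$(sed -n '7p' "$datadir/postmaster.pid" | awk '{ print $1 }')
    echo "inode=$inode shmkey=$key skipped=$(( key - inode ))"
}
```

&lt;p&gt;With the values shown above, &lt;code&gt;shmkey_offset /lzlcloud/pg8095/data&lt;/code&gt; should report &lt;code&gt;inode=4096 shmkey=4098 skipped=2&lt;/code&gt;.&lt;/p&gt;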

&lt;h3 class="relative group"&gt;shmid Relationships
 &lt;div id="shmid-relationships" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#shmid-relationships" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The SysV shmid can be found in &lt;strong&gt;the startup error log&lt;/strong&gt;, &lt;strong&gt;line 7 of postmaster.pid&lt;/strong&gt;, and &lt;strong&gt;virtual memory smaps&lt;/strong&gt;. It can be inspected via the &lt;code&gt;ipcs&lt;/code&gt; command and cleaned up with &lt;code&gt;ipcrm&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Example — note shmid=143917078 throughout:&lt;/p&gt;
&lt;p&gt;Startup error log:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_ctl: another server might be running; trying to start server anyway
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;waiting &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; server to start....2026-03-05 16:02:19 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;262388&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: FATAL: pre-existing shared memory block &lt;span style="color:#f92672"&gt;(&lt;/span&gt;key 4096, ID 143917078&lt;span style="color:#f92672"&gt;)&lt;/span&gt; is still in use&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;postmaster.pid line 7:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; cat postmaster.pid |head -7|tail -1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;143917078&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Virtual memory smaps:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/&lt;span style="color:#e6db74"&gt;`&lt;/span&gt;head -1 $PGDATA/postmaster.pid&lt;span style="color:#e6db74"&gt;`&lt;/span&gt;/smaps | grep -E &lt;span style="color:#e6db74"&gt;&amp;#34;\-s&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2ad2b5189000-2ad2b518a000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;143917078&lt;/span&gt; /SYSV00001000 &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
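&lt;p&gt;The mapping name in that line also carries the shmkey: on Linux a SysV segment shows up as &lt;code&gt;SYSV&lt;/code&gt; followed by the key as eight hex digits, while the inode column holds the shmid. A small sketch of the conversion, with &lt;code&gt;sysv_map_name&lt;/code&gt; as an illustrative name only:&lt;/p&gt;

```shell
# Hypothetical helper: turn a decimal shmkey into the name that
# Linux shows for the segment in /proc/PID/maps and smaps.
sysv_map_name() {
    printf 'SYSV%08x\n' "$1"
}

sysv_map_name 4096    # prints SYSV00001000
```

&lt;p&gt;So &lt;code&gt;grep SYSV00001000&lt;/code&gt; on a process&amp;rsquo;s smaps is a quick way to check whether it still maps the segment for shmkey 4096.&lt;/p&gt;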
&lt;p&gt;Inspecting and cleaning via SysV shmid:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ipcs -m -i &lt;span style="color:#ae81ff"&gt;143917078&lt;/span&gt; &lt;span style="color:#75715e"&gt;# cleanup: ipcrm -m shmid&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared memory Segment shmid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;143917078&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; gid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cuid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cgid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mode&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt; access_perms&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bytes&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt; lpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;242712&lt;/span&gt; cpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;242712&lt;/span&gt; nattch&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;att_time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;Thu Mar &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; 16:14:51 &lt;span style="color:#ae81ff"&gt;2026&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;det_time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;Thu Mar &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; 16:14:49 &lt;span style="color:#ae81ff"&gt;2026&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;change_time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;Thu Mar &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; 16:14:34 &lt;span style="color:#ae81ff"&gt;2026&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
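&lt;p&gt;The field that matters for startup is &lt;code&gt;nattch&lt;/code&gt;, the number of processes still attached to the segment. A minimal sketch for extracting it, assuming the &lt;code&gt;ipcs -m -i&lt;/code&gt; output format shown above; &lt;code&gt;nattch_from_ipcs&lt;/code&gt; is a hypothetical name:&lt;/p&gt;

```shell
# Hypothetical helper: read `ipcs -m -i SHMID` output on stdin and
# print only the attach count.
nattch_from_ipcs() {
    sed -n 's/.*nattch=\([0-9][0-9]*\).*/\1/p'
}
```

&lt;p&gt;Usage: &lt;code&gt;ipcs -m -i 143917078 | nattch_from_ipcs&lt;/code&gt;. A non-zero result means some process still holds the segment.&lt;/p&gt;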

&lt;h3 class="relative group"&gt;Testing
 &lt;div id="testing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#testing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;Reproducing the Production Issue
 &lt;div id="reproducing-the-production-issue" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#reproducing-the-production-issue" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Freeze a backend process so that it stays alive and attached to the shared memory, then &lt;code&gt;kill -9&lt;/code&gt; the postmaster:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; cat postmaster.pid
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;143917076&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; ipcs -m -i &lt;span style="color:#ae81ff"&gt;143917076&lt;/span&gt; &lt;span style="color:#75715e"&gt;# shmem id&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared memory Segment shmid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;143917076&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; gid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cuid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cgid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mode&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt; access_perms&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bytes&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt; lpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;241567&lt;/span&gt; cpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;64757&lt;/span&gt; nattch&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; kill -stop &lt;span style="color:#ae81ff"&gt;107648&lt;/span&gt; &lt;span style="color:#75715e"&gt;# any backend&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; kill -9 &lt;span style="color:#ae81ff"&gt;64757&lt;/span&gt; &lt;span style="color:#75715e"&gt;# postmaster or another process&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; ipcs -m -i &lt;span style="color:#ae81ff"&gt;143917076&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared memory Segment shmid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;143917076&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; gid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cuid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cgid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mode&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt; access_perms&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bytes&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt; lpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;252283&lt;/span&gt; cpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;64757&lt;/span&gt; nattch&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#75715e"&gt;# nattch != 0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; pg_ctl start -D $PGDATA
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_ctl: another server might be running; trying to start server anyway
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;waiting &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; server to start....2026-03-05 16:02:19 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;262388&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: FATAL: pre-existing shared memory block &lt;span style="color:#f92672"&gt;(&lt;/span&gt;key 4096, ID 143917076&lt;span style="color:#f92672"&gt;)&lt;/span&gt; is still in use
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-03-05 16:02:19 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;262388&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Terminate any old server processes associated with data directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; stopped waiting
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_ctl: could not start server&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Because the stopped backend is still attached to the segment (nattch=1), the instance cannot start.&lt;/p&gt;
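&lt;p&gt;To find out which process is still attached, one option on Linux is to search &lt;code&gt;/proc/PID/maps&lt;/code&gt;, where a SysV mapping appears as &lt;code&gt;/SYSV&lt;/code&gt; followed by the key as 8 hex digits (key 4096 becomes &lt;code&gt;/SYSV00001000&lt;/code&gt;). The following is a minimal sketch; the function name &lt;code&gt;shm_attached_pids&lt;/code&gt; and its optional proc-root argument are my own assumptions and not part of any PG tooling:&lt;/p&gt;

```shell
# Hypothetical helper: list PIDs that still map the SysV segment with the given key.
# Usage: shm_attached_pids KEY [PROC_ROOT]
shm_attached_pids() {
    key="$1"
    proc_root="${2:-/proc}"
    # Linux names SysV mappings /SYSV followed by the key as 8 hex digits.
    name=$(printf '/SYSV%08x' "$key")
    for maps in "$proc_root"/[0-9]*/maps; do
        # -q: quiet, -s: skip unreadable or missing files, -F: fixed string
        if grep -qsF "$name" "$maps"; then
            pid=${maps%/maps}
            echo "${pid##*/}"
        fi
    done
}
```

&lt;p&gt;Run as the database OS user (or root) so the maps files are readable. In the experiment above I would expect &lt;code&gt;shm_attached_pids 4096&lt;/code&gt; to print the stopped backend, 107648.&lt;/p&gt;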

&lt;h4 class="relative group"&gt;Normal Crash Recovery (Successful Startup)
 &lt;div id="normal-crash-recovery-successful-startup" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#normal-crash-recovery-successful-startup" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Essentially, kill the instance and then start it:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; cat postmaster.pid
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;143917077&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; ipcs -m -i &lt;span style="color:#ae81ff"&gt;143917077&lt;/span&gt; &lt;span style="color:#75715e"&gt;# shmem id&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared memory Segment shmid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;143917077&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; gid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cuid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cgid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mode&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt; access_perms&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bytes&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt; lpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;154800&lt;/span&gt; cpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;134329&lt;/span&gt; nattch&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; kill -9 &lt;span style="color:#ae81ff"&gt;134329&lt;/span&gt; &lt;span style="color:#75715e"&gt;# postmaster or another process&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; cat postmaster.pid
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;143917077&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; ipcs -m -i &lt;span style="color:#ae81ff"&gt;143917077&lt;/span&gt; &lt;span style="color:#75715e"&gt;# shmem id unchanged, segment still exists&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared memory Segment shmid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;143917077&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; gid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cuid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cgid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mode&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt; access_perms&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bytes&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt; lpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;169360&lt;/span&gt; cpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;134329&lt;/span&gt; nattch&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#75715e"&gt;# nattch=0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; pg_ctl start -D $PGDATA &lt;span style="color:#75715e"&gt;# startup succeeds&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_ctl: another server might be running; trying to start server anyway
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;waiting &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; server to start....2026-03-05 16:14:34 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;242712&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: LOG: redirecting log output to logging collector process
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-03-05 16:14:34 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;242712&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Future log output will appear in directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data/pg_log&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;done&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server started
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; ipcs -m -i &lt;span style="color:#ae81ff"&gt;143917077&lt;/span&gt; &lt;span style="color:#75715e"&gt;# residual shmem cleaned up during startup&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ipcs: id &lt;span style="color:#ae81ff"&gt;143917077&lt;/span&gt; not found
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; ipcs -m -i &lt;span style="color:#ae81ff"&gt;143917078&lt;/span&gt; &lt;span style="color:#75715e"&gt;# shmid incremented by 1 at startup&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared memory Segment shmid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;143917078&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; gid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cuid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cgid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mode&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt; access_perms&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bytes&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt; lpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;273571&lt;/span&gt; cpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;242712&lt;/span&gt; nattch&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;26&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; cat postmaster.pid &lt;span style="color:#75715e"&gt;# shmkey unchanged, shmid +1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;143917078&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A plain &lt;code&gt;kill -9&lt;/code&gt; followed by a start works fine: the residual segment has nattch=0, so it is cleaned up during startup. The shmkey stays the same because the data directory inode is 4096 and key 4096 was available again once the stale segment was removed. The shmid increasing by 1 is Linux kernel behavior; at the very least it shows that a new shared memory segment was created.&lt;/p&gt;
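&lt;p&gt;In the full postmaster.pid file, the shmkey/shmid pair sits on line 7. A small sketch for extracting it in scripts; the helper name &lt;code&gt;pidfile_shm&lt;/code&gt; is hypothetical:&lt;/p&gt;

```shell
# Hypothetical helper: print "KEY SHMID" from line 7 of a postmaster.pid file.
# Usage: pidfile_shm /path/to/postmaster.pid
pidfile_shm() {
    # Unquoted expansion drops the leading padding and splits on whitespace,
    # so afterwards $1 is the key and $2 is the shmid.
    set -- $(sed -n '7p' "$1")
    echo "$1 $2"
}
```

&lt;p&gt;The second field can then be passed to &lt;code&gt;ipcs -m -i&lt;/code&gt; to check nattch before attempting a start.&lt;/p&gt;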

&lt;h4 class="relative group"&gt;Holding a File Descriptor But Not shmem
 &lt;div id="holding-a-file-descriptor-but-not-shmem" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#holding-a-file-descriptor-but-not-shmem" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Startup is tied to the data directory inode, and the inode determines the shmkey and therefore the shmem id, so startup essentially &lt;strong&gt;checks whether the shmem is held by another process, not whether a file descriptor is still open&lt;/strong&gt;. Let&amp;rsquo;s test this with the logger process, which holds file descriptors but not shared memory:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat /proc/77300/smaps | grep -E &lt;span style="color:#e6db74"&gt;&amp;#34;\-s&amp;#34;&lt;/span&gt; &lt;span style="color:#75715e"&gt;# logger process — verify it has no shared memory&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ kill -stop &lt;span style="color:#ae81ff"&gt;77300&lt;/span&gt; &lt;span style="color:#75715e"&gt;# stop logger&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ kill -9 &lt;span style="color:#ae81ff"&gt;77076&lt;/span&gt; &lt;span style="color:#75715e"&gt;# kill -9 pm&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat postmaster.pid &lt;span style="color:#75715e"&gt;# file still exists&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;77076&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;/lzlcloud/pg8531/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1772700343&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;8531&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;/tmp
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0.0.0.0
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;143917080&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ready
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ ipcs -m -i &lt;span style="color:#ae81ff"&gt;143917080&lt;/span&gt; &lt;span style="color:#75715e"&gt;# shared memory still exists&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared memory Segment shmid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;143917080&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; gid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cuid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cgid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mode&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt; access_perms&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bytes&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt; lpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;77319&lt;/span&gt; cpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;77076&lt;/span&gt; nattch&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;att_time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;Thu Mar &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; 17:27:11 &lt;span style="color:#ae81ff"&gt;2026&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;det_time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;Thu Mar &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; 17:27:15 &lt;span style="color:#ae81ff"&gt;2026&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;change_time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;Thu Mar &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; 16:45:43 &lt;span style="color:#ae81ff"&gt;2026&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ ps -ef|grep &lt;span style="color:#ae81ff"&gt;77300&lt;/span&gt; &lt;span style="color:#75715e"&gt;# process still alive&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;77300&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 16:45 ? 00:00:00 postgresql: lzldb: logger
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;135246&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;46622&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 17:27 pts/1 00:00:00 grep --color&lt;span style="color:#f92672"&gt;=&lt;/span&gt;auto &lt;span style="color:#ae81ff"&gt;77300&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ pg_ctl start -D $PGDATA &lt;span style="color:#75715e"&gt;# startup succeeds&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_ctl: another server might be running; trying to start server anyway
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;waiting &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; server to start....2026-03-05 17:27:55 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;140497&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: LOG: redirecting log output to logging collector process
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-03-05 17:27:55 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;140497&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Future log output will appear in directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data/pg_log&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;done&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server started&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The logger holds files in the data directory open but is not attached to shared memory, so it does not block startup.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Deleting postmaster.pid Then Failing to Start
 &lt;div id="deleting-postmasterpid-then-failing-to-start" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#deleting-postmasterpid-then-failing-to-start" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Same procedure: suspend a backend process with &lt;code&gt;kill -stop&lt;/code&gt;, &lt;code&gt;kill -9&lt;/code&gt; the postmaster (PM), delete postmaster.pid, then attempt startup.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ll skip the full output. The result is that startup fails with:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;waiting &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; server to start....2026-03-06 15:29:48 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;22475&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: FATAL: pre-existing shared memory block &lt;span style="color:#f92672"&gt;(&lt;/span&gt;key 4098, ID 171868173&lt;span style="color:#f92672"&gt;)&lt;/span&gt; is still in use
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-03-06 15:29:48 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;22475&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Terminate any old server processes associated with data directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-03-06 15:29:48 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;22475&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: LOG: database system is shut down&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This shows that when a leftover (&amp;ldquo;zombie&amp;rdquo;) process still holds the shmem, deleting postmaster.pid (which records the shmid) does not stop PG from finding the corresponding segment.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Stop a Different Instance, Start the Current One
 &lt;div id="stop-a-different-instance-start-the-current-one" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#stop-a-different-instance-start-the-current-one" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;PG looks up the shmid from two sources to determine whether a segment belongs to the current instance:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The shmid that corresponds to the shmkey derived from &lt;code&gt;datadir.inode&lt;/code&gt;, or to a later key reached via &lt;code&gt;shmkey++&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;The shmid stored in postmaster.pid&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Even if postmaster.pid is deleted, PG can still tell whether the shmem is held by another process. But we can exploit the datadir.inode and &lt;code&gt;shmkey++&lt;/code&gt; behavior to get the instance started.&lt;/p&gt;
&lt;p&gt;In our cloud environment every data directory has inode 4096, so instances only end up with different shmkeys because of the &lt;code&gt;shmkey++&lt;/code&gt; logic in the source. That means we can &lt;strong&gt;start or stop another PG instance whose datadir.inode = 4096 to shift where the current instance&amp;rsquo;s &lt;code&gt;shmkey++&lt;/code&gt; probing lands, so that it obtains a different shmkey and shmid.&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ kill -stop &lt;span style="color:#ae81ff"&gt;165245&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ kill -9 &lt;span style="color:#ae81ff"&gt;164411&lt;/span&gt; &lt;span style="color:#75715e"&gt;# stop current instance, keep one of its backend processes alive&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ pg_ctl stop -D /pg8531/data &lt;span style="color:#75715e"&gt;# stop a different instance&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;waiting &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; server to shut down.... &lt;span style="color:#66d9ef"&gt;done&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server stopped
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ pg_ctl start -D /pg8574/data &lt;span style="color:#75715e"&gt;# try starting the current instance — fails because postmaster.pid still exists&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_ctl: another server might be running; trying to start server anyway
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;waiting &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; server to start....2026-03-05 18:22:35 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;196209&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: FATAL: pre-existing shared memory block &lt;span style="color:#f92672"&gt;(&lt;/span&gt;key 4097, ID 143917087&lt;span style="color:#f92672"&gt;)&lt;/span&gt; is still in use
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-03-05 18:22:35 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;196209&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Terminate any old server processes associated with data directory &lt;span style="color:#e6db74"&gt;&amp;#34;/pg8574/data&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; stopped waiting
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_ctl: could not start server
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Examine the log output.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ mv /lzlcloud/pg8574/data/postmaster.pid&lt;span style="color:#f92672"&gt;{&lt;/span&gt;,.bak&lt;span style="color:#f92672"&gt;}&lt;/span&gt; &lt;span style="color:#75715e"&gt;# move the current instance&amp;#39;s postmaster.pid out of the way&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ pg_ctl start -D /lzlcloud/pg8574/data &lt;span style="color:#75715e"&gt;# try again — succeeds&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-03-05 18:23:09 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;207725&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: LOG: redirecting log output to logging collector process
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-03-05 18:23:09 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;207725&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Future log output will appear in directory &lt;span style="color:#e6db74"&gt;&amp;#34;/lzlcloud/pg8574/data/pg_log&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;done&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server started
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ ipcs -m -i &lt;span style="color:#ae81ff"&gt;143917087&lt;/span&gt; &lt;span style="color:#75715e"&gt;# the shmid&amp;#39;s SysV segment is still held by our zombie process&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared memory Segment shmid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;143917087&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; gid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cuid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cgid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mode&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt; access_perms&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bytes&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt; lpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;196209&lt;/span&gt; cpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;164411&lt;/span&gt; nattch&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;att_time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;Thu Mar &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; 18:22:35 &lt;span style="color:#ae81ff"&gt;2026&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;det_time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;Thu Mar &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; 18:22:35 &lt;span style="color:#ae81ff"&gt;2026&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;change_time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;Thu Mar &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; 18:21:04 &lt;span style="color:#ae81ff"&gt;2026&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Startup succeeds because the current instance requested a different shared memory segment. The old segment, still held by the stopped backend (nattch=1), was not cleaned up. This is the &amp;ldquo;hack&amp;rdquo; of stopping another instance to start the current one in a cloud environment.&lt;/p&gt;
&lt;p&gt;One prerequisite: the other instance must not only have the same data directory inode as the current instance, but also hold a shmkey lower than the current instance&amp;rsquo;s shmkey.&lt;/p&gt;
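&lt;p&gt;The key probing behind this workaround can be modeled roughly as follows. This is a simplified sketch based on the behavior observed in the experiments above, not PostgreSQL&amp;rsquo;s actual code; the function name and the four state names are my own labels, and the separate check of the shmid recorded in postmaster.pid is left out:&lt;/p&gt;

```shell
# Simplified model of shmkey probing (illustrative only).
# Usage: probe_shmkey START_KEY STATE...
# Each STATE describes one successive key, beginning at START_KEY:
#   free     - no segment exists for this key
#   foreign  - segment belongs to another data directory
#   stale    - segment of this data directory with nattch=0
#   attached - segment of this data directory with nattch != 0
probe_shmkey() {
    key="$1"
    shift
    for state in "$@"; do
        case "$state" in
            free)     echo "create $key"; return 0 ;;
            stale)    echo "recycle $key"; return 0 ;;
            attached) echo "fatal $key"; return 1 ;;
            foreign)  key=$((key + 1)) ;;
        esac
    done
    # Ran past the described keys: treat the next one as free.
    echo "create $key"
}
```

&lt;p&gt;&lt;code&gt;probe_shmkey 4096 foreign attached&lt;/code&gt; prints &lt;code&gt;fatal 4097&lt;/code&gt;: while another instance holds key 4096, probing reaches the key still held by the stopped backend. &lt;code&gt;probe_shmkey 4096 free attached&lt;/code&gt; prints &lt;code&gt;create 4096&lt;/code&gt;: once the other instance is stopped, probing stops at the lower key and never reaches the held one, which is why the lower-shmkey prerequisite matters.&lt;/p&gt;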

&lt;h2 class="relative group"&gt;Error Analysis: &lt;code&gt;lock file &amp;quot;postmaster.pid&amp;quot; already exists&lt;/code&gt;
 &lt;div id="error-analysis-lock-file-postmasterpid-already-exists" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#error-analysis-lock-file-postmasterpid-already-exists" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;This problem is much simpler than the shared memory one.&lt;/p&gt;
&lt;p&gt;During startup, PG checks the lock file and the PID it contains in &lt;code&gt;CreateLockFile()&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (other_pid &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; my_pid &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; other_pid &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; my_p_pid &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			other_pid &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; my_gp_pid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;kill&lt;/span&gt;(other_pid, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				(errno &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; ESRCH &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; errno &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; EPERM))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/* lockfile belongs to a live process */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(FATAL,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						(&lt;span style="color:#a6e22e"&gt;errcode&lt;/span&gt;(ERRCODE_LOCK_FILE_EXISTS),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 &lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;lock file &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt; already exists&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								filename),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 isDDLock &lt;span style="color:#f92672"&gt;?&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 (encoded_pid &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;?&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 &lt;span style="color:#a6e22e"&gt;errhint&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Is another postgres (PID %d) running in data directory &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;?&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 (&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;) other_pid, refName) &lt;span style="color:#f92672"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 &lt;span style="color:#a6e22e"&gt;errhint&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Is another postmaster (PID %d) running in data directory &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;?&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 (&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;) other_pid, refName)) &lt;span style="color:#f92672"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 (encoded_pid &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;?&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 &lt;span style="color:#a6e22e"&gt;errhint&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Is another postgres (PID %d) using socket file &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;?&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 (&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;) other_pid, refName) &lt;span style="color:#f92672"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 &lt;span style="color:#a6e22e"&gt;errhint&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Is another postmaster (PID %d) using socket file &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;?&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 (&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;) other_pid, refName))));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Testing is even simpler. Start the instance a second time while it is already running:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ pg_ctl start -D /pg8531/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_ctl: another server might be running; trying to start server anyway
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;waiting &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; server to start....2026-03-06 15:59:05 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;89145&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: FATAL: lock file &lt;span style="color:#e6db74"&gt;&amp;#34;postmaster.pid&amp;#34;&lt;/span&gt; already exists
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-03-06 15:59:05 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;89145&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Is another postmaster &lt;span style="color:#f92672"&gt;(&lt;/span&gt;PID 255500&lt;span style="color:#f92672"&gt;)&lt;/span&gt; running in data directory &lt;span style="color:#e6db74"&gt;&amp;#34;/pg8531/data&amp;#34;&lt;/span&gt;?
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; stopped waiting
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_ctl: could not start server
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Examine the log output.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;So the later errors in the incident&amp;rsquo;s start.log appeared because the instance was already running and someone tried to start it several more times.&lt;/p&gt;
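&lt;p&gt;The liveness test at the heart of that C fragment can be sketched in a few lines. This is only an illustration, assuming a POSIX system; the function name &lt;code&gt;lock_file_owner_alive&lt;/code&gt; is hypothetical, and the sketch leaves out the comparison against the current process, its parent, and its grandparent.&lt;/p&gt;

```python
import errno
import os


def lock_file_owner_alive(other_pid: int) -> bool:
    """Mimic the kill(other_pid, 0) test: signal 0 probes without delivering a signal."""
    try:
        os.kill(other_pid, 0)
    except OSError as exc:
        # ESRCH: no such process. EPERM: the PID exists but belongs to another
        # user, so it cannot be our postmaster. Both count as "not alive".
        return exc.errno not in (errno.ESRCH, errno.EPERM)
    return True


if __name__ == "__main__":
    # The current process certainly exists, so the probe succeeds.
    print(lock_file_owner_alive(os.getpid()))
```

&lt;p&gt;When the function returns true for the PID stored in postmaster.pid, the startup attempt ends with the FATAL error shown above.&lt;/p&gt;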

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;When starting, PG first allocates a small SysV shmem segment (not the mmap-based shared_buffers) that acts as a lock on the data directory. PG uses the data directory&amp;rsquo;s inode as the initial shmkey passed to &lt;code&gt;shmget()&lt;/code&gt;, which returns a unique shmid. Since that key may already be in use by another process, PG keeps incrementing the key (&lt;code&gt;shmkey++&lt;/code&gt;) in a loop until it finds an unclaimed one. Line 7 of postmaster.pid stores both the shmkey and the shmid. In cloud environments, you&amp;rsquo;ll often see adjacent PG instances with consecutive shmkeys. This happens because the data disks are mounted identically, so the data directories share the same starting inode and &lt;code&gt;shmkey++&lt;/code&gt; kicks in.&lt;/p&gt;
&lt;p&gt;If a PG instance is killed unexpectedly, the shmem is not cleaned up automatically. In the normal case, no zombie process is still attached to the shared memory, so the next startup cleans it up and proceeds. In the abnormal case, a zombie process is still attached to it, startup fails, and manual intervention is required.&lt;/p&gt;
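&lt;p&gt;The key probing can be pictured with a small simulation. This is a simplified sketch, not PG&amp;rsquo;s actual code: the function &lt;code&gt;pick_shm_key&lt;/code&gt;, the instance names, and the set of keys in use are all hypothetical, and the real startup path also inspects whether an existing segment is stale before moving on.&lt;/p&gt;

```python
def pick_shm_key(data_dir_inode: int, keys_in_use: set) -> int:
    """Return the first key, starting at the inode, that no live segment uses."""
    key = data_dir_inode
    while key in keys_in_use:
        key += 1  # the shmkey++ step
    return key


if __name__ == "__main__":
    in_use = set()
    # Hypothetical instances whose data directories share inode 2048.
    for name in ("pg-a", "pg-b", "pg-c"):
        key = pick_shm_key(2048, in_use)
        in_use.add(key)
        print(name, key)
    # Stopping pg-a frees key 2048. A restart of pg-c then starts probing at
    # 2048 and never reaches its own stuck segment at key 2050.
    in_use.discard(2048)
    print("pg-c restart", pick_shm_key(2048, in_use))
```

&lt;p&gt;The last two lines mirror the prerequisite for the workaround: it only helps when the stopped instance held a lower shmkey than the instance that is stuck.&lt;/p&gt;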
&lt;p&gt;Recommended handling:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;ipcrm -m&lt;/code&gt; (most recommended)&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;lsof&lt;/code&gt; to find the zombie process and kill it&lt;/li&gt;
&lt;li&gt;Reboot the host&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Not recommended but possible workarounds:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;mv postmaster.pid&lt;/code&gt; + stop a different PG instance (where the other instance&amp;rsquo;s shmkey &amp;lt; current instance&amp;rsquo;s shmkey)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;mv postmaster.pid&lt;/code&gt; + remount the data disk to change its inode&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Finally, answering the opening questions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Why isn&amp;rsquo;t this scenario more common in practice?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It requires two things at once: an abnormal instance crash and zombie processes that are still alive. Many crash scenarios leave no zombie processes, so startup just works.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The start.log shows two different error types — what do they correspond to?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &amp;ldquo;shared memory in use&amp;rdquo; error means abnormal crash + zombie processes still exist. The &amp;ldquo;postmaster.pid already exists&amp;rdquo; error means the instance was started multiple times.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Can shared memory still exist if the postmaster is gone?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Yes, shared memory can persist after the postmaster is gone, because PG child processes don&amp;rsquo;t always exit cleanly or get cleaned up by the OS. However, if &lt;em&gt;all&lt;/em&gt; processes are gone, nothing is attached to the segment any more, and the next startup cleans it up without complaint.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How do you locate and clean up this shared memory segment?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The shmid can be found in the startup error log (start.log). Clean it with &lt;code&gt;ipcrm -m $shmid&lt;/code&gt;.&lt;/p&gt;
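&lt;p&gt;If the postmaster.pid left behind by the crashed instance is still available, the same shmid can also be read from its line 7. The sketch below assumes that line holds the shmkey and shmid as two whitespace-separated integers; the function name and the sample file contents are hypothetical, with the key and ID taken from the error message at the top of this article.&lt;/p&gt;

```python
def read_shm_key_and_id(pid_file_text: str) -> tuple:
    """Return (shmkey, shmid) parsed from line 7 of postmaster.pid contents."""
    lines = pid_file_text.splitlines()
    if 7 > len(lines):
        raise ValueError("postmaster.pid has fewer than 7 lines")
    key_text, id_text = lines[6].split()
    return int(key_text), int(id_text)


# Hypothetical file contents; only line 7 matters for this sketch.
SAMPLE = "\n".join([
    "255500",
    "/pg8531/data",
    "1772783945",
    "8531",
    "/tmp",
    "*",
    "     2048 1328250881",
    "ready",
])

if __name__ == "__main__":
    shmkey, shmid = read_shm_key_and_id(SAMPLE)
    # Print the cleanup command instead of running it.
    print(f"ipcrm -m {shmid}")
```

&lt;p&gt;The sketch only prints the command. Whether it is safe to run still depends on confirming that the segment belongs to the dead instance.&lt;/p&gt;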
&lt;ul&gt;
&lt;li&gt;PG has multiple shared memory segments — which one is this?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The SysV shmem used to protect the data directory. It always exists. See the &amp;ldquo;Three Types of Shared Memory&amp;rdquo; section. It&amp;rsquo;s distinct from the mmap-based shared_buffers.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Can you find the corresponding shmem via inode or file?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Linux does not provide a userspace interface to find SysV shmem by inode or file (this statement is 100% AI-generated, cross-validated across multiple models). PG uses the data directory&amp;rsquo;s inode as a seed shmkey to request shared memory — it does not directly find shmem by inode. PG has its own mechanism for locating SysV shmem, but it&amp;rsquo;s not an absolute mapping; &lt;code&gt;shmkey++&lt;/code&gt; is a compromise startup logic for this reason.&lt;/p&gt;</content:encoded></item><item><title>PostgreSQL Operations Experience 2025</title><link>https://lastdba.com/en/2026/01/11/postgresql-operations-experience-2025/</link><pubDate>Sun, 11 Jan 2026 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2026/01/11/postgresql-operations-experience-2025/</guid><description>&lt;p&gt;This is a technical operations summary, focused on being accessible and practical. It also serves as a periodic reflection on PostgreSQL database operations. Hope it helps fellow PGers.&lt;/p&gt;
&lt;p&gt;Previous ops experience: &lt;a href="https://www.modb.pro/db/1876933230968975360" target="_blank" rel="noreferrer"&gt;PostgreSQL Operations Experience 2024&lt;/a&gt;. Note: this article does not repeat content from that one.&lt;/p&gt;

&lt;h2 class="relative group"&gt;CPU
 &lt;div id="cpu" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cpu" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;SQL performance problems are the most common root cause in PostgreSQL incident handling. This includes poor SQL performance, suboptimal indexing, sudden high concurrency, and execution plan regressions. For a database like PostgreSQL that lacks a robust plan-binding mechanism, having a DBA team to help design data models, access patterns, indexes, and tune execution plans is crucial — it can significantly reduce sudden CPU saturation incidents.&lt;/p&gt;</description><content:encoded>&lt;p&gt;This is a technical operations summary, focused on being accessible and practical. It also serves as a periodic reflection on PostgreSQL database operations. Hope it helps fellow PGers.&lt;/p&gt;
&lt;p&gt;Previous ops experience: &lt;a href="https://www.modb.pro/db/1876933230968975360" target="_blank" rel="noreferrer"&gt;PostgreSQL Operations Experience 2024&lt;/a&gt;. Note: this article does not repeat content from that one.&lt;/p&gt;

&lt;h2 class="relative group"&gt;CPU
 &lt;div id="cpu" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cpu" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;SQL performance problems are the most common root cause in PostgreSQL incident handling. This includes poor SQL performance, suboptimal indexing, sudden high concurrency, and execution plan regressions. For a database like PostgreSQL that lacks a robust plan-binding mechanism, having a DBA team to help design data models, access patterns, indexes, and tune execution plans is crucial — it can significantly reduce sudden CPU saturation incidents.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Execution Plans
 &lt;div id="execution-plans" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#execution-plans" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Execution plan instability is an age-old problem with cost-based optimizers, and PostgreSQL is no exception.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Inaccurate DISTINCT Estimates
 &lt;div id="inaccurate-distinct-estimates" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#inaccurate-distinct-estimates" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://www.modb.pro/db/1976119963471589376" target="_blank" rel="noreferrer"&gt;Case Study: From Inaccurate DISTINCT to DISTINCT Calculation Principles&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;With the default &lt;code&gt;default_statistics_target&lt;/code&gt; of 100, ANALYZE samples at most 30,000 rows (300 times the statistics target). For tables much larger than the sample, the estimated distinct count is likely to be too low. Note: this assumes the data doesn&amp;rsquo;t have too many unique values.&lt;/p&gt;
&lt;p&gt;Testing on a table with different sample sizes:&lt;/p&gt;
&lt;p&gt;Table: &lt;code&gt;reltuples&lt;/code&gt;=800 million, &lt;code&gt;relpages&lt;/code&gt;=20 million, size=175GB, actual distinct on the target column: 100 million.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;statistics target&lt;/th&gt;
 &lt;th&gt;page sampling fraction&lt;/th&gt;
 &lt;th&gt;tuple sampling fraction&lt;/th&gt;
 &lt;th&gt;estimated n_distinct&lt;/th&gt;
 &lt;th&gt;ANALYZE time&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;50&lt;/td&gt;
 &lt;td&gt;0.00075&lt;/td&gt;
 &lt;td&gt;0.00001875&lt;/td&gt;
 &lt;td&gt;60K&lt;/td&gt;
 &lt;td&gt;2s&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;100&lt;/td&gt;
 &lt;td&gt;0.0015&lt;/td&gt;
 &lt;td&gt;0.0000375&lt;/td&gt;
 &lt;td&gt;110K&lt;/td&gt;
 &lt;td&gt;5s&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;1000&lt;/td&gt;
 &lt;td&gt;0.015&lt;/td&gt;
 &lt;td&gt;0.000375&lt;/td&gt;
 &lt;td&gt;1.03M&lt;/td&gt;
 &lt;td&gt;58s&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;3000&lt;/td&gt;
 &lt;td&gt;0.045&lt;/td&gt;
 &lt;td&gt;0.001125&lt;/td&gt;
 &lt;td&gt;2.68M&lt;/td&gt;
 &lt;td&gt;3m01s&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;10000&lt;/td&gt;
 &lt;td&gt;0.15&lt;/td&gt;
 &lt;td&gt;0.00375&lt;/td&gt;
 &lt;td&gt;6.75M&lt;/td&gt;
 &lt;td&gt;7m21s&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;(The maximum statistics target is 10000.)&lt;/p&gt;
&lt;p&gt;Rough summary: both the n_distinct estimate and the ANALYZE run time grow roughly in proportion to the sample size.&lt;/p&gt;
&lt;p&gt;Even at the maximum statistics target, the estimate (6.75M) is still far below the actual 100 million distinct values, while the pages and tuples estimates stay accurate at every sample size.&lt;/p&gt;
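&lt;p&gt;The sampling fractions in the table follow from simple arithmetic: ANALYZE samples 300 times the statistics target rows, and on a table this large it visits about as many pages. The sketch below reproduces that arithmetic and adds the Haas-Stokes style estimator that, to my understanding, ANALYZE applies to the sample. Function and parameter names are my own, and the estimator omits the special cases PG handles separately.&lt;/p&gt;

```python
def analyze_sample(stat_target: int, reltuples: float, relpages: float) -> tuple:
    """Return (sample_rows, page_fraction, tuple_fraction) for one ANALYZE run."""
    sample_rows = 300 * stat_target
    return sample_rows, sample_rows / relpages, sample_rows / reltuples


def haas_stokes(n: int, total_rows: float, d: int, f1: int) -> float:
    """Estimate distinct values from n sampled rows with d distinct, f1 seen once."""
    return n * d / (n - f1 + f1 * n / total_rows)


if __name__ == "__main__":
    # Table from the test above: 800 million rows, 20 million pages.
    for target in (50, 100, 1000, 3000, 10000):
        print(target, analyze_sample(target, 800e6, 20e6))
```

&lt;p&gt;When few sampled values occur exactly once (small &lt;code&gt;f1&lt;/code&gt;), the denominator stays close to &lt;code&gt;n&lt;/code&gt; and the estimate stays close to &lt;code&gt;d&lt;/code&gt;, which can never exceed the sample size. That is one way to see why the estimate rises with the sample and still undershoots.&lt;/p&gt;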

&lt;h4 class="relative group"&gt;Generic Plan Interference
 &lt;div id="generic-plan-interference" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#generic-plan-interference" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
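&lt;p&gt;When analyzing PostgreSQL execution plans, generic plans must be taken into account. A generic plan is parameter-independent: its cost is estimated without knowing the actual parameter values. A prepared statement gets custom plans for its first five executions; after that, the generic plan&amp;rsquo;s cost is compared with the average cost of those custom plans, and the generic plan is chosen if it is cheaper.&lt;/p&gt;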
&lt;p&gt;&lt;a href="https://www.modb.pro/db/1964312913808732160" target="_blank" rel="noreferrer"&gt;Case Study: Adding an Index Causes Performance Degradation and Generic Plans&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;I. Classification of generic plan estimation problems&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Because of the 5-execution comparison mechanism, generic plan problems fall into two categories:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The first 5 SQL executions are not representative. Heavily dependent on data skew and whether the first 5 parameter values are representative.&lt;/li&gt;
&lt;li&gt;The generic plan itself is flawed. Due to data skew or inability to accurately compute selectivity even with balanced data, the generic plan is inherently inefficient.&lt;/li&gt;
&lt;/ol&gt;
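&lt;p&gt;The five-execution comparison behind both categories can be sketched as a small decision function. This is a simplified model under my own naming, not PG&amp;rsquo;s code: the real planner cache also folds an estimate of planning effort into each custom plan&amp;rsquo;s cost, which the sketch ignores.&lt;/p&gt;

```python
MIN_CUSTOM_PLANS = 5


def choose_custom_plan(custom_costs: list, generic_cost: float) -> bool:
    """Return True to plan with the actual parameters, False to reuse the generic plan."""
    if MIN_CUSTOM_PLANS > len(custom_costs):
        return True  # still collecting the first five custom plans
    average_custom_cost = sum(custom_costs) / len(custom_costs)
    # The generic plan wins only when it looks cheaper than the average custom plan.
    return generic_cost >= average_custom_cost


if __name__ == "__main__":
    # Five cheap, unrepresentative executions make a mediocre generic plan lose.
    print(choose_custom_plan([10.0] * 5, 50.0))
    # A generic plan whose cost is underestimated wins from the sixth execution on.
    print(choose_custom_plan([100.0] * 5, 50.0))
```

&lt;p&gt;Category 1 corresponds to a misleading &lt;code&gt;custom_costs&lt;/code&gt; list, category 2 to a misleading &lt;code&gt;generic_cost&lt;/code&gt;.&lt;/p&gt;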
&lt;p&gt;&lt;strong&gt;II. Solution reference&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Generic plan problems often surface on partitioned tables. When the partition key is continuous, scanning all partitions should yield a selectivity of 1, but the generic plan estimates 0.05 — likely resulting in a &amp;ldquo;full index scan&amp;rdquo; scenario.&lt;/p&gt;
&lt;p&gt;Consider these when optimizing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Don&amp;rsquo;t create too many indexes that confuse the optimizer&lt;/li&gt;
&lt;li&gt;Eliminate generic plan interference when testing: actually execute the prepared statement at least 6 times, so that the generic plan has had a chance to be chosen&lt;/li&gt;
&lt;li&gt;Compare plans with session-level &lt;code&gt;set plan_cache_mode='force_generic_plan';&lt;/code&gt; or &lt;code&gt;set plan_cache_mode='force_custom_plan';&lt;/code&gt;; or on PG 16+, use &lt;code&gt;explain (GENERIC_PLAN)&lt;/code&gt; to compare&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Syntax reference:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--prepare/execute
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PREPARE sql1(text) AS
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT COUNT(*) FROM LZL where a=$1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXECUTE sql1(&amp;#39;zzz&amp;#39;); --run 6 times first
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXPLAIN EXECUTE sql1(&amp;#39;zzz&amp;#39;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select * from pg_prepared_statements; --view prepared statement info, current session only
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--Compare execution plans, set session parameter then EXPLAIN EXECUTE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;set plan_cache_mode=&amp;#39;force_generic_plan&amp;#39;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;set plan_cache_mode=&amp;#39;force_custom_plan&amp;#39;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--Directly view generic plan, 16+
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;explain (GENERIC_PLAN) SELECT COUNT(*) FROM LZL where a=$1;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;LWLock:Lockmanager Caused by Row Locks
 &lt;div id="lwlocklockmanager-caused-by-row-locks" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lwlocklockmanager-caused-by-row-locks" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;LWLock:Lockmanager&lt;/code&gt; waits typically occur on partitioned tables under high concurrency when queries do not filter on the partition key. This year, a new scenario was discovered: &lt;a href="https://www.modb.pro/db/1995089823380627456" target="_blank" rel="noreferrer"&gt;Row Locks Causing LWLock:Lockmanager&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This isn&amp;rsquo;t a major issue, since blocking on concurrent updates to the same row is well known. I just hadn&amp;rsquo;t expected that updating the same row could also produce &lt;code&gt;LWLock:Lockmanager&lt;/code&gt;. It is not a particularly valuable case study, but when you see &lt;code&gt;LWLock:Lockmanager&lt;/code&gt; as a wait event, consider row locks as a possible cause.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Idle Connections
 &lt;div id="idle-connections" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#idle-connections" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL performance generally improves with each major release. PG 14 made &lt;a href="https://liuzhilong.blog.csdn.net/article/details/130783036" target="_blank" rel="noreferrer"&gt;significant optimizations&lt;/a&gt; to snapshot acquisition and backend transaction tracking, yielding noticeable improvements for high idle connection counts:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/88df744da257.jpg" alt="performance-impact-of-idle-connections-48active-prepost.png" /&gt; (&lt;a href="https://techcommunity.microsoft.com/blog/adforpostgresql/improving-postgres-connection-scalability-snapshots/1806462" target="_blank" rel="noreferrer"&gt;https://techcommunity.microsoft.com/blog/adforpostgresql/improving-postgres-connection-scalability-snapshots/1806462&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;However, this doesn&amp;rsquo;t mean you can ignore idle connections from PG 14 onward. Each idle connection still adds backend transaction-tracking overhead, context switches, and memory fragmentation, so the more idle connections there are, the worse the performance.&lt;/p&gt;
&lt;p&gt;Typically, application connections have keepalive and pooling. Maintaining some idle connections avoids creating new connections for every request, which would be far more expensive. Small databases generally don&amp;rsquo;t need to worry much about connection counts (as long as they&amp;rsquo;re not absurd) — CPUs are cheap, the system isn&amp;rsquo;t critical, and scaling is easy. But large databases are different. CPU count is the hard limit; you can&amp;rsquo;t just add more. Large databases already have many idle connections; adding more doesn&amp;rsquo;t necessarily increase throughput — when CPU is already tight, it can backfire.&lt;/p&gt;
&lt;p&gt;Rough numbers from PG 15 benchmarks: taking 5K idle connections as the baseline, going to 10K idle connections costs roughly 2-5 additional vCPUs just for idle maintenance, and 20K idle connections costs roughly 5-10 additional vCPUs.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Idle in Transaction
 &lt;div id="idle-in-transaction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#idle-in-transaction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Last year I thoroughly criticized long transactions, because they impact PostgreSQL more severely than other databases (Oracle, MySQL, etc.). But this is manageable — with proper alerting and operations, long transactions are solvable.&lt;/p&gt;
&lt;p&gt;When monitoring sessions, pay attention to the session state. &lt;code&gt;active&lt;/code&gt; means the session is running SQL; &lt;code&gt;idle in transaction&lt;/code&gt; means it is inside a transaction but not currently executing SQL. All &lt;a href="https://www.postgresql.org/docs/18/monitoring-stats.html#MONITORING-PG-STAT-ACTIVITY-VIEW" target="_blank" rel="noreferrer"&gt;pg_stat_activity states, PG 15&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;Current overall state of this backend. Possible values are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;active&lt;/code&gt;: The backend is executing a query.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;idle&lt;/code&gt;: The backend is waiting for a new client command.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;idle in transaction&lt;/code&gt;: The backend is in a transaction, but is not currently executing a query.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;idle in transaction (aborted)&lt;/code&gt;: This state is similar to &lt;code&gt;idle in transaction&lt;/code&gt;, except one of the statements in the transaction caused an error.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fastpath function call&lt;/code&gt;: The backend is executing a fast-path function.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;disabled&lt;/code&gt;: This state is reported if &lt;a href="https://www.postgresql.org/docs/15/runtime-config-statistics.html#GUC-TRACK-ACTIVITIES" target="_blank" rel="noreferrer"&gt;track_activities&lt;/a&gt; is disabled in this backend.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Common states are: &lt;code&gt;active&lt;/code&gt;, &lt;code&gt;idle&lt;/code&gt;, &lt;code&gt;idle in transaction&lt;/code&gt;, &lt;code&gt;idle in transaction (aborted)&lt;/code&gt;. A common misconception about &lt;code&gt;idle in transaction&lt;/code&gt;: it only means no SQL is running &lt;em&gt;right now&lt;/em&gt; and the transaction hasn&amp;rsquo;t committed. It does NOT mean the transaction has been idle for a long time. &lt;code&gt;xact_start&lt;/code&gt; tells you when the transaction began, not when it went idle, so don&amp;rsquo;t use &lt;code&gt;xact_start&lt;/code&gt; + &lt;code&gt;idle in transaction&lt;/code&gt; to judge how long a transaction has been idle. Use &lt;code&gt;state_change&lt;/code&gt; + &lt;code&gt;idle in transaction&lt;/code&gt; instead.&lt;/p&gt;
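&lt;p&gt;The difference between the two timestamps is easy to see on a couple of sample rows. The sketch below works on plain dictionaries shaped like &lt;code&gt;pg_stat_activity&lt;/code&gt; rows; the function name, the sample sessions, and the five-minute limit are hypothetical.&lt;/p&gt;

```python
from datetime import datetime, timedelta


def long_idle_in_transaction(sessions: list, now: datetime, limit: timedelta) -> list:
    """Return (pid, idle_time, xact_age) for sessions idle in a transaction beyond limit."""
    flagged = []
    for s in sessions:
        if s["state"] != "idle in transaction":
            continue
        idle_time = now - s["state_change"]  # time since the last statement finished
        if idle_time > limit:
            flagged.append((s["pid"], idle_time, now - s["xact_start"]))
    return flagged


if __name__ == "__main__":
    now = datetime(2026, 1, 11, 12, 0, 0)
    hour_ago = now - timedelta(hours=1)
    sessions = [
        # Old transaction that ran a statement two seconds ago: busy, not stuck.
        {"pid": 1, "state": "idle in transaction", "xact_start": hour_ago,
         "state_change": now - timedelta(seconds=2)},
        # Old transaction that has done nothing for half an hour: worth an alert.
        {"pid": 2, "state": "idle in transaction", "xact_start": hour_ago,
         "state_change": now - timedelta(minutes=30)},
    ]
    print(long_idle_in_transaction(sessions, now, timedelta(minutes=5)))
```

&lt;p&gt;Both sessions have the same &lt;code&gt;xact_start&lt;/code&gt;, yet only the second one has actually been sitting idle.&lt;/p&gt;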

&lt;h2 class="relative group"&gt;Memory
 &lt;div id="memory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Memory issues are extremely tricky, and I handled many this year, finding some good solutions. But memory knowledge is broad — I&amp;rsquo;ll try to simplify as much as possible, going straight to symptoms, results, and solutions.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Memory Issues and Huge Pages
 &lt;div id="memory-issues-and-huge-pages" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-issues-and-huge-pages" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Classification of PostgreSQL memory problems:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/1fdf8b816eb0.png" alt="image.png" /&gt;&lt;/p&gt;
&lt;p&gt;Relevant wchan states for PG memory issues:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/c2d5d422e6f9.png" alt="image.png" /&gt;&lt;/p&gt;
&lt;p&gt;Huge pages are very effective against memory fragmentation and direct memory reclaim within cgroups.&lt;/p&gt;
&lt;p&gt;Benchmark results for huge pages: &lt;a href="https://docs.paic.com.cn/#/post/84479375" target="_blank" rel="noreferrer"&gt;https://docs.paic.com.cn/#/post/84479375&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Theoretical benefits of huge pages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reduced TLB pressure&lt;/li&gt;
&lt;li&gt;Reduced page table size in main memory&lt;/li&gt;
&lt;li&gt;Huge pages are physically contiguous. Contiguous physical memory access is better than non-contiguous&lt;/li&gt;
&lt;li&gt;A huge page is mapped by a higher-level page table entry, so address translation walks fewer page table levels&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, huge pages bring management challenges:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Must pre-allocate huge pages&lt;/li&gt;
&lt;li&gt;Must calculate the required number of huge pages in advance to avoid wasting memory&lt;/li&gt;
&lt;/ul&gt;
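&lt;p&gt;Sizing the pool in advance comes down to a ceiling division. The sketch below assumes 2 MiB huge pages and a hypothetical 8 GiB shared memory area; the result is the kind of value that would go into &lt;code&gt;vm.nr_hugepages&lt;/code&gt;. On PG 15 and later, the read-only parameter &lt;code&gt;shared_memory_size_in_huge_pages&lt;/code&gt; reports the number for a given configuration, which is preferable to estimating by hand.&lt;/p&gt;

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def huge_pages_needed(shared_memory_bytes: int, huge_page_bytes: int = 2 * MIB) -> int:
    """Return the number of huge pages that fully cover the shared memory area."""
    return -(-shared_memory_bytes // huge_page_bytes)  # ceiling division


if __name__ == "__main__":
    # Hypothetical instance whose total shared memory request is 8 GiB.
    print(huge_pages_needed(8 * GIB))
```

&lt;p&gt;Reserving noticeably more than this wastes memory that nothing else can use, and reserving less means the request cannot be satisfied from huge pages.&lt;/p&gt;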
&lt;p&gt;Memory knowledge is extensive. For more, refer to &lt;a href="https://lastdba.com/en/2025/06/19/linux%E5%86%85%E5%AD%98%E8%BF%9B%E9%98%B6/" &gt;Advanced Linux Memory&lt;/a&gt;. Key takeaways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Rule out OS-level issues before tackling PG instance-level issues&lt;/li&gt;
&lt;li&gt;Huge pages have remarkable effects, but in rare cases they don&amp;rsquo;t help&lt;/li&gt;
&lt;li&gt;Many people don&amp;rsquo;t monitor pgpgin/pgpgout/pgfree, or even pgscank/pgscand — they only look at CPU and memory usage. That&amp;rsquo;s insufficient for operating PostgreSQL.&lt;/li&gt;
&lt;li&gt;Without good operational practices, PG memory can be very unstable&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Notable Cgroup Knowledge
 &lt;div id="notable-cgroup-knowledge" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#notable-cgroup-knowledge" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Cgroup knowledge is also extensive. Refer to earlier articles; here&amp;rsquo;s a quick summary.&lt;/p&gt;
&lt;p&gt;Cgroup v1 has inherent flaws:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Does not account for cgroup page tables&lt;/li&gt;
&lt;li&gt;Does not account for cgroup slab&lt;/li&gt;
&lt;li&gt;Does not account for cgroup huge pages (huge pages are not charged, not just uncounted)&lt;/li&gt;
&lt;li&gt;Does not account for cgroup async/sync page reclaim&lt;/li&gt;
&lt;li&gt;Cgroup RSS and process RSS have inconsistent accounting methods&lt;/li&gt;
&lt;li&gt;shmem accounting is messy&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Unsolved Mysteries
 &lt;div id="unsolved-mysteries" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#unsolved-mysteries" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Huge pages have solved many problems, but not all. The unsolved portion remains to be researched — hopefully clarified in 2026.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Pay Attention to the OS
 &lt;div id="pay-attention-to-the-os" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pay-attention-to-the-os" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Pay Attention to Everything OS
 &lt;div id="pay-attention-to-everything-os" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pay-attention-to-everything-os" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;blockquote&gt;&lt;p&gt;To operate open-source databases well, you need to understand the operating system.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;(Source forgotten)&lt;/p&gt;
&lt;p&gt;To operate PostgreSQL well, understanding OS principles is essential. PostgreSQL is built on top of the OS (especially Linux) — it uses whatever Linux provides. PostgreSQL is part of the Linux ecosystem. To truly understand how it works, understand the OS first.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Rule out OS-level issues before tackling PG instance-level issues.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;(My own words)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;I. CPU&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Since PostgreSQL is not NUMA-aware, you rarely need to dive into OS-level CPU internals, whether on bare metal or on cgroup/pod-managed CPU. CPU issues can mostly be diagnosed from SQL or PG stack traces.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;II. Memory&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;See the Memory section. Memory issues require OS-level investigation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;III. Processes&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Inspecting PG process states from the OS is critical. At minimum, check for D state, wchan, RSS, and syscalls.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;IV. Host Status and Logs&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Monitor host status — CPU, memory, IO, network, logs at the host level. Very important.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s hard to imagine that a vague network IO error like &amp;ldquo;an I/O error occurred while sending to the backend&amp;rdquo; could be related to underlying storage. The only evidence was in &lt;code&gt;/var/log/messages&lt;/code&gt;; PG itself showed nothing. (Of course, this error may have other causes, so don&amp;rsquo;t over-interpret it.)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;V. Others&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Uncategorized.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Physical Reads
 &lt;div id="physical-reads" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#physical-reads" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL itself does &lt;strong&gt;not directly expose a &amp;ldquo;true physical disk read&amp;rdquo; metric&lt;/strong&gt;. The various reads in &lt;code&gt;pg_stat_*&lt;/code&gt; (e.g., &lt;code&gt;pg_stat_database.blks_read&lt;/code&gt;) count blocks requested from the OS, and the OS may serve them from its page cache or from disk.&lt;/p&gt;
&lt;p&gt;So how do you monitor physical reads?&lt;/p&gt;
&lt;p&gt;Reads or buffer allocation metrics are supplementary. The best approach is monitoring the OS itself.&lt;/p&gt;
&lt;p&gt;The OS is PostgreSQL&amp;rsquo;s ecosystem. Never look at the database in isolation. Not being able to monitor physical reads at the database level is nothing to be ashamed of — as long as you have a solution.&lt;/p&gt;
&lt;p&gt;Monitor iostat and other disk metrics. For cloud environments, OS-level observability is already mature — don&amp;rsquo;t waste cloud-native observability.&lt;/p&gt;
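&lt;p&gt;As a minimal sketch of OS-level sampling on Linux, the helpers below read one counter (for example &lt;code&gt;pgpgin&lt;/code&gt;, which the kernel reports in KiB) from &lt;code&gt;/proc/vmstat&lt;/code&gt; and turn two samples into a per-second rate. The function names &lt;code&gt;vmstat_counter&lt;/code&gt; and &lt;code&gt;counter_rate&lt;/code&gt; are made up for illustration; in practice &lt;code&gt;iostat&lt;/code&gt;, &lt;code&gt;sar&lt;/code&gt;, or your monitoring agent already do this.&lt;/p&gt;

```shell
# Print the value of one counter from a vmstat-style file ("name value" per line).
# The file argument is optional and defaults to /proc/vmstat (Linux only).
vmstat_counter() {
    awk -v name="$1" '$1 == name { print $2 }' "${2:-/proc/vmstat}"
}

# Per-second rate between two samples of a counter.
# Arguments: first sample, second sample, seconds between the samples.
counter_rate() {
    echo $(( ($2 - $1) / $3 ))
}

# Usage sketch (hypothetical interval of 5 seconds):
#   before=$(vmstat_counter pgpgin); sleep 5; after=$(vmstat_counter pgpgin)
#   counter_rate "$before" "$after" 5
```

&lt;p&gt;Comparing this rate with the growth of &lt;code&gt;blks_read&lt;/code&gt; over the same interval gives a rough idea of how many PostgreSQL reads were absorbed by the page cache.&lt;/p&gt;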

&lt;h2 class="relative group"&gt;Autovacuum
 &lt;div id="autovacuum" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#autovacuum" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;SQL for monitoring autovacuum processes: &lt;a href="https://gitlab.com/postgres-ai/postgresql-consulting/postgres-howtos/-/blob/main/0067_autovacuum_queue_and_progress.md" target="_blank" rel="noreferrer"&gt;sql autovacuum_queue_and_progress&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Autovacuum Freeze on Large Databases
 &lt;div id="autovacuum-freeze-on-large-databases" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#autovacuum-freeze-on-large-databases" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;With properly configured parameters, monitoring, and alerting, autovacuum freeze requires little attention in most databases.&lt;/p&gt;
&lt;p&gt;However, in databases with extremely high transaction throughput and very large data volumes, you still can&amp;rsquo;t ignore it. Anti-wraparound autovacuum (shown as &amp;ldquo;to prevent wraparound&amp;rdquo;) may be running constantly. At minimum, watch these two points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Age alerting: handle promptly and try to prevent the next alert. Don&amp;rsquo;t wait until the last moment to panic (acceleration options depend on version, e.g., &lt;code&gt;INDEX_CLEANUP OFF&lt;/code&gt; from PG 12, &lt;code&gt;BUFFER_USAGE_LIMIT&lt;/code&gt; adjustments from PG 16)&lt;/li&gt;
&lt;li&gt;Impact on memory (especially cache). If autovacuum runs nonstop on a very large database, it impacts cache and memory&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For principles and parameters, see this howtos diagram:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/c216c393371f.jpg" alt="Wraparound and freeze" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Large Tables That Won&amp;rsquo;t Finish Vacuuming
 &lt;div id="large-tables-that-wont-finish-vacuuming" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#large-tables-that-wont-finish-vacuuming" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&amp;ldquo;Large tables&amp;rdquo; means hundreds of GB, typically with many indexes and dead tuples that prevent vacuum from completing.&lt;/p&gt;
&lt;p&gt;The main bottleneck is index cleanup: for every batch of dead heap tuples, (auto)vacuum must scan each index to remove the matching entries. This is where large-table (auto)vacuum is slow, and you&amp;rsquo;ll typically see many dead tuples accumulate on the table. Worse, (auto)vacuum may run slower than dead tuples are generated, in which case vacuum never catches up and the table bloats without bound.&lt;/p&gt;
&lt;p&gt;Experience with large tables that can&amp;rsquo;t finish:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For the same table, dead tuple count is &lt;em&gt;roughly&lt;/em&gt; proportional to execution time&lt;/li&gt;
&lt;li&gt;From autovacuum log&amp;rsquo;s user time and elapsed time, you can observe CPU time and execution time, and roughly estimate delay sleep time&lt;/li&gt;
&lt;li&gt;Disabling autovacuum cost-based delay can cut execution time to roughly one third (index-size dependent; based on a 200GB table with 280GB of indexes)&lt;/li&gt;
&lt;li&gt;Adjusting a table&amp;rsquo;s autovacuum cost-based delay means letting autovacuum rest less when processing that table — consuming more CPU and scan IO in a shorter time&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;How to accelerate?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Repack&lt;/strong&gt;. Repack is a nuclear option — fast table rebuild for emergencies. But repack is a CLI tool; running it manually each time is cumbersome.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tune autovacuum cost-based delay parameters&lt;/strong&gt;. Either 1. Increase cost limit: &lt;code&gt;alter table t1 SET (autovacuum_vacuum_cost_limit=1000);&lt;/code&gt;, or 2. Disable delay entirely: &lt;code&gt;alter table t1 SET (autovacuum_vacuum_cost_delay=0);&lt;/code&gt;. Recommended only for tables that can&amp;rsquo;t keep up.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Drop unnecessary indexes&lt;/strong&gt;. Scanning indexes and updating index entries takes the most time — dropping unnecessary indexes is effective.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Partitioned tables&lt;/strong&gt;. Recommended partition size ≤10GB. &lt;em&gt;Converting to partitioned tables is the best solution.&lt;/em&gt;&lt;/li&gt;
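&lt;li&gt;&lt;strong&gt;Drop updated_time column indexes&lt;/strong&gt; to leverage HOT, reducing bloat rate. An update can generally be HOT only if it changes no indexed column, so an index on a column that changes with every update rules HOT out.&lt;/li&gt;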
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Checkpoint and Bgwriter
 &lt;div id="checkpoint-and-bgwriter" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#checkpoint-and-bgwriter" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The checkpointer not only creates checkpoints (affecting recovery time) but also flushes dirty buffers. The bgwriter only flushes dirty buffers. Starting from PG 17, some metrics moved to &lt;code&gt;pg_stat_checkpointer&lt;/code&gt;. For PG ≤16, mainly look at &lt;code&gt;pg_stat_bgwriter&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;I. Checkpoint intervals&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Metric &lt;code&gt;checkpoints_timed&lt;/code&gt;: corresponds to &lt;code&gt;checkpoint_timeout&lt;/code&gt; parameter&lt;/li&gt;
&lt;li&gt;Metric &lt;code&gt;checkpoints_req&lt;/code&gt;: corresponds to &lt;code&gt;max_wal_size&lt;/code&gt; parameter&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Recommend using &lt;code&gt;checkpoint_timeout&lt;/code&gt; as the primary checkpoint trigger. If &lt;code&gt;checkpoints_req&lt;/code&gt; keeps increasing, increase &lt;code&gt;max_wal_size&lt;/code&gt; and tune flush parameters accordingly. When full-page images (FPIs) make up a large share of WAL, also check these two metrics.&lt;/p&gt;
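&lt;p&gt;A quick way to judge the two counters is the share of requested checkpoints. The sketch below assumes PG ≤16, where both columns live in &lt;code&gt;pg_stat_bgwriter&lt;/code&gt; (PG 17 exposes them as &lt;code&gt;num_timed&lt;/code&gt; and &lt;code&gt;num_requested&lt;/code&gt; in &lt;code&gt;pg_stat_checkpointer&lt;/code&gt;). The helper name &lt;code&gt;checkpoint_req_pct&lt;/code&gt; is made up for illustration.&lt;/p&gt;

```shell
# Share of checkpoints that were requested rather than timed, as a whole percentage.
# Arguments: checkpoints_timed checkpoints_req
# Deltas between two samples are more telling than the lifetime totals.
checkpoint_req_pct() {
    total=$(( $1 + $2 ))
    if [ "$total" -eq 0 ]; then
        echo 0
    else
        echo $(( $2 * 100 / total ))
    fi
}

# Usage sketch against a live instance (PG 16 or older):
#   psql -Atc "SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter"
# prints the two values separated by "|"; pass them to checkpoint_req_pct.
```

&lt;p&gt;A share that stays near zero suggests &lt;code&gt;checkpoint_timeout&lt;/code&gt; is the effective trigger; a rising share suggests &lt;code&gt;max_wal_size&lt;/code&gt; is being hit first.&lt;/p&gt;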
&lt;p&gt;&lt;strong&gt;II. Flush metrics&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Metric &lt;code&gt;buffers_checkpoint&lt;/code&gt;: dirty buffers flushed by checkpointer&lt;/li&gt;
&lt;li&gt;Metric &lt;code&gt;buffers_clean&lt;/code&gt;: dirty buffers flushed by bgwriter&lt;/li&gt;
&lt;li&gt;Metric &lt;code&gt;buffers_backend&lt;/code&gt;: dirty buffers flushed by backends — should be as close to zero as possible; occurrence means bgwriter isn&amp;rsquo;t aggressive enough&lt;/li&gt;
&lt;li&gt;Metric &lt;code&gt;buffers_backend_fsync&lt;/code&gt;: number of times a backend had to execute its own fsync call instead of handing it off; this should stay at zero&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The tuning goal is flush priority: &lt;strong&gt;bgwriter flush &amp;gt; checkpointer flush &amp;gt; backend flush&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The checkpointer can flush as a side effect, but checkpointer flush speed is hard to control — it can cause IO spikes. So bgwriter flush priority should be higher than checkpointer. Backend flush is obviously worst — minimize it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;III. Bgwriter flush parameters&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Bgwriter controls flush speed through a &amp;ldquo;write some, pause, write again&amp;rdquo; cycle:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Parameter &lt;code&gt;bgwriter_delay&lt;/code&gt;: how long to pause&lt;/li&gt;
&lt;li&gt;Parameter &lt;code&gt;bgwriter_lru_maxpages&lt;/code&gt;: max pages to write per cycle&lt;/li&gt;
&lt;li&gt;Parameter &lt;code&gt;bgwriter_lru_multiplier&lt;/code&gt;: pages per cycle = (recent buffer allocation × lru_multiplier), capped at lru_maxpages&lt;/li&gt;
&lt;li&gt;Parameter &lt;code&gt;bgwriter_flush_after&lt;/code&gt;: once this much data has been written, ask the OS to start writing it back to storage (a writeback hint, not an fsync)&lt;/li&gt;
&lt;li&gt;Metric &lt;code&gt;buffers_alloc&lt;/code&gt;: represents shared memory buffer allocation (allocation means actual eviction occurred, somewhat indicative of pgpgin)&lt;/li&gt;
&lt;li&gt;Metric &lt;code&gt;maxwritten_clean&lt;/code&gt;: number of times &lt;code&gt;bgwriter_lru_maxpages&lt;/code&gt; was reached&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Default bgwriter flush logic: &lt;strong&gt;each cycle: flush up to (recently allocated buffers × 2) dirty buffers, capped at 100; sleep 200ms; ask the OS to start writeback after every 64 buffers (512kB, the Linux default) written&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Per-cycle flush volume depends on recent buffer allocation and &lt;code&gt;bgwriter_lru_multiplier&lt;/code&gt;. During peak times, buffer allocation is typically high, so it usually hits &lt;code&gt;bgwriter_lru_maxpages&lt;/code&gt;. Thus: &lt;strong&gt;&lt;code&gt;bgwriter_lru_maxpages&lt;/code&gt; caps peak flush volume; &lt;code&gt;bgwriter_lru_multiplier&lt;/code&gt; prevents excessive flushing during off-peak times&lt;/strong&gt;.&lt;/p&gt;
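&lt;p&gt;The per-cycle rule described above can be sketched as a small calculation. This is a simplified model of the description in this article, not the actual bgwriter code, which works from a smoothed allocation estimate. The function name &lt;code&gt;bgwriter_pages_per_cycle&lt;/code&gt; is made up for illustration.&lt;/p&gt;

```shell
# Simplified model: pages bgwriter may write in one cycle.
# Arguments: recent buffer allocations, bgwriter_lru_multiplier, bgwriter_lru_maxpages
bgwriter_pages_per_cycle() {
    # The multiplier may be fractional (default 2.0), so let awk do the multiplication.
    want=$(awk -v a="$1" -v m="$2" 'BEGIN { printf "%d", a * m }')
    if [ "$want" -gt "$3" ]; then
        echo "$3"    # capped by bgwriter_lru_maxpages
    else
        echo "$want"
    fi
}

bgwriter_pages_per_cycle 30 2 100     # off-peak: prints 60
bgwriter_pages_per_cycle 500 2 100    # peak: prints 100, the cap
```

&lt;p&gt;The two calls show both regimes: off-peak the multiplier decides the volume, at peak the cap does.&lt;/p&gt;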
&lt;p&gt;&lt;strong&gt;IV. Flush parameter reference&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Default max bgwriter flush = 100 pages × 5 cycles/s (one cycle per 200ms) × 8KB ≈ 3.9MB/s. The defaults are definitely too low. If tuning upward, adjust based on &lt;code&gt;shared_buffers&lt;/code&gt; size and workload.&lt;/p&gt;
&lt;p&gt;After all that theory, here&amp;rsquo;s a practical reference:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Read/write ratio 2:8, high load&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shared_buffers&lt;span style="color:#f92672"&gt;=&lt;/span&gt;40GB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;checkpoint_timeout&lt;span style="color:#f92672"&gt;=&lt;/span&gt;20min
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;max_wal_size&lt;span style="color:#f92672"&gt;=&lt;/span&gt;80GB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bgwriter_delay&lt;span style="color:#f92672"&gt;=&lt;/span&gt;20ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bgwriter_lru_maxpages&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bgwriter_lru_multiplier&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Adjust further as needed.&lt;/p&gt;
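&lt;p&gt;To see what a given setting allows, the same arithmetic as the default calculation can be wrapped in a helper. This is only an upper bound, since it ignores the time the writes themselves take. The function name &lt;code&gt;bgwriter_max_kb_per_s&lt;/code&gt; is made up for illustration, and an 8kB block size is assumed unless a third argument is given.&lt;/p&gt;

```shell
# Upper bound of bgwriter write throughput in kB/s.
# Arguments: bgwriter_lru_maxpages, bgwriter_delay in ms, block size in kB (default 8)
bgwriter_max_kb_per_s() {
    echo $(( $1 * 1000 * ${3:-8} / $2 ))
}

bgwriter_max_kb_per_s 100 200    # defaults: prints 4000 (about 3.9 MB/s)
bgwriter_max_kb_per_s 1000 20    # reference values above: prints 400000 (about 390 MB/s)
```

&lt;p&gt;The reference values raise the ceiling by a factor of 100 over the defaults, which is one reason to confirm the storage can absorb that rate before applying them.&lt;/p&gt;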
&lt;p&gt;As for effects: from practical experience, don&amp;rsquo;t expect standalone bgwriter tuning to yield great results. Overly aggressive bgwriter tuning can even backfire.&lt;/p&gt;
&lt;p&gt;So: &lt;strong&gt;If your database hasn&amp;rsquo;t been clearly diagnosed with checkpoint flush spikes or other flush issues, don&amp;rsquo;t touch this.&lt;/strong&gt; Only recommended for core large databases with high concurrency, as a supplementary tuning strategy alongside other changes (migrations, &lt;code&gt;shared_buffers&lt;/code&gt; adjustments, etc.).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;V. Flush parameter summary&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Bgwriter flushing can be summarized as &amp;ldquo;three hard&amp;rsquo;s&amp;rdquo;:&lt;/p&gt;
&lt;p&gt;&amp;ldquo;Hard to understand, hard to tune, hard to see results.&amp;rdquo;&lt;/p&gt;

&lt;h2 class="relative group"&gt;DB4AI
 &lt;div id="db4ai" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#db4ai" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;AI Task Scheduling Writes to Database
 &lt;div id="ai-task-scheduling-writes-to-database" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ai-task-scheduling-writes-to-database" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;AI applications are widely deployed at the development level. One scenario: AI task invocations write to the database. Task invocations can spike instantly, and the database writes may lack concurrency control, causing CPU or other resource spikes.&lt;/p&gt;
&lt;p&gt;This is a new database incident pattern in the AI era. Be careful.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Vector HNSW
 &lt;div id="vector-hnsw" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#vector-hnsw" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Reference: &lt;a href="https://postgresql.us/events/pgconfnyc2024/sessions/session/1862/slides/172/pgvector_best_practices_pgconfnyc2024.pdf" target="_blank" rel="noreferrer"&gt;https://postgresql.us/events/pgconfnyc2024/sessions/session/1862/slides/172/pgvector_best_practices_pgconfnyc2024.pdf&lt;/a&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;HNSW Index Build Acceleration
 &lt;div id="hnsw-index-build-acceleration" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#hnsw-index-build-acceleration" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;HNSW index builds can be extremely slow — millions of rows can take hours.&lt;/p&gt;
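&lt;p&gt;Factors affecting HNSW build speed include instance memory (and CPU), the server parameters &lt;code&gt;maintenance_work_mem&lt;/code&gt; and &lt;code&gt;max_parallel_maintenance_workers&lt;/code&gt;, and the index options &lt;code&gt;m&lt;/code&gt; and &lt;code&gt;ef_construction&lt;/code&gt;:&lt;/p&gt;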
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;maintenance_work_mem&lt;span style="color:#f92672"&gt;=&lt;/span&gt;3GB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;max_parallel_maintenance_workers&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;m&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ef_construction&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Building HNSW indexes can be painful. Ways to accelerate:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Building the index before data load is an option. Though the total initial time is slower, developers may accept &amp;ldquo;a bit slower&amp;rdquo; but cannot accept &amp;ldquo;index building for 1 hour.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Optimizing post-load index builds:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;SET maintenance_work_mem = '8GB'&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;SET max_parallel_maintenance_workers = 8&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
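&lt;p&gt;A post-load build with these settings might look like the sketch below. The helper &lt;code&gt;hnsw_build_sql&lt;/code&gt; is made up for illustration and only prints SQL for &lt;code&gt;psql&lt;/code&gt; to run in one session. The operator class &lt;code&gt;vector_l2_ops&lt;/code&gt; is an assumption (it matches queries that use the &lt;code&gt;&amp;lt;-&amp;gt;&lt;/code&gt; operator), and &lt;code&gt;m&lt;/code&gt; and &lt;code&gt;ef_construction&lt;/code&gt; reuse the values listed earlier. Parallel workers only help on pgvector versions that support parallel HNSW builds.&lt;/p&gt;

```shell
# Emit the SQL for a post-load HNSW build; pipe the output into psql.
# Arguments: table name, vector column, index name (all illustrative).
hnsw_build_sql() {
    printf '%s\n' \
        "SET maintenance_work_mem = '8GB';" \
        "SET max_parallel_maintenance_workers = 8;" \
        "CREATE INDEX $3 ON $1 USING hnsw ($2 vector_l2_ops) WITH (m = 12, ef_construction = 100);"
}

# Usage sketch:
#   hnsw_build_sql image_features_test2 feature_vector idx_feature_hnsw | psql
```

&lt;p&gt;Because the &lt;code&gt;SET&lt;/code&gt; statements run in the same session as the &lt;code&gt;CREATE INDEX&lt;/code&gt;, they affect only this build and leave the instance defaults untouched.&lt;/p&gt;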
&lt;p&gt;Post-load index builds need attention to memory — strongly related to instance memory and free memory.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Note: &lt;code&gt;maintenance_work_mem&lt;/code&gt; also acts as a guard for OS memory. If &lt;code&gt;maintenance_work_mem&lt;/code&gt; exceeds available OS memory and the table is large, the build fails fast with an error instead of exhausting memory:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;53200&lt;/span&gt;: could &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; resize shared memory segment &lt;span style="color:#e6db74"&gt;&amp;#34;/PostgreSQL.1390017142&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6439348672&lt;/span&gt; bytes: Cannot &lt;span style="color:#66d9ef"&gt;allocate&lt;/span&gt; memory
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: dsm_impl_posix, dsm_impl.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;314&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Note: if memory used during build exceeds &lt;code&gt;maintenance_work_mem&lt;/code&gt;, an info notice appears (after some time):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;NOTICE: &lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;: hnsw graph &lt;span style="color:#66d9ef"&gt;no&lt;/span&gt; longer fits &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; maintenance_work_mem &lt;span style="color:#66d9ef"&gt;after&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;886990&lt;/span&gt; tuples
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: Building will take significantly &lt;span style="color:#66d9ef"&gt;more&lt;/span&gt; time.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: Increase maintenance_work_mem &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; speed up builds.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: InsertTuple, hnswbuild.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;525&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 class="relative group"&gt;HNSW Index Query Performance
 &lt;div id="hnsw-index-query-performance" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#hnsw-index-query-performance" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Query recall and performance need to be balanced via the &lt;code&gt;ef_search&lt;/code&gt; parameter (&lt;code&gt;hnsw.ef_search&lt;/code&gt; in pgvector).&lt;/p&gt;
&lt;p&gt;Besides &lt;code&gt;ef_search&lt;/code&gt;, one more factor significantly impacts query speed: &lt;strong&gt;whether the HNSW index is cached in memory&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Index NOT in memory:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; image_id, applyNo, feature_vector &lt;span style="color:#f92672"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; vectorsit
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; image_features_test2
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; distance
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;LIMIT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;11852&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;11865&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;74&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;35&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;82193&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;073&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;82193&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;185&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1796&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;read&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;9309&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; I&lt;span style="color:#f92672"&gt;/&lt;/span&gt;O Timings: shared&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;local&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;read&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;82108&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;559&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; InitPlan &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;returns&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;008&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;009&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; test_0 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;60&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1360&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;007&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;008&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_feature_hnsw &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; image_features_test2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;11852&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;78&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1292546&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;60&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;989705&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;35&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;82193&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;071&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;82193&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;179&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;By&lt;/span&gt;: (feature_vector &lt;span style="color:#f92672"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1796&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;read&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;9309&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; I&lt;span style="color:#f92672"&gt;/&lt;/span&gt;O Timings: shared&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;local&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;read&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;82108&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;559&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;130&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;82193&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;279&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Index IN memory:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;11852&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;11865&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;74&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;35&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;240&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;350&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;11105&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; InitPlan &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;returns&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;007&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;008&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; test_0 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;60&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1360&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;007&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;007&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_feature_hnsw &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; image_features_test2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;11852&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;78&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1292546&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;60&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;989705&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;35&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;239&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;344&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;By&lt;/span&gt;: (feature_vector &lt;span style="color:#f92672"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;11105&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;093&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;392&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Same index, same execution plan, yet &lt;strong&gt;the query is 82193.279 / 20.392 ≈ 4,000× slower when the index is not in memory.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This gap cannot be ignored. When monitoring HNSW index performance, always check whether the index is in memory. Reference SQL:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Check if HNSW index is cached in shared buffers via pg_buffercache
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.relname, pg_size_pretty(&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; buffered, round(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#f92672"&gt;/&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; setting &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; pg_settings &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; name&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;shared_buffers&amp;#39;&lt;/span&gt;)::integer, &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; buffer_percent, round(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; &lt;span style="color:#f92672"&gt;/&lt;/span&gt; pg_table_size(&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.oid), &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; percent_of_relation &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INNER&lt;/span&gt; &lt;span 
style="color:#66d9ef"&gt;JOIN&lt;/span&gt; pg_buffercache b &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; b.relfilenode &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.relfilenode &lt;span style="color:#66d9ef"&gt;INNER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;JOIN&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; (b.reldatabase &lt;span style="color:#f92672"&gt;=&lt;/span&gt; d.oid &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; d.datname &lt;span style="color:#f92672"&gt;=&lt;/span&gt; current_database()) &lt;span style="color:#66d9ef"&gt;GROUP&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.oid, &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.relname &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DESC&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LIMIT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; buffered &lt;span style="color:#f92672"&gt;|&lt;/span&gt; buffer_percent &lt;span style="color:#f92672"&gt;|&lt;/span&gt; percent_of_relation 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------+------------+----------------+---------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; idx_feature_hnsw_1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2117&lt;/span&gt; MB &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;91&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; idx_feature_hnsw &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;78&lt;/span&gt; MB &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_inherits_parent_index &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; bytes &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Application Releases
 &lt;div id="application-releases" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#application-releases" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;DDL Tips
 &lt;div id="ddl-tips" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ddl-tips" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Online DDL tools like pg-osc and pg_migrate don&amp;rsquo;t support partitioned tables and have other issues, so they are hard to use in practice. DDL tips therefore remain useful: lowering lock levels and proactively identifying blockers reduces the risk of DDL blocking and table rewrites.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/5f610ac9b703.png" alt="picddl" /&gt;&lt;/p&gt;
&lt;p&gt;Key points for understanding this diagram:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Before changes:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Ensure no long transactions on the table — long transactions hold locks on tables persistently. Long transactions are a well-known hazard in PG; handle them first.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Ensure no autovacuum (to prevent wraparound) on the table — autovacuum generally doesn&amp;rsquo;t block SQL, except when running &lt;a href="https://www.postgresql.org/docs/18/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND" target="_blank" rel="noreferrer"&gt;to prevent wraparound&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Autovacuum workers generally don&amp;rsquo;t block other commands. If a process attempts to acquire a lock that conflicts with the &lt;code&gt;SHARE UPDATE EXCLUSIVE&lt;/code&gt; lock held by autovacuum, lock acquisition will interrupt the autovacuum. However, if the autovacuum is running to prevent transaction ID wraparound (i.e., the autovacuum query name in the &lt;code&gt;pg_stat_activity&lt;/code&gt; view ends with &lt;code&gt;(to prevent wraparound)&lt;/code&gt;), the autovacuum is not automatically interrupted.&lt;/p&gt;
&lt;/blockquote&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;lock_timeout=2000&lt;/code&gt; — if a lock cannot be acquired within 2 seconds, bail out to avoid mass blocking.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
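&lt;p&gt;The three pre-change checks above can be sketched in SQL as follows. This is a minimal sketch rather than a fixed procedure: the 5-minute threshold and the names &lt;code&gt;my_table&lt;/code&gt; and &lt;code&gt;note&lt;/code&gt; are assumptions for illustration.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- 1. Long-running transactions (the 5-minute threshold is an assumption)
SELECT pid, usename, state, now() - xact_start AS xact_age, left(query, 60) AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start &amp;gt; interval &amp;#39;5 minutes&amp;#39;
ORDER BY xact_start;

-- 2. Autovacuum workers running to prevent wraparound
SELECT pid, query, now() - query_start AS running_for
FROM pg_stat_activity
WHERE backend_type = &amp;#39;autovacuum worker&amp;#39;
  AND query LIKE &amp;#39;%(to prevent wraparound)&amp;#39;;

-- 3. Give up after 2 seconds instead of queueing behind a lock
BEGIN;
SET LOCAL lock_timeout = &amp;#39;2s&amp;#39;;
ALTER TABLE my_table ADD COLUMN note text;  -- hypothetical table and column
COMMIT;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;SET LOCAL&lt;/code&gt; keeps the timeout scoped to this transaction, so a pooled connection does not carry it into later work.&lt;/p&gt;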
&lt;p&gt;&lt;strong&gt;Special cases for &amp;ldquo;small-to-large&amp;rdquo; type changes:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Small-to-large type changes generally don&amp;rsquo;t rewrite the table, but there are exceptions. Pay special attention to &lt;code&gt;int → bigint&lt;/code&gt; (common for PK columns) and &lt;code&gt;char(n) → char(m)&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Indexes on partitioned tables. Small-to-large type changes on partitioned tables don&amp;rsquo;t rewrite the table, but they &lt;strong&gt;do rebuild indexes&lt;/strong&gt;. Rebuilding indexes on partitioned tables is typically very slow and can cause prolonged blocking under a level-8 (&lt;code&gt;ACCESS EXCLUSIVE&lt;/code&gt;) lock. This behavior is unique to partitioned tables.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Changing column types:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Almost always rewrites the table, except for equivalent types or small-to-large cases.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;DDL lock-level reduction tips:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use CIC (&lt;code&gt;CREATE INDEX CONCURRENTLY&lt;/code&gt;) for indexes. If CIC is not supported on the partitioned parent, run CIC on each child table (remember to attach the child indexes to the parent index).&lt;/li&gt;
&lt;li&gt;CIC has multiple phases. Phases 2 and 3 acquire a SHARE lock, blocking DML. (Official docs only mention SHARE UPDATE EXCLUSIVE — CIC isn&amp;rsquo;t a simple explicit lock.)&lt;/li&gt;
&lt;li&gt;Add primary keys with &lt;code&gt;USING INDEX&lt;/code&gt;. For partitioned tables, add the PK on each child table first; adding the PK on the parent can then merge the existing child PKs.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;VALIDATE CONSTRAINT&lt;/code&gt; for constraints.&lt;/li&gt;
&lt;li&gt;PG versions before 18 don&amp;rsquo;t support adding a &lt;code&gt;NOT NULL&lt;/code&gt; constraint as &lt;code&gt;NOT VALID&lt;/code&gt; and validating it later. Use &lt;code&gt;CHECK(col1 IS NOT NULL)&lt;/code&gt; instead. Converting the validated CHECK to NOT NULL won&amp;rsquo;t produce an extra scan.&lt;/li&gt;
&lt;li&gt;Adding a column with a volatile DEFAULT rewrites the table, while adding a column with no default or a non-volatile default does not. Take advantage of this: add the column first (no rewrite), then UPDATE legacy data as needed.&lt;/li&gt;
&lt;li&gt;When attaching partitions, use CHECK constraints to reduce downtime, and use &lt;code&gt;VALIDATE CONSTRAINT&lt;/code&gt; for the CHECK.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;CREATE TABLE LIKE&lt;/code&gt; + &lt;code&gt;ATTACH&lt;/code&gt; has much lower lock levels than &lt;code&gt;PARTITION OF&lt;/code&gt; (though I still prefer &lt;code&gt;PARTITION OF&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;
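&lt;p&gt;Several of the tips above can be sketched as follows. This is an illustrative sketch, not a drop-in script: the tables &lt;code&gt;t&lt;/code&gt;, &lt;code&gt;t_part&lt;/code&gt;, &lt;code&gt;t_p2026&lt;/code&gt; and every constraint, index, and column name are hypothetical, and the batch range in the UPDATE is an assumption.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical table: t(id bigint NOT NULL, col1 text)

-- 1. Primary key from a concurrently built unique index.
--    id should already be NOT NULL; otherwise ADD PRIMARY KEY must set it and scan the table.
CREATE UNIQUE INDEX CONCURRENTLY t_id_uidx ON t (id);
ALTER TABLE t ADD CONSTRAINT t_pkey PRIMARY KEY USING INDEX t_id_uidx;

-- 2. NOT NULL through a validated CHECK constraint.
ALTER TABLE t ADD CONSTRAINT t_col1_nn CHECK (col1 IS NOT NULL) NOT VALID;  -- no scan
ALTER TABLE t VALIDATE CONSTRAINT t_col1_nn;   -- scans, but does not block DML
ALTER TABLE t ALTER COLUMN col1 SET NOT NULL;  -- PG 12+ reuses the valid CHECK, no second scan
ALTER TABLE t DROP CONSTRAINT t_col1_nn;

-- 3. Column with a volatile default, without a rewrite.
ALTER TABLE t ADD COLUMN created_at timestamptz;                      -- no default yet
ALTER TABLE t ALTER COLUMN created_at SET DEFAULT clock_timestamp();  -- new rows only
UPDATE t SET created_at = clock_timestamp()
WHERE id BETWEEN 1 AND 10000 AND created_at IS NULL;                  -- backfill in batches

-- 4. Attach a partition whose bounds are already proven by a CHECK constraint.
--    Hypothetical parent: t_part, range-partitioned on created_at.
CREATE TABLE t_p2026 (LIKE t_part INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
-- ... load data into t_p2026 here ...
ALTER TABLE t_p2026 ADD CONSTRAINT t_p2026_bound
  CHECK (created_at IS NOT NULL
         AND created_at &amp;gt;= &amp;#39;2026-01-01&amp;#39; AND created_at &amp;lt; &amp;#39;2027-01-01&amp;#39;) NOT VALID;
ALTER TABLE t_p2026 VALIDATE CONSTRAINT t_p2026_bound;
ALTER TABLE t_part ATTACH PARTITION t_p2026
  FOR VALUES FROM (&amp;#39;2026-01-01&amp;#39;) TO (&amp;#39;2027-01-01&amp;#39;);  -- the CHECK is meant to spare ATTACH its own scan
ALTER TABLE t_p2026 DROP CONSTRAINT t_p2026_bound;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;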
&lt;p&gt;&lt;strong&gt;After changes:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Remember to collect statistics (needed in many scenarios).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Parallel Index Creation
 &lt;div id="parallel-index-creation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#parallel-index-creation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;In production, you may need to create indexes on very large tables that take a long time. Parallel index creation can shorten build time.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Parallel index creation on regular tables:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Parallel parameter: &lt;code&gt;max_parallel_maintenance_workers&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Prerequisites:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enough workers: check &lt;code&gt;max_parallel_workers&lt;/code&gt;, &lt;code&gt;max_worker_processes&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Increase &lt;code&gt;maintenance_work_mem&lt;/code&gt; to GB scale&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Notes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Effective for B-tree and BRIN (parallel BRIN builds require PostgreSQL 17 or later)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;maintenance_work_mem&lt;/code&gt; limits the entire utility command, unlike parallel query, where resource limits apply per worker process.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;From test results, parallel index creation shows diminishing returns beyond 8 workers (this conclusion may not hold in all environments).&lt;/p&gt;
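&lt;p&gt;A session-level setup for a parallel build might look like the sketch below. The worker count, the memory size, and the names &lt;code&gt;big_table&lt;/code&gt;, &lt;code&gt;col&lt;/code&gt;, and &lt;code&gt;big_table_col_idx&lt;/code&gt; are assumptions; tune the values to your own hardware.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Check the instance-wide ceilings first
SHOW max_worker_processes;
SHOW max_parallel_workers;

-- Session-level settings for this build only (values are assumptions)
SET max_parallel_maintenance_workers = 8;
SET maintenance_work_mem = &amp;#39;4GB&amp;#39;;  -- budget for the whole command, not per worker

-- Hypothetical table and column; add CONCURRENTLY on a live system
CREATE INDEX big_table_col_idx ON big_table (col);

-- Restore the defaults if the session is reused
RESET max_parallel_maintenance_workers;
RESET maintenance_work_mem;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;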
&lt;p&gt;&lt;strong&gt;Parallel index creation on partitioned tables:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We recommend manual parallelism across child partitions: build the indexes on multiple partitions at the same time, in separate sessions, rather than relying on native parallelism. This reduces multi-process coordination overhead.&lt;/p&gt;
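&lt;p&gt;One way to build per-partition indexes by hand is sketched below, assuming a hypothetical partitioned table &lt;code&gt;orders&lt;/code&gt; with partitions &lt;code&gt;orders_p1&lt;/code&gt; and &lt;code&gt;orders_p2&lt;/code&gt;. Each child build runs in its own session.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Parent index only: created invalid, no work done on the partitions
CREATE INDEX orders_created_idx ON ONLY orders (created_at);

-- Session A
CREATE INDEX CONCURRENTLY orders_p1_created_idx ON orders_p1 (created_at);

-- Session B, running at the same time
CREATE INDEX CONCURRENTLY orders_p2_created_idx ON orders_p2 (created_at);

-- After every child build has finished
ALTER INDEX orders_created_idx ATTACH PARTITION orders_p1_created_idx;
ALTER INDEX orders_created_idx ATTACH PARTITION orders_p2_created_idx;

-- The parent index becomes valid once all partitions are attached
SELECT indexrelid::regclass AS index_name, indisvalid
FROM pg_index
WHERE indexrelid = &amp;#39;orders_created_idx&amp;#39;::regclass;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;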

&lt;h3 class="relative group"&gt;Cached Plan Must Not Change Result Type
 &lt;div id="cached-plan-must-not-change-resource" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cached-plan-must-not-change-resource" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;After a new column was added the previous night, application connections started throwing this PostgreSQL error the next morning: &amp;ldquo;cached plan must not change result type&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Reproduction:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; a(b varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; p1 (varchar) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COLUMN&lt;/span&gt; b &lt;span style="color:#66d9ef"&gt;TYPE&lt;/span&gt; varchar(&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; p1 (&lt;span style="color:#e6db74"&gt;&amp;#39;abcd&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;A000: cached plan must &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; change &lt;span style="color:#66d9ef"&gt;result&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: RevalidateCachedQuery, plancache.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;718&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Test environment solutions:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;DEALLOCATE ALL&lt;/code&gt;: actively discard prepared statements, or&lt;/li&gt;
&lt;li&gt;&lt;code&gt;DISCARD ALL&lt;/code&gt;: actively discard all session state&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;DEALLOCATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ALL&lt;/span&gt;; &lt;span style="color:#75715e"&gt;--DISCARD ALL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; p1 (varchar) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; p1 (&lt;span style="color:#e6db74"&gt;&amp;#39;abcd&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Production environment solutions:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The error surfaces at the application layer. The application could issue &lt;code&gt;DEALLOCATE ALL&lt;/code&gt; / &lt;code&gt;DISCARD ALL&lt;/code&gt; through JDBC, but it may not have implemented this.&lt;/p&gt;
&lt;p&gt;Immediate production solutions (choose one):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Since connection pools like HikariCP have connection cycling and timeout mechanisms, killing idle sessions will gradually reduce errors.&lt;/li&gt;
&lt;li&gt;Similarly, due to connection pool cycling, you can do nothing — as the pool gradually establishes new connections, the errors fade.&lt;/li&gt;
&lt;li&gt;If business pressure is high enough, consider killing all application connections.&lt;/li&gt;
&lt;li&gt;Rolling restart of the application.&lt;/li&gt;
&lt;/ul&gt;
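&lt;p&gt;Killing idle sessions, the first option above, can be sketched as follows. The role name &lt;code&gt;app_user&lt;/code&gt; is an assumption; substitute the role your application connects with.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Terminate only idle application sessions in the current database.
-- The pool opens fresh connections, which prepare their statements again.
SELECT pid, pg_terminate_backend(pid) AS terminated
FROM pg_stat_activity
WHERE usename = &amp;#39;app_user&amp;#39;           -- hypothetical application role
  AND datname = current_database()
  AND state = &amp;#39;idle&amp;#39;                 -- leaves active and in-transaction sessions alone
  AND pid &amp;lt;&amp;gt; pg_backend_pid();&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Dropping the &lt;code&gt;state&lt;/code&gt; filter turns this into the &amp;ldquo;kill all application connections&amp;rdquo; option, which also aborts in-flight transactions.&lt;/p&gt;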
&lt;p&gt;&lt;strong&gt;Not&lt;/strong&gt; recommended:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;Restart the application after every DDL.&amp;rdquo; It works, but I don&amp;rsquo;t recommend it as standard practice.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;autosave=conservative&lt;/code&gt;. It works, but it enables subtransactions: a savepoint is set for each query, and the JDBC driver rolls back and retries only in rare cases such as &amp;lsquo;cached statement cannot change return type&amp;rsquo; or &amp;lsquo;statement XXX is not valid&amp;rsquo;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;JDBC configuration suggestions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Configure automatic retry after transaction rollback: &lt;a href="https://developer.aliyun.com/article/741750" target="_blank" rel="noreferrer"&gt;https://developer.aliyun.com/article/741750&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Other JDBC config references: &lt;a href="https://jdbc.postgresql.org/documentation/server-prepare/#corner-cases" target="_blank" rel="noreferrer"&gt;https://jdbc.postgresql.org/documentation/server-prepare/#corner-cases&lt;/a&gt;. Note: some suggestions are not suitable for production.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Physical Replication
 &lt;div id="physical-replication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#physical-replication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Query Conflicts
 &lt;div id="query-conflicts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#query-conflicts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Query conflicts are a notoriously frustrating feature that directly impacts the usability of PG standby queries. Query conflicts increase standby lag, yet long-running queries on the standby are logically reasonable. This forces PG administrators to balance between lag management and long-query management — a problem that doesn&amp;rsquo;t exist in other relational databases.&lt;/p&gt;
&lt;p&gt;Hidden characteristics of query conflicts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Even static tables can trigger query conflicts (&lt;a href="https://www.modb.pro/db/1966415366276526080" target="_blank" rel="noreferrer"&gt;see: From Static Table Query Conflicts to Their Principles&lt;/a&gt;). The conflict is a snapshot conflict, largely unrelated to table-level locks — snapshot conflicts are cross-table.&lt;/li&gt;
&lt;li&gt;Long queries affect short queries. Once a long query pushes standby lag to &lt;code&gt;max_standby_streaming_delay&lt;/code&gt;, even short queries get canceled.&lt;/li&gt;
&lt;li&gt;Continuous short queries also cause query conflicts. For example, if the next short query starts before the previous one has finished and the two are logically similar, the startup process never gets a window to apply WAL, because both queries hold back the XID it needs to apply. To confirm this, check whether &lt;code&gt;pg_stat_activity.backend_xmin&lt;/code&gt; is less than the XID the startup process is applying.&lt;/li&gt;
&lt;/ul&gt;
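&lt;p&gt;Both checks can be run on the standby with a sketch like the one below. It uses only the standard &lt;code&gt;pg_stat_activity&lt;/code&gt; and &lt;code&gt;pg_stat_database_conflicts&lt;/code&gt; views; the 60-character truncation of the query text is an arbitrary choice.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Sessions on the standby that currently hold a snapshot
SELECT pid, usename, state, backend_xmin,
       now() - query_start AS running_for,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY query_start;

-- Queries canceled so far in this database, by conflict type
SELECT datname, confl_snapshot, confl_lock, confl_bufferpin,
       confl_deadlock, confl_tablespace
FROM pg_stat_database_conflicts
WHERE datname = current_database();&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A steadily growing &lt;code&gt;confl_snapshot&lt;/code&gt; counter points to the snapshot conflicts described above rather than to lock conflicts.&lt;/p&gt;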
&lt;p&gt;Recommended standby query practices:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Using RTO SLO to tune &lt;code&gt;max_standby_streaming_delay&lt;/code&gt; is a good approach. When arguments lead nowhere, SLO-based IT management saves the day.&lt;/li&gt;
&lt;li&gt;Separate short/fast business queries from long queries (data extraction, reporting) onto different standbys to reduce mutual interference.&lt;/li&gt;
&lt;li&gt;Standby queries still need SQL optimization.&lt;/li&gt;
&lt;li&gt;Standby WAL apply lag must be monitored.&lt;/li&gt;
&lt;/ul&gt;
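&lt;p&gt;For the monitoring item, a minimal sketch of a replay-lag probe, run on the standby and assuming PG 10 or later function names. The column aliases are illustrative only:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- WAL received but not yet replayed, in bytes and as a time estimate
SELECT pg_last_wal_receive_lsn() AS received_lsn,
       pg_last_wal_replay_lsn()  AS replayed_lsn,
       pg_wal_lsn_diff(pg_last_wal_receive_lsn(),
                       pg_last_wal_replay_lsn())      AS replay_gap_bytes,
       now() - pg_last_xact_replay_timestamp()        AS time_since_last_replayed_commit;

-- The delay budget that the lag is compared against
SHOW max_standby_streaming_delay;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Note that the time-based column also grows when the primary is simply idle, so the byte gap is usually the safer alerting signal.&lt;/p&gt;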

&lt;h2 class="relative group"&gt;Logical Replication
 &lt;div id="logical-replication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#logical-replication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Logical replication has countless pitfalls. 2024 had many nasty cases; 2025 had some too, but less severe, mostly on older PG versions. Overall, logical replication on newer PG versions is trending toward stability.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Slow DDL/DCL Parsing on Older PG Versions
 &lt;div id="slow-ddldcl-parsing-on-older-pg-versions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#slow-ddldcl-parsing-on-older-pg-versions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.modb.pro/db/1922232196358746112" target="_blank" rel="noreferrer"&gt;Case Study: GRANT and Walsender Stuck&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;On PG 13 and earlier, certain DDL/DCL statements parse slowly and may affect walsender lag. These include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Batch GRANT (including grant all tables) + pathman extension installed (whether used or not)&lt;/li&gt;
&lt;li&gt;Batch DDL/TRUNCATE/DCL/DROP PUBLICATION&lt;/li&gt;
&lt;/ul&gt;
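&lt;p&gt;One way to notice a stuck walsender is to watch how much WAL each logical slot has not yet confirmed. A rough sketch, run on the primary and assuming PG 10 or later:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- WAL generated but not yet confirmed by each logical replication slot
SELECT slot_name,
       database,
       active,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)
       ) AS unconfirmed_wal
FROM pg_replication_slots
WHERE slot_type = &amp;#39;logical&amp;#39;
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) DESC;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If the value keeps growing right after a batch GRANT or DDL while the slot stays &lt;code&gt;active&lt;/code&gt;, the walsender is probably busy decoding rather than disconnected.&lt;/p&gt;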

&lt;h3 class="relative group"&gt;Older PG + Multiple Replication Links + Flink
 &lt;div id="older-pg--multiple-replication-links--flink" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#older-pg--multiple-replication-links--flink" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Flink requires one link per table. Each PostgreSQL walsender decodes the WAL stream independently, so every additional link repeats the decoding work. Dozens of Flink links on one PG database are common, and hard to refactor.&lt;/p&gt;
&lt;p&gt;On PG 11 and earlier, the walsender main loop calls &lt;code&gt;PostmasterIsAlive()&lt;/code&gt;, causing poor loop performance. Starting from PG 12, &lt;code&gt;WalSndLoop&lt;/code&gt; no longer polls &lt;code&gt;PostmasterIsAlive()&lt;/code&gt; in the main loop; instead, status checks are placed inside &lt;code&gt;WalSndWait&lt;/code&gt;, using event-based passive notification. This greatly reduces CPU contention.&lt;/p&gt;
&lt;p&gt;If you have multiple Flink links on an older PG version, upgrading can alleviate certain walsender resource contention issues, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;May resolve the problem where walsender startup resource contention prevents the database from coming up for a long time&lt;/li&gt;
&lt;li&gt;May resolve upstream heavy data changes (including DDL rewrites) causing runtime walsender log decoding CPU saturation&lt;/li&gt;
&lt;/ul&gt;
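&lt;p&gt;To gauge how many replication links a database is carrying, a simple count of walsender backends may be enough. A small sketch, assuming PG 10 or later, where &lt;code&gt;pg_stat_activity.backend_type&lt;/code&gt; is available:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Number of walsender processes per database (physical walsenders show a NULL datname)
SELECT datname, count(*) AS walsenders
FROM pg_stat_activity
WHERE backend_type = &amp;#39;walsender&amp;#39;
GROUP BY datname
ORDER BY walsenders DESC;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;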

&lt;h3 class="relative group"&gt;Older PG Cannot Auto-Sync New Partitions
 &lt;div id="older-pg-cannot-auto-sync-new-partitions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#older-pg-cannot-auto-sync-new-partitions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;On older PG versions with declarative partitioning, note that you can &lt;strong&gt;only&lt;/strong&gt; publish child tables individually. &lt;a href="https://www.postgresql.org/docs/release/13.0/" target="_blank" rel="noreferrer"&gt;PG ≥13 supports publishing by parent table&lt;/a&gt;. Below that, you must configure sync per partition child table name:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Allow partitioned tables to be logically replicated via &lt;a href="https://www.postgresql.org/docs/13/sql-createpublication.html" target="_blank" rel="noreferrer"&gt;publications&lt;/a&gt; (Amit Langote) &lt;a href="https://postgr.es/c/17b9e7f9f" target="_blank" rel="noreferrer"&gt;§&lt;/a&gt; &lt;a href="https://postgr.es/c/83fd4532a" target="_blank" rel="noreferrer"&gt;§&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Previously, partitions had to be replicated individually. Now a partitioned table can be published explicitly, causing all its partitions to be published automatically. Addition/removal of a partition causes it to be likewise added to or removed from the publication. The &lt;a href="https://www.postgresql.org/docs/13/sql-createpublication.html" target="_blank" rel="noreferrer"&gt;&lt;code&gt;CREATE PUBLICATION&lt;/code&gt;&lt;/a&gt; option &lt;code&gt;publish_via_partition_root&lt;/code&gt; controls whether changes to partitions are published as their own changes or their parent&amp;rsquo;s.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;In other words, if this partitioned table is an upstream for sync, every time a new partition is added, you must adapt the sync tool to publish it — otherwise, new partition data won&amp;rsquo;t sync.&lt;/p&gt;
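&lt;p&gt;The difference between the two cases can be sketched as follows. The table and publication names (&lt;code&gt;orders&lt;/code&gt;, &lt;code&gt;orders_202601&lt;/code&gt;, &lt;code&gt;pub_orders&lt;/code&gt;) are hypothetical:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- PG 13+: publish the partitioned parent; new partitions are included automatically
CREATE PUBLICATION pub_orders FOR TABLE orders
    WITH (publish_via_partition_root = false);

-- PG 12 and earlier: the publication lists child tables,
-- so every new partition has to be added by name
ALTER PUBLICATION pub_orders ADD TABLE orders_202601;

-- Either way, verify what the publication actually contains
SELECT schemaname, tablename
FROM pg_publication_tables
WHERE pubname = &amp;#39;pub_orders&amp;#39;
ORDER BY tablename;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The two DDL statements are alternatives for different versions, not a sequence to run together.&lt;/p&gt;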

&lt;h2 class="relative group"&gt;Migration and Upgrades
 &lt;div id="migration-and-upgrades" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#migration-and-upgrades" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Xinchuang Migration and glibc Upgrades
 &lt;div id="xinchuang-migration-and-glibc-upgrades" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#xinchuang-migration-and-glibc-upgrades" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Whether it&amp;rsquo;s Xinchuang (domestic tech migration) or Linux OS version upgrades, glibc upgrades may be involved, and glibc upgrades can be extremely painful. Before PG 17, PG sorting depended entirely on external libraries shipped with the OS (glibc, or ICU where configured).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PostgreSQL cannot detect compatibility issues from glibc upgrades.&lt;/strong&gt; Any release of the GNU C Library may change locale data. The most problematic version in practice is &lt;strong&gt;glibc 2.28&lt;/strong&gt;, because its collation data was brought in sync with &lt;strong&gt;Unicode 9.0.0&lt;/strong&gt; (&lt;a href="https://sourceware.org/glibc/wiki/Release/2.28" target="_blank" rel="noreferrer"&gt;has been updated to a new upstream version from ISO which is in sync with Unicode 9.0.0&lt;/a&gt;), which changed the sort order of many strings.&lt;/p&gt;
&lt;p&gt;Collations come in many types, and many environments use linguistic sorting (e.g., &lt;code&gt;en_US.utf8&lt;/code&gt;), which is the most version-sensitive. Collation changes most commonly cause database crashes during index scans, but also uncommon issues like duplicate primary keys, data landing in wrong partitions, inconsistent merge join results, etc.&lt;/p&gt;
&lt;p&gt;Fortunately, PG 17 provides a very safe locale provider: &lt;code&gt;builtin&lt;/code&gt;, no longer dependent on OS-provided glibc, ICU, etc. Example:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;initdb --locale-provider&lt;span style="color:#f92672"&gt;=&lt;/span&gt;builtin --bultin-locale&lt;span style="color:#f92672"&gt;=&lt;/span&gt;C.UTF-8 dbname1&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;However, &lt;code&gt;builtin&lt;/code&gt; is great but arrived too late. Converting existing production instances to &lt;code&gt;builtin&lt;/code&gt; collation is no small task. Moreover, Xinchuang migrations or OS upgrades may not mandate database upgrades.&lt;/p&gt;
&lt;p&gt;During Xinchuang migration, the target host&amp;rsquo;s glibc version is typically higher than the old Intel server&amp;rsquo;s — likely crossing version 2.28. Combined with tight deadlines, KPI pressure, staffing shortages, and large databases, physical migration is unavoidable. So physical Xinchuang migration must account for glibc version and collation-induced anomalies.&lt;/p&gt;
&lt;p&gt;What can you do after physical migration?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;I. Official required steps&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Check indexes, rebuild those clearly problematic&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ALTER DATABASE ... REFRESH COLLATION VERSION&lt;/code&gt; (PG 15 and later)&lt;/li&gt;
&lt;li&gt;Check dependent objects&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ALTER COLLATION ... REFRESH VERSION&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
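&lt;p&gt;The official steps could look roughly like the sketch below. It assumes the &lt;code&gt;amcheck&lt;/code&gt; extension is installable; the index name &lt;code&gt;idx_customer_name&lt;/code&gt;, the database name &lt;code&gt;appdb&lt;/code&gt;, and the collation &lt;code&gt;"en_US"&lt;/code&gt; are placeholders, not names from this environment:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- PG 15+: recorded vs. actual collation version of each database default
SELECT datname,
       datcollversion,
       pg_database_collation_actual_version(oid) AS actual_version
FROM pg_database;

-- Collations whose recorded version differs from what the library reports now
SELECT collname,
       collversion,
       pg_collation_actual_version(oid) AS actual_version
FROM pg_collation
WHERE collversion IS DISTINCT FROM pg_collation_actual_version(oid);

-- Verify a suspect B-tree index; rebuild it if the check raises an error
CREATE EXTENSION IF NOT EXISTS amcheck;
SELECT bt_index_check(&amp;#39;idx_customer_name&amp;#39;::regclass, true);
REINDEX INDEX CONCURRENTLY idx_customer_name;   -- CONCURRENTLY needs PG 12+

-- Only after affected indexes are known to be good, clear the version warnings
ALTER DATABASE appdb REFRESH COLLATION VERSION;  -- PG 15+
ALTER COLLATION &amp;#34;en_US&amp;#34; REFRESH VERSION;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Refreshing the version only updates the recorded value; it does not repair anything, which is why the index checks come first.&lt;/p&gt;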
&lt;p&gt;&lt;strong&gt;II. Unofficial &amp;ldquo;dark arts&amp;rdquo; approaches&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t have a complete solution, just ideas:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Handle partitioned table data landing in wrong partitions&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Partition key is int/bigint/float: unrelated to collation, don&amp;rsquo;t worry&lt;/li&gt;
&lt;li&gt;Partition key is timestamp: don&amp;rsquo;t worry; if varchar or other character types: evaluate&lt;/li&gt;
&lt;li&gt;Partition key is character type: refer to &amp;ldquo;a&amp;rdquo; vs &amp;ldquo;-&amp;rdquo; sort order (pgconf Collation Challenges Sorting It Out). But note:
&lt;ul&gt;
&lt;li&gt;If querying data, don&amp;rsquo;t query from the parent table — may crash or return nothing&lt;/li&gt;
&lt;li&gt;No simple detection method&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Handle primary key / unique key conflicts&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Handle FDW sort range anomalies&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Unknown issues&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
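&lt;p&gt;Two of these ideas can at least be probed with plain SQL. The sketch below is only a starting point: it assumes an &lt;code&gt;"en_US"&lt;/code&gt; collation exists in the database, and the table &lt;code&gt;customers&lt;/code&gt; with key column &lt;code&gt;customer_code&lt;/code&gt; is hypothetical:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Run on the old and the new host and compare the order of the output.
-- Strings mixing letters, digits, and punctuation are the usual suspects.
SELECT v
FROM (VALUES (&amp;#39;a&amp;#39;), (&amp;#39;-&amp;#39;), (&amp;#39;1-1&amp;#39;), (&amp;#39;11&amp;#39;), (&amp;#39;a b&amp;#39;), (&amp;#39;ab&amp;#39;)) AS t(v)
ORDER BY v COLLATE &amp;#34;en_US&amp;#34;;

-- Look for duplicate keys without trusting a possibly inconsistent index
BEGIN;
SET LOCAL enable_indexscan = off;
SET LOCAL enable_indexonlyscan = off;
SET LOCAL enable_bitmapscan = off;
SELECT customer_code, count(*) AS copies
FROM customers
GROUP BY customer_code
HAVING count(*) &amp;gt; 1;
COMMIT;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If the first query returns a different order on the two hosts, character-keyed indexes and partition bounds built on the old host deserve a closer look.&lt;/p&gt;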
&lt;p&gt;Reference: &lt;a href="https://docs.paic.com.cn/#/post/122695260" target="_blank" rel="noreferrer"&gt;collation&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Smooth Major Version Upgrades
 &lt;div id="smooth-major-version-upgrades" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#smooth-major-version-upgrades" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://gitlab.com/postgres-ai/postgresql-consulting/postgres-howtos/-/blob/main/0077_zero_downtime_major_upgrade.md?ref_type=heads" target="_blank" rel="noreferrer"&gt;https://gitlab.com/postgres-ai/postgresql-consulting/postgres-howtos/-/blob/main/0077_zero_downtime_major_upgrade.md?ref_type=heads&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.eu/events/pgconfeu2023/sessions/session/4791/slides/439/2023.pgconf.eu%20Zero%20Downtime%20PostgreSQL%20Upgrades.pdf" target="_blank" rel="noreferrer"&gt;https://www.postgresql.eu/events/pgconfeu2023/sessions/session/4791/slides/439/2023.pgconf.eu%20Zero%20Downtime%20PostgreSQL%20Upgrades.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Common major version upgrade approaches:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;pg_upgrade&lt;/code&gt; in-place upgrade. Not recommended — may blow up in place.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pg_dump&lt;/code&gt;: suitable for small databases, longer maintenance windows.&lt;/li&gt;
&lt;li&gt;Logical sync + switchover (pub/sub, pglogical, DTS, etc.): suitable for small databases, shorter windows.&lt;/li&gt;
&lt;li&gt;Physical forward sync + logical reverse sync: suitable for large databases, not-too-short windows.&lt;/li&gt;
&lt;li&gt;Physical replication full sync + logical incremental sync + switchover: suitable for large databases, extremely short windows.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Syncing full data via logical replication can be extremely slow. In-place upgrade of a new standby carries uncertainty and upgrade time, plus the need for reverse logical sync. &amp;ldquo;Smooth major version upgrade&amp;rdquo; is essentially &amp;ldquo;physical replication full sync + logical incremental sync + switchover.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Key technique: the primary creates a slot and returns an LSN. The new standby uses &lt;code&gt;recovery_target_lsn&lt;/code&gt; to recover to that LSN, then logical sync begins.&lt;/p&gt;
&lt;p&gt;Approximate workflow:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Pre-checks. Multi-database (consider applying one slot LSN for all), extensions, pathman, triggers, foreign keys, unlogged tables, crontab, etc.&lt;/li&gt;
&lt;li&gt;Physical sync. Install both the old and new version software, compare and back up the conf files, and use &lt;code&gt;pg_basebackup&lt;/code&gt; to build a new standby on the old version.&lt;/li&gt;
&lt;li&gt;Logical sync prep 1. Primary keys and replica identity, create publication; prohibit application DDL/DCL.&lt;/li&gt;
&lt;li&gt;Restore new standby to target LSN. Stop new standby; create slot on old primary and record LSN; start new standby with target LSN.&lt;/li&gt;
&lt;li&gt;New standby major version upgrade. Upgrade, handle issues, switch environment variables.&lt;/li&gt;
&lt;li&gt;Logical sync prep 2. Disable triggers, foreign keys, jobs, extensions, etc.&lt;/li&gt;
&lt;li&gt;Logical sync. Create subscription with specified slot, &lt;code&gt;copy_data=false&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Post logical sync. Check for index corruption, check logs for errors and fix, rebuild remote standbys.&lt;/li&gt;
&lt;li&gt;Switchover. Stop application; advance sequences, enable foreign keys, triggers, jobs, etc.&lt;/li&gt;
&lt;li&gt;Switchover. Build reverse link (old primary subscribes).&lt;/li&gt;
&lt;li&gt;Switchover. Application cutover.&lt;/li&gt;
&lt;/ol&gt;
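&lt;p&gt;The key technique (steps 4 and 7 above) can be sketched as follows. This is an outline under assumptions, not a tested runbook: the slot, publication, and subscription names and the connection string are all placeholders, and the recovery settings are shown as comments because they belong in the new standby&amp;rsquo;s configuration rather than in SQL:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Step 4, on the old primary: create the slot and record the returned LSN
SELECT slot_name, lsn
FROM pg_create_logical_replication_slot(&amp;#39;upgrade_slot&amp;#39;, &amp;#39;pgoutput&amp;#39;);

-- Step 4, on the stopped new standby, before starting it again
-- (PG 12+ reads these from postgresql.conf, older versions from recovery.conf):
--   recovery_target_lsn    = &amp;#39;LSN returned above&amp;#39;
--   recovery_target_action = &amp;#39;promote&amp;#39;

-- Step 7, on the upgraded node: attach to the existing slot and skip the initial copy,
-- since the data up to that LSN already arrived through physical replication
CREATE SUBSCRIPTION sub_upgrade
    CONNECTION &amp;#39;host=old-primary dbname=appdb user=repl&amp;#39;
    PUBLICATION pub_upgrade
    WITH (create_slot = false, slot_name = &amp;#39;upgrade_slot&amp;#39;, copy_data = false);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With multiple databases, each one needs its own slot and subscription, which is why the pre-check step calls out the multi-database case.&lt;/p&gt;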
&lt;p&gt;The smooth major upgrade approach is smooth for the business but complex for the DBA. It combines all the drawbacks of logical and physical migration — quite painful to execute. The steps above are already simplified. This approach consumes DBA manpower; consider it only for the most critical databases.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Partitioned Table Management
 &lt;div id="partitioned-table-management" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#partitioned-table-management" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;PostgreSQL partitioned tables are very flexible, lack built-in interval partitioning, and have varied behavior across versions — making partition management problems an annual occurrence. I believe many PG DBAs still worry about new partition issues.&lt;/p&gt;
&lt;p&gt;My observations on partition management and usage issues:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Not using declarative partitioning.&lt;/strong&gt; Older versions still use pathman partitioning or inheritance-based partitioning, or continue using them even after upgrading. Declarative partitioning was introduced in PG 10. Due to early version limitations, recommend &lt;strong&gt;only&lt;/strong&gt; using declarative partitioning from at least PG 12 onward to reduce environmental complexity.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Developers building child table indexes/primary keys directly.&lt;/strong&gt; Creating indexes/PKs directly on child tables via SQL rather than through parent table inheritance means the next developer writing SQL may forget. This leads not only to parent-child inconsistency but also child-child inconsistency, eventually making the partition structure unrecognizable.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No new partition management strategy.&lt;/strong&gt; Forgetting to create new partitions, or relying on a DEFAULT partition. Typically, developers create partitions for a few years ahead; by the time those run out, the developers may have moved on and no one manages new partition creation. Either the missing partition is a ticking time bomb, or data lands in the DEFAULT partition, defeating the purpose of partitioning.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lack of DBA management.&lt;/strong&gt; Yes, DBA! PG partitioned table knowledge is extensive (see &lt;a href="https://blog.csdn.net/qq_40687433/article/details/132525655?spm=1001.2014.3001.5501" target="_blank" rel="noreferrer"&gt;PostgreSQL Partitioned Tables&lt;/a&gt;). How to build management strategies and implement them in your environment requires proactive DBA involvement. This may be the most important factor.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;My partition management goals (from &lt;a href="https://www.modb.pro/db/2007743085057499136" target="_blank" rel="noreferrer"&gt;Case Study: 2026-01-01 Partition Data Update Failure&lt;/a&gt;):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use the parent table structure as the canonical structure — the parent table faces developers; it should have primary keys, indexes, and replica identity (unless the PG version doesn&amp;rsquo;t support it).&lt;/li&gt;
&lt;li&gt;Keep parent and child tables consistent. Use &lt;code&gt;PARTITION OF&lt;/code&gt; when creating new partitions (yes, I don&amp;rsquo;t recommend ATTACH).&lt;/li&gt;
&lt;li&gt;Keep child tables consistent with each other.&lt;/li&gt;
&lt;li&gt;Create new partitions in advance. Partition data volume should not be too large.&lt;/li&gt;
&lt;li&gt;DEFAULT partitions are not recommended. If created, must monitor writes to them.&lt;/li&gt;
&lt;li&gt;Queries on frequently accessed tables must include the partition key for partition pruning. Otherwise, convert to a regular table.&lt;/li&gt;
&lt;/ul&gt;
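&lt;p&gt;As an illustration of these goals, here is a minimal sketch assuming PG 11 or later (a primary key on the partitioned parent is not available earlier). The table &lt;code&gt;orders&lt;/code&gt;, its columns, and the partition names are hypothetical:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- The parent carries the canonical structure; the PK must include the partition key
CREATE TABLE orders (
    id         bigint      NOT NULL,
    created_at timestamptz NOT NULL,
    payload    text,
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

-- PARTITION OF inherits the PK and indexes defined on the parent
CREATE TABLE orders_202601 PARTITION OF orders
    FOR VALUES FROM (&amp;#39;2026-01-01&amp;#39;) TO (&amp;#39;2026-02-01&amp;#39;);

-- Consistency check: partitions of the parent that have no primary key
SELECT i.inhrelid::regclass AS partition_without_pk
FROM pg_inherits i
WHERE i.inhparent = &amp;#39;orders&amp;#39;::regclass
  AND NOT EXISTS (SELECT 1
                  FROM pg_constraint c
                  WHERE c.conrelid = i.inhrelid
                    AND c.contype = &amp;#39;p&amp;#39;);

-- If a DEFAULT partition exists (here assumed to be orders_default), watch its writes
SELECT relname, n_tup_ins, n_live_tup
FROM pg_stat_user_tables
WHERE relname = &amp;#39;orders_default&amp;#39;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The consistency check returns no rows when every partition has a primary key, so it can be wired into routine inspection as is.&lt;/p&gt;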

&lt;h2 class="relative group"&gt;Observability
 &lt;div id="observability" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#observability" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The &lt;a href="https://www.postgresql.org/docs/18/monitoring-stats.html" target="_blank" rel="noreferrer"&gt;official documentation&lt;/a&gt; clearly explains database, table, index, SQL, flush, and other metrics.&lt;/p&gt;
&lt;p&gt;A few metrics deserve special attention — not only are they unclearly explained, but they&amp;rsquo;re frequently used and have a learning curve.&lt;/p&gt;

&lt;h3 class="relative group"&gt;buffers_alloc, blks_read
 &lt;div id="buffers_alloc-blks_read" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#buffers_alloc-blks_read" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;pg_stat_bgwriter.buffers_alloc&lt;/code&gt;: Number of buffers allocated — shared memory eviction volume.&lt;/li&gt;
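&lt;li&gt;&lt;code&gt;pg_stat_database.blks_read&lt;/code&gt;: Number of blocks read from outside shared buffers. The read may be served by the OS cache or by disk; PostgreSQL cannot tell which.&lt;/li&gt;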
&lt;/ul&gt;
&lt;p&gt;(&lt;code&gt;buffers_alloc&lt;/code&gt; may appear in different views across PG versions, but the meaning is the same.)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pg_stat_bgwriter.buffers_alloc&lt;/code&gt; is the shared memory buffer allocation count (called buffer allocation in the source). It represents shared memory eviction volume — newly started databases typically have higher values. When observing shared memory busyness, buffer allocation may be better than hit ratio — high hit ratios can be inflated by frequent small-table access, while allocation represents actual eviction.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;buffers_alloc&lt;/code&gt; counts buffers allocated after reading from cache and loading into a new shared buffer — somewhat representative of OS cache reads too? But in practice, &lt;code&gt;buffers_alloc&lt;/code&gt; and &lt;code&gt;blks_read&lt;/code&gt; have similar meanings yet can differ significantly in value. Why? Unclear, pending research.&lt;/p&gt;
&lt;p&gt;Source: &lt;code&gt;numBufferAllocs&lt;/code&gt;&lt;/p&gt;
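&lt;p&gt;To compare the two counters side by side, something like the following can be sampled twice and the deltas compared. A small sketch, assuming a version where &lt;code&gt;buffers_alloc&lt;/code&gt; still lives in &lt;code&gt;pg_stat_bgwriter&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Cluster-wide buffer allocations since the last stats reset
SELECT buffers_alloc FROM pg_stat_bgwriter;

-- Blocks read from outside shared buffers, and buffer hits, summed over all databases
SELECT sum(blks_read) AS blks_read,
       sum(blks_hit)  AS blks_hit
FROM pg_stat_database;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Both counters are cumulative, so only the difference between two samples says anything about current activity.&lt;/p&gt;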

&lt;h3 class="relative group"&gt;tup_fetched, tup_returned
 &lt;div id="tup_fetched-tup_returned" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#tup_fetched-tup_returned" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;These are metrics in &lt;code&gt;pg_stat_database&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;tup_fetched&lt;/code&gt;: Documented as the number of live rows fetched by index scans. Dead tuples and invisible rows are not counted. Result-oriented.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tup_returned&lt;/code&gt;: Documented as the number of live rows fetched by sequential scans plus index entries returned by index scans, whether or not the rows survive later filtering. Process-oriented.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Thus, &lt;code&gt;tup_returned&lt;/code&gt; is typically much higher than &lt;code&gt;tup_fetched&lt;/code&gt;. An abnormally high &lt;code&gt;tup_returned&lt;/code&gt; suggests optimization opportunity — after all, many rows were accessed but few returned to the client.&lt;/p&gt;
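&lt;p&gt;A quick way to spot such databases is to look at the ratio of the two counters. A minimal sketch; the ratio column and its name are illustrative, and what counts as abnormally high depends on the workload:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Rows touched per row fetched, per database, since the last stats reset
SELECT datname,
       tup_returned,
       tup_fetched,
       round(tup_returned::numeric / nullif(tup_fetched, 0), 1) AS returned_per_fetched
FROM pg_stat_database
WHERE datname IS NOT NULL
ORDER BY tup_returned DESC;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;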

&lt;h3 class="relative group"&gt;idx_tup_fetch, idx_tup_read
 &lt;div id="idx_tup_fetch-idx_tup_read" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#idx_tup_fetch-idx_tup_read" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;These are metrics in &lt;code&gt;pg_stat_all_indexes&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;idx_tup_read&lt;/code&gt;: Number of index entries accessed (counted from the index side), includes bitmap scans.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;idx_tup_fetch&lt;/code&gt;: Number of rows ultimately returned from index scans (counted from the table side), excludes bitmap scans.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Madness.&lt;/p&gt;
&lt;p&gt;One thing to remember: &lt;strong&gt;&lt;code&gt;xx_tup_fetch&lt;/code&gt;&lt;/strong&gt; refers to the final rows returned after index access + table fetch — result-oriented.&lt;/p&gt;
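&lt;p&gt;To see the two index counters in practice, the view can be queried directly. A short sketch; the schema filter and the limit of 20 are arbitrary choices:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Indexes that read many entries; compare against the rows actually fetched
SELECT schemaname,
       relname,
       indexrelname,
       idx_scan,
       idx_tup_read,
       idx_tup_fetch
FROM pg_stat_all_indexes
WHERE schemaname NOT IN (&amp;#39;pg_catalog&amp;#39;, &amp;#39;pg_toast&amp;#39;)
ORDER BY idx_tup_read DESC
LIMIT 20;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;An index used mostly by bitmap scans will show a large &lt;code&gt;idx_tup_read&lt;/code&gt; next to a small &lt;code&gt;idx_tup_fetch&lt;/code&gt;, which follows from the definitions above and is not by itself a problem.&lt;/p&gt;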

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://gitlab.com/postgres-ai/postgresql-consulting/postgres-howtos" target="_blank" rel="noreferrer"&gt;postgres-ai howtos&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://postgresql.us/events/pgconfnyc2024/sessions/session/1862/slides/172/pgvector_best_practices_pgconfnyc2024.pdf" target="_blank" rel="noreferrer"&gt;Best practices for using pgvector&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.modb.pro/db/2007743085057499136" target="_blank" rel="noreferrer"&gt;Case Study: 2026-01-01 Partition Data Update Failure&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/qq_40687433/article/details/132525655?spm=1001.2014.3001.5501" target="_blank" rel="noreferrer"&gt;PostgreSQL Partitioned Tables&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.modb.pro/db/1976119963471589376" target="_blank" rel="noreferrer"&gt;Case Study: From Inaccurate DISTINCT to DISTINCT Calculation Principles&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.modb.pro/db/1964312913808732160" target="_blank" rel="noreferrer"&gt;Case Study: Adding an Index Causes Performance Degradation and Generic Plans&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.modb.pro/db/1966415366276526080" target="_blank" rel="noreferrer"&gt;From Static Table Query Conflicts to Their Principles&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.modb.pro/db/1948643346948304896" target="_blank" rel="noreferrer"&gt;Control File Parameters and Primary-Standby Parameter Mismatch&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://liuzhilong.blog.csdn.net/article/details/130783036" target="_blank" rel="noreferrer"&gt;https://liuzhilong.blog.csdn.net/article/details/130783036&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://techcommunity.microsoft.com/blog/adforpostgresql/improving-postgres-connection-scalability-snapshots/1806462" target="_blank" rel="noreferrer"&gt;https://techcommunity.microsoft.com/blog/adforpostgresql/improving-postgres-connection-scalability-snapshots/1806462&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/17/sql-prepare.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/17/sql-prepare.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/17/sql-deallocate.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/17/sql-deallocate.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/release/13.0/" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/release/13.0/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://jdbc.postgresql.org/documentation/use/" target="_blank" rel="noreferrer"&gt;https://jdbc.postgresql.org/documentation/use/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://jdbc.postgresql.org/documentation/server-prepare/#server-prepared-statements" target="_blank" rel="noreferrer"&gt;https://jdbc.postgresql.org/documentation/server-prepare/#server-prepared-statements&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.eu/events/pgconfeu2023/sessions/session/4791/slides/439/2023.pgconf.eu%20Zero%20Downtime%20PostgreSQL%20Upgrades.pdf" target="_blank" rel="noreferrer"&gt;https://www.postgresql.eu/events/pgconfeu2023/sessions/session/4791/slides/439/2023.pgconf.eu%20Zero%20Downtime%20PostgreSQL%20Upgrades.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks to Master Gao for the 2025 battles.&lt;/p&gt;</content:encoded></item><item><title>Case: Partition Data UPDATE Failure on 2026-01-01</title><link>https://lastdba.com/en/2026/01/04/case-partition-data-update-failure-on-2026-01-01/</link><pubDate>Sun, 04 Jan 2026 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2026/01/04/case-partition-data-update-failure-on-2026-01-01/</guid><description>&lt;h2 class="relative group"&gt;Symptoms
 &lt;div id="symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;On December 30, business errors were reported — data could not be updated:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;55000&lt;/span&gt;&lt;span style="color:#f92672"&gt;:&lt;/span&gt; cannot update table &lt;span style="color:#e6db74"&gt;&amp;#34;tablzl_202601&amp;#34;&lt;/span&gt; because it does not have a replica identity and publishes updates
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LOCATION: CheckCmdReplicaIdentity, execReplication.c:&lt;span style="color:#ae81ff"&gt;575&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Temporary Recovery
 &lt;div id="temporary-recovery" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#temporary-recovery" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The error message was clear: no replica identity. The table was a 2026 partition of a partitioned table, so I immediately suspected the new partition lacked a primary key. (A new table&amp;rsquo;s replica identity defaults to &lt;code&gt;default&lt;/code&gt;, which only uses a primary key as the replica identity. Without a primary key, updates are impossible.)&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Symptoms
 &lt;div id="symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;On December 30, business errors were reported — data could not be updated:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;55000&lt;/span&gt;&lt;span style="color:#f92672"&gt;:&lt;/span&gt; cannot update table &lt;span style="color:#e6db74"&gt;&amp;#34;tablzl_202601&amp;#34;&lt;/span&gt; because it does not have a replica identity and publishes updates
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LOCATION: CheckCmdReplicaIdentity, execReplication.c:&lt;span style="color:#ae81ff"&gt;575&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Temporary Recovery
 &lt;div id="temporary-recovery" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#temporary-recovery" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The error message was clear: no replica identity. The table was a 2026 partition of a partitioned table, so I immediately suspected the new partition lacked a primary key. (A new table&amp;rsquo;s replica identity defaults to &lt;code&gt;default&lt;/code&gt;, which only uses a primary key as the replica identity. Without a primary key, updates are impossible.)&lt;/p&gt;
&lt;p&gt;Further investigation revealed: the parent table had no primary key or indexes, child partitions from 2025 and earlier had both primary keys and indexes, but 2026 and later child partitions had neither — and all child partitions were published. Roughly:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;p_parent &lt;span style="color:#75715e"&gt;-- no PK, no indexes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;p_child_202511 &lt;span style="color:#75715e"&gt;-- has PK, has indexes, published
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;p_child_202512 &lt;span style="color:#75715e"&gt;-- has PK, has indexes, published
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;p_child_202601 &lt;span style="color:#75715e"&gt;-- no PK, no indexes, published
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;p_child_202602 &lt;span style="color:#75715e"&gt;-- no PK, no indexes, published&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Since the parent table had nothing, a child created with &lt;code&gt;partition of&lt;/code&gt; would also have nothing; the primary key and indexes must be created manually on each child partition. So the procedure that created the new partitions skipped that step, while the old partitions presumably had them added after creation.&lt;/p&gt;
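&lt;p&gt;The per-partition state described above can be read from the catalogs. The following is a minimal sketch, assuming the parent table is named &lt;code&gt;public.tablzl&lt;/code&gt; (the parent name is an assumption; only the partition names appear in the logs). It lists each direct child partition with whether it has a primary key and how many indexes it has:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Assumption: the partitioned parent is public.tablzl; adjust to the real name.
SELECT c.relname AS partition_name,
       EXISTS (SELECT 1
                 FROM pg_index i
                WHERE i.indrelid = c.oid
                  AND i.indisprimary) AS has_pk,
       (SELECT count(*)
          FROM pg_index i
         WHERE i.indrelid = c.oid) AS index_count
  FROM pg_inherits h
  JOIN pg_class c ON c.oid = h.inhrelid
 WHERE h.inhparent = 'public.tablzl'::regclass
 ORDER BY c.relname;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Partitions that show &lt;code&gt;has_pk = false&lt;/code&gt; are the ones that need attention before they are published.&lt;/p&gt;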
&lt;p&gt;Additionally, publishing partitioned tables via the parent was &lt;a href="https://www.postgresql.org/docs/release/13.0/" target="_blank" rel="noreferrer"&gt;only supported starting from PG13&lt;/a&gt;. Previously, you couldn&amp;rsquo;t publish via the parent — only via child tables. This database was on PG11.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Allow partitioned tables to be logically replicated via &lt;a href="https://www.postgresql.org/docs/13/sql-createpublication.html" target="_blank" rel="noreferrer"&gt;publications&lt;/a&gt; (Amit Langote) &lt;a href="https://postgr.es/c/17b9e7f9f" target="_blank" rel="noreferrer"&gt;§&lt;/a&gt; &lt;a href="https://postgr.es/c/83fd4532a" target="_blank" rel="noreferrer"&gt;§&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Previously, partitions had to be replicated individually. Now a partitioned table can be published explicitly, causing all its partitions to be published automatically. Addition/removal of a partition causes it to be likewise added to or removed from the publication. The &lt;a href="https://www.postgresql.org/docs/13/sql-createpublication.html" target="_blank" rel="noreferrer"&gt;&lt;code&gt;CREATE PUBLICATION&lt;/code&gt;&lt;/a&gt; option &lt;code&gt;publish_via_partition_root&lt;/code&gt; controls whether changes to partitions are published as their own changes or their parent&amp;rsquo;s.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;After the initial diagnosis, and given the urgency, there were three ways to resolve the problem temporarily:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Add primary keys to the 2026 partitions&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;replica identity full&lt;/code&gt; on the 2026 partitions&lt;/li&gt;
&lt;li&gt;Remove the 2026 partitions from the publication&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Since recovery time was about the same for all options, we chose adding primary keys — the lowest operational cost — to at least stop the business errors.&lt;/p&gt;
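&lt;p&gt;For reference, the three options roughly correspond to the statements below. This is a sketch, not the exact commands that were run: the table and publication names come from the logs, while the primary key column &lt;code&gt;mykey&lt;/code&gt; is an assumption made for illustration. Only one of the three is needed.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Option 1: add a primary key to the partition.
-- Assumption: mykey is the intended key column; use the same key as the older partitions.
ALTER TABLE public.tablzl_202601 ADD PRIMARY KEY (mykey);

-- Option 2: use the whole row as the replica identity.
ALTER TABLE public.tablzl_202601 REPLICA IDENTITY FULL;

-- Option 3: stop publishing the partition.
ALTER PUBLICATION publzl DROP TABLE public.tablzl_202601;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Whichever option is chosen has to be repeated for every affected 2026 partition.&lt;/p&gt;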

&lt;h2 class="relative group"&gt;Root Cause Analysis
 &lt;div id="root-cause-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#root-cause-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The issue seems clear: &amp;ldquo;no replica identity + published + no primary key&amp;rdquo; prevents updates. But several questions still needed answers.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Question 1: Why does the UPDATE fail even though there&amp;rsquo;s no 202601 data at all (the new partition has zero rows)?
 &lt;div id="question-1-why-does-the-update-fail-even-though-theres-no-202601-data-at-all-the-new-partition-has-zero-rows" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#question-1-why-does-the-update-fail-even-though-theres-no-202601-data-at-all-the-new-partition-has-zero-rows" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The SQL text was:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; tablzl_202601
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt; idid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; date_updated &lt;span style="color:#f92672"&gt;=&lt;/span&gt; now()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; mykey &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
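&lt;p&gt;The partition key for &lt;code&gt;tablzl_202601&lt;/code&gt; is &lt;code&gt;created_date&lt;/code&gt;. The WHERE clause didn&amp;rsquo;t include the partition key, so no partition could be pruned: the statement also had to open the 202601 partition, found no primary key there, and errored out, even though that partition held no rows.&lt;/p&gt;
&lt;p&gt;As for whether row existence or replica identity is checked first, &lt;code&gt;ExecSimpleRelationUpdate&lt;/code&gt; shows the order. (This function is the update path used by logical replication apply; an ordinary UPDATE reaches the same &lt;code&gt;CheckCmdReplicaIdentity&lt;/code&gt; check through &lt;code&gt;CheckValidResultRel&lt;/code&gt; while the executor sets up its result relations, so the check likewise runs before any row is looked at.) The function has changed very little across PG versions:&lt;/p&gt;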
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Find the searchslot tuple and update it with data in the slot,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * update the indexes, and execute any constraints and per-row triggers.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Caller is responsible for opening the indexes.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ExecSimpleRelationUpdate&lt;/span&gt;(EState &lt;span style="color:#f92672"&gt;*&lt;/span&gt;estate, EPQState &lt;span style="color:#f92672"&gt;*&lt;/span&gt;epqstate,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					 TupleTableSlot &lt;span style="color:#f92672"&gt;*&lt;/span&gt;searchslot, TupleTableSlot &lt;span style="color:#f92672"&gt;*&lt;/span&gt;slot)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;CheckCmdReplicaIdentity&lt;/span&gt;(rel, CMD_UPDATE); &lt;span style="color:#75715e"&gt;// check replica identity
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* BEFORE ROW UPDATE Triggers */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (resultRelInfo&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;ri_TrigDesc &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		resultRelInfo&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;ri_TrigDesc&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;trig_update_before_row)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		slot &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;ExecBRUpdateTriggers&lt;/span&gt;(estate, epqstate, resultRelInfo,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;									&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;searchslot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;tts_tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_self,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;									NULL, slot);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (slot &lt;span style="color:#f92672"&gt;==&lt;/span&gt; NULL)		&lt;span style="color:#75715e"&gt;/* &amp;#34;do nothing&amp;#34; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			skip_tuple &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;skip_tuple)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		List	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;recheckIndexes &lt;span style="color:#f92672"&gt;=&lt;/span&gt; NIL;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Check the constraints of the tuple */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_att&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;constr)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ExecConstraints&lt;/span&gt;(resultRelInfo, slot, estate);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (resultRelInfo&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;ri_PartitionCheck)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ExecPartitionCheck&lt;/span&gt;(resultRelInfo, slot, estate, true);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Materialize slot into a tuple that we can scribble upon. */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		tuple &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;ExecMaterializeSlot&lt;/span&gt;(slot);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* OK, update the tuple and index entries for it */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;simple_heap_update&lt;/span&gt;(rel, &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;searchslot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;tts_tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_self,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 slot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;tts_tuple);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (resultRelInfo&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;ri_NumIndices &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;HeapTupleIsHeapOnly&lt;/span&gt;(slot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;tts_tuple))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			recheckIndexes &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;ExecInsertIndexTuples&lt;/span&gt;(slot, &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;(tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_self),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;												 estate, false, NULL,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;												 NIL);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* AFTER ROW UPDATE Triggers */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ExecARUpdateTriggers&lt;/span&gt;(estate, resultRelInfo,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							 &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;searchslot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;tts_tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_self,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							 NULL, tuple, recheckIndexes, NULL);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;list_free&lt;/span&gt;(recheckIndexes);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;ExecSimpleRelationUpdate&lt;/code&gt; flow:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Check replica identity&lt;/li&gt;
&lt;li&gt;BEFORE ROW UPDATE triggers&lt;/li&gt;
&lt;li&gt;Check constraints (both non-partition and partition constraints)&lt;/li&gt;
&lt;li&gt;Update the row&lt;/li&gt;
&lt;li&gt;Insert index entries&lt;/li&gt;
&lt;li&gt;AFTER ROW UPDATE triggers&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;So PG&amp;rsquo;s logic checks replica identity first, before row updates and everything else.&lt;/p&gt;
&lt;p&gt;The SQL didn&amp;rsquo;t include the partition key. Would adding it trigger partition pruning? Not necessarily.&lt;/p&gt;
&lt;p&gt;Partition pruning improvements across versions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PG10 introduced declarative partitioning. There was no &lt;code&gt;enable_partition_pruning&lt;/code&gt; parameter; pruning was done at planning time via &lt;code&gt;constraint_exclusion&lt;/code&gt;. So PG10 had no query-execution-time pruning.&lt;/li&gt;
&lt;li&gt;PG11 added runtime partition pruning: &lt;a href="https://www.postgresql.org/docs/release/11.0/" target="_blank" rel="noreferrer"&gt;Allow partition elimination during query execution (David Rowley, Beena Emerson)&lt;/a&gt;. For UPDATE, however, PG11 still prunes only at planning time, so it can prune on bound variable values but not on non-immutable functions (including &lt;code&gt;now()&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.postgresql.org/docs/release/14.0/" target="_blank" rel="noreferrer"&gt;PG14&lt;/a&gt; added final pruning: &lt;a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c5b7ba4e6" target="_blank" rel="noreferrer"&gt;This wins in UPDATEs on partitioned tables when only some of the partitions will actually receive updates&lt;/a&gt;. In other words, UPDATE pruning also works with non-immutable functions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Since PG11 can&amp;rsquo;t prune on &lt;code&gt;now()&lt;/code&gt;, adding a &lt;code&gt;now()&lt;/code&gt; condition to the application SQL wouldn&amp;rsquo;t trigger pruning, and the error would still occur. However, if the application passed the partition key as a bound variable, pruning would kick in and the error wouldn&amp;rsquo;t appear. Note: &amp;ldquo;the error wouldn&amp;rsquo;t appear&amp;rdquo; means that updating 202512 data wouldn&amp;rsquo;t fail on the 202601 partition; updating 202601 data would still fail regardless.&lt;/p&gt;
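&lt;p&gt;One way to check which partitions a given UPDATE would touch is to run &lt;code&gt;EXPLAIN&lt;/code&gt; on it. The sketch below assumes the parent table is named &lt;code&gt;tablzl&lt;/code&gt; and that &lt;code&gt;created_date&lt;/code&gt; is a date or timestamp column; both are assumptions for illustration, and the predicates are simplified compared with the real application SQL.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Assumed names: parent table tablzl, partition key created_date.

-- Constant on the partition key: planning-time pruning can exclude the other partitions.
EXPLAIN UPDATE tablzl
   SET date_updated = now()
 WHERE created_date = DATE '2025-12-15';

-- now() in the predicate: per the discussion above, PG11 is not expected to prune this UPDATE.
EXPLAIN UPDATE tablzl
   SET date_updated = now()
 WHERE created_date &amp;gt;= now() - interval '1 day';&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The partitions listed under the Update node are the ones the statement will open. Because the replica identity check runs during executor setup, the &lt;code&gt;EXPLAIN&lt;/code&gt; itself may fail with the same error if an unpruned partition is published without a replica identity.&lt;/p&gt;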

&lt;h3 class="relative group"&gt;Question 2: The partition was created on 2025-12-26, so why was the problem only discovered on December 30?
 &lt;div id="question-2-the-partition-was-created-on-2025-12-26-so-why-was-the-problem-only-discovered-on-december-30" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#question-2-the-partition-was-created-on-2025-12-26-so-why-was-the-problem-only-discovered-on-december-30" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;This is even simpler: &amp;ldquo;no replica identity + published + no primary key&amp;rdquo; is an AND condition.&lt;/p&gt;
&lt;p&gt;Although the new partitions were created early, they were only added to the publication on the evening of December 29, at 20:48:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat postgresql&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;.csv.bak &lt;span style="color:#f92672"&gt;|&lt;/span&gt;grep &lt;span style="color:#e6db74"&gt;&amp;#34;alter publication&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;48&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;07&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;730&lt;/span&gt; CST,&lt;span style="color:#e6db74"&gt;&amp;#34;userlzlreplication&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,xxx&lt;span style="color:#e6db74"&gt;&amp;#34;statement: alter publication publzl add table &amp;#34;&amp;#34;public&amp;#34;&amp;#34;.&amp;#34;&amp;#34;tablzl_202601&amp;#34;&amp;#34;, &amp;#34;&amp;#34;public&amp;#34;&amp;#34;.&amp;#34;&amp;#34;tablzl_202602&amp;#34;&amp;#34;,...&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The first error appeared on December 29 at 22:26, about 1.5 hours later:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; cat postgresql&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;.csv.bak &lt;span style="color:#f92672"&gt;|&lt;/span&gt;grep &lt;span style="color:#e6db74"&gt;&amp;#34;REPLICA IDENTITY&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;26&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;404&lt;/span&gt; CST,&lt;span style="color:#e6db74"&gt;&amp;#34;userlzlreplication&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;375121&lt;/span&gt;,xxx,&lt;span style="color:#e6db74"&gt;&amp;#34;cannot update table &amp;#34;&amp;#34;tablzl_202601&amp;#34;&amp;#34; because it does not have a replica identity and publishes updates&amp;#34;&lt;/span&gt;,,&lt;span style="color:#e6db74"&gt;&amp;#34;To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.&amp;#34;&lt;/span&gt;,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;UPDATE tablzl&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Root cause overview: The parent table had no primary key, so &lt;code&gt;partition of&lt;/code&gt; child partitions naturally also had none. Old child partitions had their primary keys added manually; new child partitions did not, resulting in the 202601 partition lacking a primary key. Logical replication relies on the primary key (default replica identity) for synchronization. Without replica identity, changes can&amp;rsquo;t be sent downstream, and UPDATE/DELETE statements on published tables cannot execute. In PG11, an UPDATE SQL that &lt;em&gt;does&lt;/em&gt; include the partition key condition may &lt;em&gt;still&lt;/em&gt; visit the new partition.&lt;/p&gt;
&lt;p&gt;A stroke of luck: due to various factors, the problem surfaced early in this particular database. That left December 31 as a one-day buffer to fix all database instances, so that updates to the new partitions&amp;rsquo; data on January 1 would not fail. Otherwise, multiple systems would likely have gone up in flames on January 1, 2026.&lt;/p&gt;
&lt;p&gt;Temporary measures (pick one):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Add primary keys to 2026 partitions&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;replica identity full&lt;/code&gt; on 2026 partitions&lt;/li&gt;
&lt;li&gt;Remove 2026 partitions from the publication&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For replication pipeline optimization:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tables without primary keys should be detected proactively, otherwise publishing them could cause business-side UPDATE failures&lt;/li&gt;
&lt;/ul&gt;
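&lt;p&gt;Such a proactive check could look like the query below. It is a sketch built on the standard catalogs (&lt;code&gt;pg_publication_tables&lt;/code&gt;, &lt;code&gt;pg_class.relreplident&lt;/code&gt;, &lt;code&gt;pg_index&lt;/code&gt;) and lists published tables that have no usable replica identity. As a simplification, it does not look at whether the publication actually publishes updates or deletes.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Published tables on which UPDATE/DELETE would be rejected.
-- relreplident: d = default (primary key), n = nothing, f = full, i = index.
SELECT pt.pubname, pt.schemaname, pt.tablename, c.relreplident
  FROM pg_publication_tables pt
  JOIN pg_namespace n ON n.nspname = pt.schemaname
  JOIN pg_class c ON c.relnamespace = n.oid
                 AND c.relname = pt.tablename
 WHERE c.relreplident = 'n'
    OR (c.relreplident = 'd'
        AND NOT EXISTS (SELECT 1
                          FROM pg_index i
                         WHERE i.indrelid = c.oid
                           AND i.indisprimary))
    OR (c.relreplident = 'i'
        AND NOT EXISTS (SELECT 1
                          FROM pg_index i
                         WHERE i.indrelid = c.oid
                           AND i.indisreplident))
 ORDER BY pt.pubname, pt.schemaname, pt.tablename;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Running it before each &lt;code&gt;alter publication ... add table&lt;/code&gt;, or on a schedule, would flag partitions like &lt;code&gt;tablzl_202601&lt;/code&gt; before the application hits the error.&lt;/p&gt;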
&lt;p&gt;For partition management strategy:&lt;/p&gt;
&lt;p&gt;PG&amp;rsquo;s partitioned tables are highly flexible, and developers generally don&amp;rsquo;t know how to create partitions correctly. Combined with significant new partitioning features across roughly PG10-15, and the lack of INTERVAL partitioning in PG, partitioned tables can end up a mess. Standardized management of partitioned tables is thus critical. For partition table features and operational tips, see: &lt;a href="https://blog.csdn.net/qq_40687433/article/details/132525655?spm=1001.2014.3001.5501" target="_blank" rel="noreferrer"&gt;PostgreSQL Partitioned Tables&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;As for management tools, I&amp;rsquo;ll skip those.&lt;/p&gt;
&lt;p&gt;Management goals:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use the parent table structure as the standard: the parent table, being developer-facing, should have primary keys, indexes, and replica identity (unless the PG version doesn&amp;rsquo;t support it)&lt;/li&gt;
&lt;li&gt;Keep parent and child tables consistent; use &lt;code&gt;partition of&lt;/code&gt; to create new partitions (yes, I don&amp;rsquo;t recommend &lt;code&gt;attach&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Keep child tables consistent with each other&lt;/li&gt;
&lt;li&gt;Create new partitions in advance; partition data volumes should not be excessive&lt;/li&gt;
&lt;li&gt;Default partitions are not recommended; if created, their writes must be monitored&lt;/li&gt;
&lt;li&gt;Frequently accessed tables must have partition keys in their SQL queries and use partition pruning; otherwise, convert them to regular tables&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/release/10.0/" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/release/10.0/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/release/11.0/" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/release/11.0/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/release/12.0/" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/release/12.0/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/release/13.0/" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/release/13.0/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/release/14.0/" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/release/14.0/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;src/backend/executor/execReplication.c&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/qq_40687433/article/details/132525655?spm=1001.2014.3001.5501" target="_blank" rel="noreferrer"&gt;PostgreSQL Partitioned Tables&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>Case Study: Row Locks and LWLock LockManager</title><link>https://lastdba.com/en/2025/12/21/case-study-row-locks-and-lwlock-lockmanager/</link><pubDate>Sun, 21 Dec 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/12/21/case-study-row-locks-and-lwlock-lockmanager/</guid><description>&lt;h2 class="relative group"&gt;Symptoms
 &lt;div id="symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The database showed a large number of row locks and a smaller number of LWLock LockManager waits. CPU was maxed out and active sessions spiked. The blocking PID associated with the locks kept changing, with no obvious long-transaction blocker.
(Imagine high CPU and active sessions.)&lt;/p&gt;
&lt;p&gt;The SQL corresponding to the large number of locks was as follows:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; lzl_record &lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt; rc_lzl1&lt;span style="color:#f92672"&gt;=&lt;/span&gt; rc_lzl1 &lt;span style="color:#f92672"&gt;+&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, pc_lzl2 &lt;span style="color:#f92672"&gt;=&lt;/span&gt; pc_lzl2 &lt;span style="color:#f92672"&gt;+&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;, rc_lzl3 &lt;span style="color:#f92672"&gt;=&lt;/span&gt; rc_lzl3 &lt;span style="color:#f92672"&gt;+&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; lzl_id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Analysis
 &lt;div id="analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;No Increase in SQL Concurrency Observed
 &lt;div id="no-increase-in-sql-concurrency-observed" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#no-increase-in-sql-concurrency-observed" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;From the correlation between hits and CPU, we can analyze from the SQL hit perspective. That UPDATE SQL accounted for about 80% of activity. The SQL&amp;rsquo;s execution count had not changed, but &lt;code&gt;blks hit&lt;/code&gt; was clearly abnormal.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Symptoms
 &lt;div id="symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The database showed a large number of row-lock waits and a smaller number of LWLock LockManager waits. CPU was saturated and active sessions spiked. The blocking PID behind the lock waits kept changing, and no long-running transaction stood out as the blocker.
(Picture CPU and active-session graphs both spiking.)&lt;/p&gt;
&lt;p&gt;The SQL behind most of the lock waits was:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; lzl_record &lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt; rc_lzl1&lt;span style="color:#f92672"&gt;=&lt;/span&gt; rc_lzl1 &lt;span style="color:#f92672"&gt;+&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, pc_lzl2 &lt;span style="color:#f92672"&gt;=&lt;/span&gt; pc_lzl2 &lt;span style="color:#f92672"&gt;+&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;, rc_lzl3 &lt;span style="color:#f92672"&gt;=&lt;/span&gt; rc_lzl3 &lt;span style="color:#f92672"&gt;+&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; lzl_id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Analysis
 &lt;div id="analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;No Increase in SQL Concurrency Observed
 &lt;div id="no-increase-in-sql-concurrency-observed" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#no-increase-in-sql-concurrency-observed" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Because block hits tracked CPU usage closely, we started the analysis from per-SQL block hits. That UPDATE accounted for about 80% of activity. Its execution count had not changed, but its &lt;code&gt;blks hit&lt;/code&gt; was clearly abnormal.&lt;/p&gt;
&lt;p&gt;We also checked metadata access: within the snapshots, no metadata table showed unusually high access.&lt;/p&gt;
&lt;p&gt;So the symptoms showed neither an increase in SQL concurrency nor a metadata anomaly, and the reason for the jump in block hits was not yet obvious.&lt;/p&gt;

&lt;h3 class="relative group"&gt;LWLock LockManager Analysis
 &lt;div id="lwlock-lockmanager-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lwlock-lockmanager-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The SQL itself is simple: &lt;code&gt;lzl_id&lt;/code&gt; in the &lt;code&gt;lzl_record&lt;/code&gt; table is a unique column, so each update is by unique key.&lt;/p&gt;
&lt;p&gt;Besides the large number of heavyweight lock waits, the wait events at the scene also included LWLock LockManager.&lt;/p&gt;
&lt;p&gt;Yet the table is a regular (non-partitioned) table with only 4 or 5 indexes.&lt;/p&gt;
&lt;p&gt;LWLock LockManager waits appear when locks are not taken through the fast path. Simple queries and DML can normally use the fast path:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Weak relation locks. SELECT, INSERT, UPDATE, and DELETE must acquire a
lock on every relation they operate on, as well as various system catalogs
that can be used internally. Many DML operations can proceed in parallel
against the same table at the same time; only DDL operations such as
CLUSTER, ALTER TABLE, or DROP &amp;ndash; or explicit user action such as LOCK TABLE
&amp;ndash; will create lock conflicts with the &amp;ldquo;weak&amp;rdquo; locks (AccessShareLock,
RowShareLock, RowExclusiveLock) acquired by DML operations.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;So a SELECT/DML accessing no more than 16 relations (including indexes) should be able to use the fast path, and there shouldn&amp;rsquo;t be much LWLock LockManager waiting.&lt;/p&gt;
&lt;p&gt;However, DML cannot rely on the fast path alone. The fast path covers only weak relation locks, which a backend records in its own slots. When a session has to wait for a row held by another transaction, it takes row-level locks through the shared lock table, which is protected by the LockManager LWLocks. Since this SQL updates by a unique column yet still runs into row locks, many sessions must be updating the same row.&lt;/p&gt;
&lt;p&gt;The logs did show updates hitting the same row: a single row had tens of thousands of updates waiting on locks.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Benchmark Testing
 &lt;div id="benchmark-testing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#benchmark-testing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Benchmarking Same-Row Updates to Reproduce LWLock LockManager
 &lt;div id="benchmarking-same-row-updates-to-reproduce-lwlock-lockmanager" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#benchmarking-same-row-updates-to-reproduce-lwlock-lockmanager" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Given that row locks can&amp;rsquo;t rely solely on the fast path, and knowing that LWLock LockManager contention degrades database performance, we benchmarked two scenarios: repeated updates to the same row, and random updates across different rows.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;#prompt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Give me a pgbench benchmark script
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Table structure: primary key, unique field + unique index, other fields
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Update: update by unique field
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Benchmark repeated updates on the same row (repeated row-lock updates)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Benchmark random updates on different rows (no row-lock updates)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Script omitted. Environment: 20 cores, 96GB RAM.&lt;/p&gt;
&lt;p&gt;pgbench commands:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pgbench -h localhost -p $PGPORT -d lzldb -U dbmgr -f update_same_unique_key.sql -c &lt;span style="color:#ae81ff"&gt;200&lt;/span&gt; -j &lt;span style="color:#ae81ff"&gt;32&lt;/span&gt; -T &lt;span style="color:#ae81ff"&gt;600&lt;/span&gt; -r -S
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pgbench -h localhost -p $PGPORT -d lzldb -U dbmgr -f update_random_unique_key.sql -c &lt;span style="color:#ae81ff"&gt;200&lt;/span&gt; -j &lt;span style="color:#ae81ff"&gt;32&lt;/span&gt; -T &lt;span style="color:#ae81ff"&gt;600&lt;/span&gt; -r -S&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Wait events during the benchmark:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;-- Update same row, 2 typical samples
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cnt 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+--------+---------------------+-----------------+-----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LockManager &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;105&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Lock&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;61&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tuple &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Lock&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;25&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALSync &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IO &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cnt 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+--------+---------------------+-----------------+-----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Lock&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;180&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LockManager &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tuple &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Lock&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALSync &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IO &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Update different rows, 2 typical samples
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cnt 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------------------+---------------------+-----------------+-----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;106&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; idle &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ClientRead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Client &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; idle &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;transaction&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ClientRead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Client &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;25&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; BufferMapping &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; idle &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;transaction&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; idle &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;transaction&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cnt 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------------------+---------------------+-----------------+-----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;117&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; idle &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ClientRead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Client &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; idle &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;transaction&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ClientRead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Client &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;24&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; XactGroupUpdate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IPC &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALSync &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IO &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; XactSLRU &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; BufferContent &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ClientRead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Client &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The wait events show a clear difference. Updating the same row produces LWLock LockManager waits, sometimes as the largest share of sessions. Updating different rows leaves most sessions active with no wait event, that is, running on CPU. The same-row scenario matches what we saw in production.&lt;/p&gt;

&lt;h2 class="relative group"&gt;A Brief Analysis of Row Locks and Fast Path
 &lt;div id="a-brief-analysis-of-row-locks-and-fast-path" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#a-brief-analysis-of-row-locks-and-fast-path" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The lmgr README&amp;rsquo;s explanation of the fast path:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Fast Path Locking
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-----------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Fast path locking is a special purpose mechanism designed to reduce the
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;overhead of taking and releasing certain types of locks which are taken
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;and released very frequently but rarely conflict. Currently, this includes
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;two categories of locks:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(1) Weak relation locks. SELECT, INSERT, UPDATE, and DELETE must acquire a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lock on every relation they operate on, as well as various system catalogs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;that can be used internally. Many DML operations can proceed in parallel
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;against the same table at the same time; only DDL operations such as
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CLUSTER, ALTER TABLE, or DROP -- or explicit user action such as LOCK TABLE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- will create lock conflicts with the &amp;#34;weak&amp;#34; locks (AccessShareLock,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;RowShareLock, RowExclusiveLock) acquired by DML operations.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Conditions for locks that can use the fast path, from &lt;code&gt;lmgr/lock.c&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * The fast-path lock mechanism is concerned only with relation locks on
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * unshared relations by backends bound to a database. The fast-path
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * mechanism exists mostly to accelerate acquisition and release of locks
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * that rarely conflict. Because ShareUpdateExclusiveLock is
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * self-conflicting, it can&amp;#39;t use the fast-path mechanism; but it also does
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * not conflict with any of the locks that do, so we can ignore it completely.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define EligibleForRelationFastPath(locktag, mode) \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	((locktag)-&amp;gt;locktag_lockmethodid == DEFAULT_LOCKMETHOD &amp;amp;&amp;amp; \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	(locktag)-&amp;gt;locktag_type == LOCKTAG_RELATION &amp;amp;&amp;amp; \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	(locktag)-&amp;gt;locktag_field1 == MyDatabaseId &amp;amp;&amp;amp; \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	MyDatabaseId != InvalidOid &amp;amp;&amp;amp; \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	(mode) &amp;lt; ShareUpdateExclusiveLock)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;So SELECT/DML can use the fast path, but only for &lt;code&gt;locktype=relation&lt;/code&gt; and only for lock modes weaker than &lt;code&gt;ShareUpdateExclusiveLock&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s look at the actual lock situation when there&amp;rsquo;s a row lock:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;zzz&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;zzz&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- waiting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 3
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; pid&lt;span style="color:#f92672"&gt;&amp;lt;&amp;gt;&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_backend_pid()) &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; pid,locktype;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; page &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tuple &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; classid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; objid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; objsubid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualtransaction &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; fastpath 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------+----------+----------+--------+--------+------------+---------------+---------+--------+----------+--------------------+--------+------------------+---------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4267681&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5290151&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4791&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;220559&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;170706189&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4791&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;220559&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;170706190&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4791&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;220559&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;170706187&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4791&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;220559&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tuple &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4267681&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5290151&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4791&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;220559&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4791&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4791&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;220559&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4267681&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5290151&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;562&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;253641&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;170706187&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;562&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;253641&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;562&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;562&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;253641&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;PG&amp;rsquo;s row lock implementation is fairly involved: besides the tuple lock, it also takes transactionid and relation locks. In the output above, only the &lt;code&gt;locktype=relation&lt;/code&gt; and &lt;code&gt;virtualxid&lt;/code&gt; entries show &lt;code&gt;fastpath=t&lt;/code&gt;; the tuple and transactionid locks all go through the main lock table.&lt;/p&gt;
&lt;p&gt;Compare with the case where there is no row lock conflict, with the two sessions updating different rows:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;zzz&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;zzz&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- returns immediately: a different row, so no row lock wait
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; pid&lt;span style="color:#f92672"&gt;&amp;lt;&amp;gt;&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_backend_pid()) &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; pid,locktype;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; page &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tuple &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; classid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; objid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; objsubid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualtransaction &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; fastpath 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------+----------+----------+--------+--------+------------+---------------+---------+--------+----------+--------------------+--------+------------------+---------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4267681&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5290151&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4792&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;220559&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4267681&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5290151&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4792&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;220559&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;170706214&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4792&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;220559&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4792&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4792&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;220559&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4267681&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5290151&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;563&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;253641&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4267681&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5290151&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;563&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;253641&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;170706212&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;563&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;253641&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;563&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;563&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;253641&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
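&lt;p&gt;Only two &lt;code&gt;fastpath=f&lt;/code&gt; entries remain, down from five in the row-lock case: the tuple lock and the extra transactionid entries are gone. What is left is each session&amp;rsquo;s lock on its own transactionid, which can never use the fast path.&lt;/p&gt;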
&lt;p&gt;Summary of conditions for using the fast-path lock mechanism (all must be met):&lt;/p&gt;
&lt;ul&gt;
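&lt;li&gt;Lock level &amp;lt;= 3 (&lt;code&gt;AccessShareLock&lt;/code&gt;, &lt;code&gt;RowShareLock&lt;/code&gt;, &lt;code&gt;RowExclusiveLock&lt;/code&gt;), i.e., the modes taken by SELECT/DML statements&lt;/li&gt;
&lt;li&gt;&lt;code&gt;locktype=relation&lt;/code&gt;. A row lock additionally needs at least transactionid and tuple locks, and those two types can&amp;rsquo;t use the fast path (&lt;code&gt;virtualxid&lt;/code&gt; has its own fast path, as seen above)&lt;/li&gt;
&lt;li&gt;At most 16 such relation locks held by the backend at once; indexes count as relations too (typically exceeded only with full partition access on partitioned tables)&lt;/li&gt;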
&lt;/ul&gt;
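&lt;p&gt;As a rough way to check these conditions on a live system, the query below counts, per backend, how many relation locks went through the fast path and how many fell back to the main lock table. It is a minimal sketch that relies only on the &lt;code&gt;pg_locks&lt;/code&gt; columns shown above; the &lt;code&gt;locks&lt;/code&gt; alias is just an illustrative name.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Relation locks per backend, split by fast path vs. main lock table
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select pid, mode, fastpath, count(*) as locks
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;from pg_locks
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;where locktype = &amp;#39;relation&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;group by pid, mode, fastpath
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;order by pid, fastpath, mode;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If a backend shows &lt;code&gt;RowExclusiveLock&lt;/code&gt; or weaker relation locks with &lt;code&gt;fastpath=f&lt;/code&gt;, one likely explanation is that it has run out of fast-path slots, although other causes are possible.&lt;/p&gt;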

&lt;h2 class="relative group"&gt;Conclusion
 &lt;div id="conclusion" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#conclusion" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Is the row lock the cause or the effect? Is it a row lock problem, or did database performance degrade causing SQL to run slower and produce row locks?&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The row lock is the cause. The SQL execution count didn&amp;rsquo;t change, but the SQL parameters shifted from scattered to concentrated, i.e., updates to the same row increased noticeably. The benchmark data shows that updating the same row produces both row lock waits and LWLock LockManager waits.&lt;/p&gt;
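&lt;p&gt;One way to observe this on a running instance is to sample wait events from &lt;code&gt;pg_stat_activity&lt;/code&gt; while the workload runs. The following is a minimal sketch, assuming PostgreSQL 13 or later, where the LWLock wait event is named &lt;code&gt;LockManager&lt;/code&gt; (older releases report &lt;code&gt;lock_manager&lt;/code&gt;); the &lt;code&gt;sessions&lt;/code&gt; alias is illustrative.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Snapshot of what active backends are currently waiting on
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select wait_event_type, wait_event, count(*) as sessions
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;from pg_stat_activity
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;where state = &amp;#39;active&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;group by wait_event_type, wait_event
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;order by sessions desc;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Row lock waits would show up as &lt;code&gt;wait_event_type=Lock&lt;/code&gt; with &lt;code&gt;wait_event&lt;/code&gt; of &lt;code&gt;tuple&lt;/code&gt; or &lt;code&gt;transactionid&lt;/code&gt;, and lock manager contention as &lt;code&gt;LWLock&lt;/code&gt; / &lt;code&gt;LockManager&lt;/code&gt;. A single snapshot is noisy, so repeating the query several times gives a fairer picture.&lt;/p&gt;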
&lt;ol start="2"&gt;
&lt;li&gt;SQL execution count didn&amp;rsquo;t increase — did SQL performance degrade?&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;SQL performance did degrade, but not because the wrong index was chosen: the same row was simply being updated over and over.&lt;/p&gt;
&lt;p&gt;Solution:&lt;/p&gt;
&lt;p&gt;From the business side, the SQL was tied to a certain API endpoint: after being called, it updates the call count into the table. If the same endpoint is called repeatedly, it&amp;rsquo;s possible to repeatedly update the same row. Therefore, reducing repeated calls to the same endpoint, or batching the database updates into fewer, larger batches, is expected to mitigate this problem.&lt;/p&gt;</content:encoded></item><item><title>Case: From Inaccurate DISTINCT to the Principles of DISTINCT Estimation</title><link>https://lastdba.com/en/2025/10/19/case-from-inaccurate-distinct-to-the-principles-of-distinct-estimation/</link><pubDate>Sun, 19 Oct 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/10/19/case-from-inaccurate-distinct-to-the-principles-of-distinct-estimation/</guid><description>&lt;h2 class="relative group"&gt;Problem Description
 &lt;div id="problem-description" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-description" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;n_distinct&lt;/code&gt; statistic was severely inaccurate.&lt;/p&gt;
&lt;p&gt;This problem appeared across multiple databases. For example:&lt;/p&gt;
&lt;p&gt;A table with 200 million rows and a true DISTINCT count of 8 million had a statistics DISTINCT value of only 40,000.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Analysis
 &lt;div id="analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Sampling Model
 &lt;div id="sampling-model" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sampling-model" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/7e0b33a60cf4.png" alt="Does the standby have its own statistics? · PostgreSQL Apprentice" /&gt;&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Problem Description
 &lt;div id="problem-description" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-description" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;n_distinct&lt;/code&gt; statistic was severely inaccurate.&lt;/p&gt;
&lt;p&gt;This problem appeared across multiple databases. For example:&lt;/p&gt;
&lt;p&gt;A table with 200 million rows and a true DISTINCT count of 8 million had a statistics DISTINCT value of only 40,000.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Analysis
 &lt;div id="analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Sampling Model
 &lt;div id="sampling-model" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sampling-model" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/7e0b33a60cf4.png" alt="Does the standby have its own statistics? · PostgreSQL Apprentice" /&gt;&lt;/p&gt;
&lt;p&gt;With the default &lt;code&gt;default_statistics_target=100&lt;/code&gt;, ANALYZE samples 300 × 100 = 30,000 rows, read from at most 30,000 pages.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;verbose&lt;/span&gt; tablzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;INFO: &lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;: analyzing &lt;span style="color:#e6db74"&gt;&amp;#34;public.tablzl1&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: do_analyze_rel, &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;332&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;INFO: &lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;tablzl1&amp;#34;&lt;/span&gt;: scanned &lt;span style="color:#ae81ff"&gt;30000&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22963751&lt;/span&gt; pages, containing &lt;span style="color:#ae81ff"&gt;1061942&lt;/span&gt; live &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3953&lt;/span&gt; dead &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;30000&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; sample, &lt;span style="color:#ae81ff"&gt;812872389&lt;/span&gt; estimated total &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: acquire_sample_rows, &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1340&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Note &amp;ldquo;scanned 30000&amp;rdquo; and &amp;ldquo;30000 rows in sample&amp;rdquo;.&lt;/p&gt;
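&lt;p&gt;As a rough check of those numbers, the sketch below assumes ANALYZE requests 300 rows per unit of statistics target (the multiplier used by the standard type analyzer in PostgreSQL) and compares that with the page and row counts in the output above. The function name &lt;code&gt;sample_rows&lt;/code&gt; is made up for illustration.&lt;/p&gt;

```python
# Sketch only: sample size implied by a statistics target,
# assuming the 300 * target rule from std_typanalyze().
ROWS_PER_TARGET = 300


def sample_rows(statistics_target):
    """Rows ANALYZE aims to sample for a given statistics target."""
    return ROWS_PER_TARGET * statistics_target


# Figures copied from the ANALYZE VERBOSE output above.
total_pages = 22963751
estimated_rows = 812872389

rows = sample_rows(100)
print("rows sampled:", rows)
print("fraction of pages read: {:.4%}".format(rows / total_pages))
print("fraction of rows sampled: {:.5%}".format(rows / estimated_rows))
```

&lt;p&gt;Both fractions come out well below one percent, which is the starting point for the estimation problem discussed next.&lt;/p&gt;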

&lt;h3 class="relative group"&gt;DISTINCT Estimation Algorithm
 &lt;div id="distinct-estimation-algorithm" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#distinct-estimation-algorithm" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The DISTINCT estimation algorithm in &lt;code&gt;analyze.c&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/*----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * Estimate the number of distinct values using the estimator
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * proposed by Haas and Stokes in IBM Research Report RJ 10025:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 *		n*d / (n - f1 + f1*n/N)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * where f1 is the number of distinct values that occurred
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * exactly once in our sample of n rows (from a total of N),
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * and d is the total number of distinct values in the sample.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * This is their Duj1 estimator; the other estimators they
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * recommend are considerably more complex, and are numerically
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * very unstable when n is much smaller than N.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * In this calculation, we consider only non-nulls. We used to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * include rows with null values in the n and N counts, but that
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * leads to inaccurate answers in columns with many nulls, and
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * it&amp;#39;s intuitively bogus anyway considering the desired result is
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * the number of distinct non-null values.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * We assume (not very reliably!) that all the multiply-occurring
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * values are reflected in the final track[] list, and the other
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * nonnull values all appeared but once. (XXX this usually
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * results in a drastic overestimate of ndistinct. Can we do
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * any better?)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 *----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			f1 &lt;span style="color:#f92672"&gt;=&lt;/span&gt; nonnull_cnt &lt;span style="color:#f92672"&gt;-&lt;/span&gt; summultiple;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			d &lt;span style="color:#f92672"&gt;=&lt;/span&gt; f1 &lt;span style="color:#f92672"&gt;+&lt;/span&gt; nmultiple;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;double&lt;/span&gt;		n &lt;span style="color:#f92672"&gt;=&lt;/span&gt; samplerows &lt;span style="color:#f92672"&gt;-&lt;/span&gt; null_cnt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;double&lt;/span&gt;		N &lt;span style="color:#f92672"&gt;=&lt;/span&gt; totalrows &lt;span style="color:#f92672"&gt;*&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1.0&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt; stats&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;stanullfrac);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;double&lt;/span&gt;		stadistinct;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;n*d / (n - f1 + f1*n/N)&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;n&lt;/code&gt; = number of non-null rows in the sample&lt;/li&gt;
&lt;li&gt;&lt;code&gt;d&lt;/code&gt; = number of distinct values found in the sample&lt;/li&gt;
&lt;li&gt;&lt;code&gt;f1&lt;/code&gt; = number of distinct values that appear exactly once in the sample&lt;/li&gt;
&lt;li&gt;&lt;code&gt;N&lt;/code&gt; = estimated total number of non-null rows in the table&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Algorithm paper: &lt;a href="https://hugepdf.com/download/download-extended-version-of-this-paper_pdf" target="_blank" rel="noreferrer"&gt;https://hugepdf.com/download/download-extended-version-of-this-paper_pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The paper is rather dense, so let&amp;rsquo;s work through some assumptions to understand this DISTINCT algorithm:&lt;/p&gt;
&lt;ol&gt;
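&lt;li&gt;Assume every value in the sample appears exactly once and the table is large (n ≪ N), so f1 = d = n&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;code&gt;n*n / (n - n + n*n/N) = N&lt;/code&gt;: the estimate equals the total row count, i.e. the column is treated as unique. Approximating n/N as 0 would make the denominator zero, so &lt;code&gt;analyze.c&lt;/code&gt; does not evaluate the formula in this case. When no value repeats in the sample it stores &lt;code&gt;stadistinct = -1&lt;/code&gt; (scaled by the non-null fraction), which means the distinct count equals the row count.&lt;/p&gt;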
&lt;ol start="2"&gt;
&lt;li&gt;Assume all values appear exactly once, and the table is small (n = N), so f1 = d, n/N = 1&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;code&gt;n*d / (n - d + d*1) = d&lt;/code&gt; — the sampled distinct count, which equals the number of sampled rows.&lt;/p&gt;
&lt;ol start="3"&gt;
&lt;li&gt;Assume no values appear exactly once in the sample, i.e., f1 = 0&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;code&gt;n*d / (n - f1 + f1*n/N) = n*d / n = d&lt;/code&gt; — just the distinct count in the sample.&lt;/p&gt;
&lt;p&gt;If a column is populated by inserting several rows of the same value, then several rows of another value, like:&lt;/p&gt;
&lt;p&gt;1, 1, 2, 2, 2, 2, 3, 3, 3, &amp;hellip;&lt;/p&gt;
&lt;p&gt;3.1 Small table, all 30,000 rows sampled, true distinct = 10,000 (assumed): estimated distinct = d = 10,000&lt;/p&gt;
&lt;p&gt;3.2 Large table, sample contains both repeating values and singletons (some repeating values only have one row captured), i.e., n = 30,000, n/N ≈ 0&lt;/p&gt;
&lt;p&gt;&lt;code&gt;n*d / (n - f1 + f1*n/N) = n*d / (n - f1) = 30000*d/(30000-f1)&lt;/code&gt; — the larger the distinct count in the sample, the larger the estimated distinct; the larger the number of singletons, the larger the estimated distinct.&lt;/p&gt;
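&lt;p&gt;The cases above can be checked numerically with a small sketch of the formula. The function name &lt;code&gt;duj1&lt;/code&gt; and the sample figures in case 3.2 are invented for illustration, and the clamp to the range [d, N] is an assumption about how &lt;code&gt;analyze.c&lt;/code&gt; bounds the result.&lt;/p&gt;

```python
# Sketch of the Duj1 formula quoted above, with the result clamped
# to the range [d, N] as an assumption about how analyze.c bounds it.


def duj1(n, d, f1, N):
    """Estimate distinct values: n*d / (n - f1 + f1*n/N)."""
    estimate = n * d / ((n - f1) + f1 * n / N)
    return max(d, min(N, estimate))


# Case 1: every sampled value is a singleton; the formula yields N.
print(duj1(n=30000, d=30000, f1=30000, N=200000000))

# Case 2: the whole table is sampled (n == N); the formula yields d.
print(duj1(n=30000, d=30000, f1=30000, N=30000))

# Case 3: no singletons (f1 == 0); the formula yields d.
print(duj1(n=30000, d=10000, f1=0, N=200000000))

# Case 3.2: large table, mixed sample; more singletons raise the estimate.
print(duj1(n=30000, d=20000, f1=12000, N=200000000))
print(duj1(n=30000, d=20000, f1=16000, N=200000000))
```

&lt;p&gt;In the last two calls only &lt;code&gt;f1&lt;/code&gt; changes, which shows how sensitive the estimate is to the number of singletons that happen to land in the sample.&lt;/p&gt;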
&lt;p&gt;Summary:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;DISTINCT estimation is directly related to the distinct count and singleton count in the sample&lt;/li&gt;
&lt;li&gt;If the singleton count = 0, the estimate is simply d, so a larger sample (which finds more distinct values) yields a larger estimate&lt;/li&gt;
&lt;/ul&gt;
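&lt;p&gt;To see how the run-wise insert pattern from case 3 might interact with page-based sampling, here is a toy simulation. It is not PostgreSQL code: the table sizes, the run length of 8 and the two-stage sampler (random pages, then random rows from those pages) are all assumptions chosen to loosely mimic the behaviour described above.&lt;/p&gt;

```python
import random
from collections import Counter

# Toy simulation, not PostgreSQL code: a table whose values were
# inserted in runs, sampled by picking random pages first and then
# random rows from those pages. All sizes here are invented.
random.seed(42)

PAGES = 20000
ROWS_PER_PAGE = 40
RUN_LENGTH = 8   # each value fills 8 consecutive rows
SAMPLE = 3000    # pages picked, and rows kept

total_rows = PAGES * ROWS_PER_PAGE
true_distinct = total_rows // RUN_LENGTH


def value_at(row_number):
    """Column value stored at a given physical row position."""
    return row_number // RUN_LENGTH


# Stage 1: pick random pages. Stage 2: pick random rows from them.
pages = random.sample(range(PAGES), SAMPLE)
candidates = [
    page * ROWS_PER_PAGE + offset
    for page in pages
    for offset in range(ROWS_PER_PAGE)
]
sampled = [value_at(row) for row in random.sample(candidates, SAMPLE)]

# Count how often each value occurs in the sample.
counts = Counter(sampled)
n = len(sampled)
d = len(counts)
f1 = sum(1 for c in counts.values() if c == 1)
estimate = n * d / ((n - f1) + f1 * n / total_rows)

print("true distinct:", true_distinct)
print("sample d:", d, "singletons f1:", f1)
print("estimated distinct:", round(estimate))
```

&lt;p&gt;Because rows from the same page tend to share values, the sample contains more repeated values than a row-level random sample would, and the estimate lands far below the true distinct count.&lt;/p&gt;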

&lt;h3 class="relative group"&gt;Verification
 &lt;div id="verification" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#verification" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Since the default sample is only 30,000 rows, the estimator is likely to underestimate DISTINCT on tables far larger than that. Note: this applies to columns that are not nearly unique.&lt;/p&gt;
&lt;p&gt;Testing a table with different sample sizes:&lt;/p&gt;
&lt;p&gt;Table: reltuples = 800 million, relpages = 20 million, size = 175GB, true column distinct = 100 million&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;statistics target&lt;/th&gt;
 &lt;th&gt;page sampling ratio (approx.)&lt;/th&gt;
 &lt;th&gt;tuple sampling ratio (approx.)&lt;/th&gt;
 &lt;th&gt;n_distinct&lt;/th&gt;
 &lt;th&gt;execution time&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;50&lt;/td&gt;
 &lt;td&gt;0.00075&lt;/td&gt;
 &lt;td&gt;0.00001875&lt;/td&gt;
 &lt;td&gt;60k&lt;/td&gt;
 &lt;td&gt;2s&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;100&lt;/td&gt;
 &lt;td&gt;0.0015&lt;/td&gt;
 &lt;td&gt;0.0000375&lt;/td&gt;
 &lt;td&gt;110k&lt;/td&gt;
 &lt;td&gt;5s&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;1000&lt;/td&gt;
 &lt;td&gt;0.015&lt;/td&gt;
 &lt;td&gt;0.000375&lt;/td&gt;
 &lt;td&gt;1.03M&lt;/td&gt;
 &lt;td&gt;58s&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;3000&lt;/td&gt;
 &lt;td&gt;0.045&lt;/td&gt;
 &lt;td&gt;0.001125&lt;/td&gt;
 &lt;td&gt;2.68M&lt;/td&gt;
 &lt;td&gt;3min 1s&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;10000&lt;/td&gt;
 &lt;td&gt;0.15&lt;/td&gt;
 &lt;td&gt;0.00375&lt;/td&gt;
 &lt;td&gt;6.75M&lt;/td&gt;
 &lt;td&gt;7min 21s&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;(The maximum statistics target is 10000.)&lt;/p&gt;
&lt;p&gt;A rough conclusion: both n_distinct and ANALYZE execution time grow roughly in proportion to the sample size, yet even at the maximum target the estimate (6.75M) is far below the true 100 million.&lt;/p&gt;
&lt;p&gt;The page and tuple estimates, by contrast, stay accurate at every sample size.&lt;/p&gt;
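&lt;p&gt;To put numbers on that trend, the sketch below reuses the figures from the table and prints, for each target, the tuple sampling ratio, the estimate per unit of target and the share of the true distinct count that was recovered. The 300 rows per target unit is again an assumption.&lt;/p&gt;

```python
# Figures copied from the test table above: (target, n_distinct).
TRUE_DISTINCT = 100000000
TOTAL_ROWS = 800000000
results = [
    (50, 60000),
    (100, 110000),
    (1000, 1030000),
    (3000, 2680000),
    (10000, 6750000),
]

for target, n_distinct in results:
    sample = 300 * target  # assumed rows sampled per target unit
    print(
        "target={:5d} sample_ratio={:.8f} "
        "n_distinct/target={:7.1f} share_of_true={:.2%}".format(
            target,
            sample / TOTAL_ROWS,
            n_distinct / target,
            n_distinct / TRUE_DISTINCT,
        )
    )
```

&lt;p&gt;The estimate per unit of target drifts downward as the target grows, so the growth is close to proportional but slightly slower than linear.&lt;/p&gt;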

&lt;h3 class="relative group"&gt;Solution
 &lt;div id="solution" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#solution" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;For extremely large tables, consider partitioning or optimizing based on actual SQL patterns.&lt;/p&gt;
&lt;p&gt;You can also raise the statistics target to enlarge the sample. The default &lt;code&gt;default_statistics_target=100&lt;/code&gt; samples only 30,000 rows.&lt;/p&gt;
&lt;p&gt;Temporary fix (applies to the current session only):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; default_statistics_target&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3000&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; tab1;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Long-term fix (stored per column; takes effect at the next ANALYZE):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab1 &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;STATISTICS&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3000&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Notes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Column-level statistics target has the highest priority, overriding &lt;code&gt;default_statistics_target&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Maximum statistics target is 10000&lt;/li&gt;
&lt;li&gt;The table&amp;rsquo;s sampling target is determined by the maximum column target:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Determine how many rows we need to sample, using the worst case from
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * all analyzable columns. We use a lower bound of 100 rows to avoid
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * possible overflow in Vitter&amp;#39;s algorithm. (Note: that will also be the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * target in the corner case where there are no analyzable columns.)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	targrows &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (i &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;; i &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; attr_cnt; i&lt;span style="color:#f92672"&gt;++&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (targrows &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; vacattrstats[i]&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;minrows)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			targrows &lt;span style="color:#f92672"&gt;=&lt;/span&gt; vacattrstats[i]&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;minrows;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (ind &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;; ind &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; nindexes; ind&lt;span style="color:#f92672"&gt;++&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		AnlIndexData &lt;span style="color:#f92672"&gt;*&lt;/span&gt;thisdata &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;indexdata[ind];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (i &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;; i &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; thisdata&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;attr_cnt; i&lt;span style="color:#f92672"&gt;++&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (targrows &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; thisdata&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;vacattrstats[i]&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;minrows)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				targrows &lt;span style="color:#f92672"&gt;=&lt;/span&gt; thisdata&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;vacattrstats[i]&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;minrows;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
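&lt;p&gt;The loop above amounts to taking a maximum over per-column row requirements. A minimal sketch, assuming each analyzed column asks for 300 × its target, that an unset target falls back to &lt;code&gt;default_statistics_target&lt;/code&gt;, and that a target of 0 excludes the column; the function and argument names are illustrative only and index columns are left out:&lt;/p&gt;

```python
# Sketch of how the table-wide sample size follows from per-column
# targets. Assumes minrows = 300 * target for each analyzed column;
# the function and argument names are illustrative only.
DEFAULT_STATISTICS_TARGET = 100


def table_target_rows(column_targets, default=DEFAULT_STATISTICS_TARGET):
    """Return the row count ANALYZE would aim for.

    column_targets: per-column targets, None meaning "not set".
    """
    targrows = 100  # lower bound used in the C code above
    for target in column_targets:
        if target is None:
            target = default  # fall back to default_statistics_target
        if target == 0:
            continue  # a target of 0 disables statistics for the column
        targrows = max(targrows, 300 * target)
    return targrows


print(table_target_rows([None, None, None]))
print(table_target_rows([None, 3000, None]))
```

&lt;p&gt;A single column with a high target is enough to raise the sample size for the whole table, which is why one &lt;code&gt;SET STATISTICS&lt;/code&gt; can noticeably lengthen ANALYZE.&lt;/p&gt;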
&lt;p&gt;If ANALYZE collects more or fewer rows than expected, check &lt;code&gt;pg_attribute&lt;/code&gt; for per-column &lt;code&gt;attstattarget&lt;/code&gt; settings:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; attrelid::regclass,attname,attstattarget &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_attribute &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; attrelid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;tab1&amp;#39;&lt;/span&gt;::regclass &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; attstattarget &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;For large tables whose columns are non-unique but have high distinct counts (a realistic scenario), the sampling-based estimator underestimates the DISTINCT value, and the estimate rises toward the true value only as the sampling ratio increases. The default sampling ratio is too small for large tables. You can raise it, but even the maximum statistics target of 10000 samples only 3 million rows.&lt;/p&gt;</content:encoded></item><item><title>Case Study: Performance Degradation After Adding an Index and the Generic Plan</title><link>https://lastdba.com/en/2025/09/13/case-study-performance-degradation-after-adding-an-index-and-the-generic-plan/</link><pubDate>Sat, 13 Sep 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/09/13/case-study-performance-degradation-after-adding-an-index-and-the-generic-plan/</guid><description>&lt;h2 class="relative group"&gt;Problem Description
 &lt;div id="problem-description" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-description" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;An index was added the night before, and the next morning the CPU was maxed out. The problematic SQL was easy to locate — just one query. The SQL was running for over 30 seconds, but the day before it only took about 3 seconds, so we needed to examine the before-and-after execution plan changes.&lt;/p&gt;
&lt;p&gt;Only the key parts of the execution plan are shown below.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Problem Description
 &lt;div id="problem-description" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-description" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;An index was added the night before, and the next morning the CPU was maxed out. The problematic SQL was easy to locate — just one query. The SQL was running for over 30 seconds, but the day before it only took about 3 seconds, so we needed to examine the before-and-after execution plan changes.&lt;/p&gt;
&lt;p&gt;Only the key parts of the execution plan are shown below.&lt;/p&gt;
&lt;p&gt;Execution plan before adding the index:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Nested Loop (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;92&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2259694&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;265822&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;33&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; uk_lzl_task &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_task t (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;20007&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;99&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;195&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;24&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((created_by)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LIUZHILONG62&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;63&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;11337&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;14842&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_202501 cc_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;63&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;3053&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;69&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1467&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: ((task_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t.task_no)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-07 09:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-03 12:56:44.973&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_202501_task_no_idx (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1594&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((task_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t.task_no)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_202502 cc_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;67&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;3066&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;85&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1604&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: ((task_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t.task_no)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-07 09:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-03 12:56:44.973&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_202502_task_no_idx (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1605&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((task_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t.task_no)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_202503_task_no_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_202503 cc_3 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1362&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;61&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1637&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((task_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t.task_no)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-07 09:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-03 12:56:44.973&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_202504_task_no_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_202504 cc_4 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;604&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;64&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1795&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((task_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t.task_no)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-07 09:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-03 12:56:44.973&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_202505_task_no_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_202505 cc_5 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;445&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1450&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((task_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t.task_no)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-07 09:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-03 12:56:44.973&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_202506_task_no_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_202506 cc_6 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;583&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;94&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1675&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((task_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t.task_no)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-07 09:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-03 12:56:44.973&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_202507_task_no_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_202507 cc_7 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;633&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;45&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1973&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((task_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t.task_no)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-07 09:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-03 12:56:44.973&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_202508_task_no_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_202508 cc_8 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;619&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1720&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((task_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t.task_no)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-07 09:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-03 12:56:44.973&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_202509_task_no_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_202509 cc_9 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;893&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;03&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1521&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((task_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t.task_no)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-07 09:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-03 12:56:44.973&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The created_date predicate restricts the search to a range of less than one year: here 2025-01-07 to 2025-09-03, which spans nine monthly partitions. The index added the night before was on created_date.&lt;/p&gt;
&lt;p&gt;Execution plan after adding the index:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;63&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;23740&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;82&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;191&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;33&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Cond: ((cc.task_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t.task_no)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;23376&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;98&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;114435&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;58&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Subplans Removed: &lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_lzltab_202501_created_date &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202501 cc_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1450&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8958&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_lzltab_202502_created_date &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202502 cc_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1822&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;73&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7405&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_lzltab_202503_created_date &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202503 cc_3 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1430&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;03&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7917&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_lzltab_202504_created_date &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202504 cc_4 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2412&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;11041&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_lzltab_202505_created_date &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202505 cc_5 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2260&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;73&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;13381&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_lzltab_202506_created_date &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202506 cc_6 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;3930&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17832&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_lzltab_202507_created_date &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202507 cc_7 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;3878&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;77&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;21786&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_lzltab_202508_created_date &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202508 cc_8 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;4736&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;72&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;22033&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_lzltab_202509_created_date &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202509 cc_9 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;627&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;09&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1893&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;63&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;03&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;63&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;03&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;24&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; ai_outbound_call_task t (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;99&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;63&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;03&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;24&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: ((created_by)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; idx_ai_call_task_c (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;99&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((created_by)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;)::text)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The new execution plan switched from the task_no index to the created_date index, and from a Nested Loop to a Hash Join. The estimated cost dropped from 2,259,694 to 23,740, a reduction of roughly 95x, yet the actual execution time increased by roughly 10x.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Problem Diagnosis
 &lt;div id="problem-diagnosis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-diagnosis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s work through three questions to analyze and diagnose the issue:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Why did the optimizer suggest the created_date index?&lt;/li&gt;
&lt;li&gt;Why did it end up using the new index?&lt;/li&gt;
&lt;li&gt;Why is the estimated row count very small even though the actual execution time is very long?&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 class="relative group"&gt;Why Did the Optimizer Suggest the created_date Index?
 &lt;div id="why-did-the-optimizer-suggest-the-created_date-index" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-did-the-optimizer-suggest-the-created_date-index" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;If we directly substitute the parameters from the PostgreSQL log into the SQL text, the execution plan is actually the good one — the one that runs in 3 seconds using the task_no index. The optimization engineer also ran it this way and found it to be fine. But in production, this wasn&amp;rsquo;t the execution plan that was used.&lt;/p&gt;
&lt;p&gt;Even when we force PostgreSQL &lt;em&gt;not&lt;/em&gt; to use the task_no index, the optimizer chooses a sequential scan rather than the created_date index:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Cond: (((cc.task_no)::text &lt;span style="color:#f92672"&gt;||&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t.task_no)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2794425&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;58&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;22238757&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202501 cc_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;193060&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1585238&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-08 11:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-04 08:31:43&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202502 cc_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;178567&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;54&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1480969&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-08 11:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-04 08:31:43&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202503 cc_3 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;191073&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1583356&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This is very strange: no matter how we ran it ourselves, we couldn&amp;rsquo;t get it to use the bad created_date index. So how did production end up using it?&lt;/p&gt;
&lt;p&gt;The answer lies in bind variables — it was likely a &lt;strong&gt;generic plan&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Characteristics of the generic plan:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;When &lt;code&gt;plan_cache_mode = auto&lt;/code&gt;, PostgreSQL plans the first five executions of a prepared statement with the actual parameter values (custom plans, the equivalent of a hard parse). From then on it compares the generic plan cost against the average cost of the custom plans built so far. If the generic plan is cheaper, it is used and planning with the actual values is skipped; otherwise the statement keeps getting a custom plan on every execution (see the source function &lt;code&gt;choose_custom_plan&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;The generic plan is built without the actual bind variable values, so its shape and row estimates do not depend on them.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This is easy to reproduce using bind variables via PREPARE/EXECUTE:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; sql1(&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;,text) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COUNT&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;xxxxxxx...;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; sql1(&lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-08 11:00:00&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-04 08:31:43&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;LIUZHILONG62&amp;#39;&lt;/span&gt;); 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;12016&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;367&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;220&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; sql1(&lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-08 11:00:00&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-04 08:31:43&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;LIUZHILONG62&amp;#39;&lt;/span&gt;); 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;12016&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;254&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;386&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; sql1(&lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-08 11:00:00&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-04 08:31:43&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;LIUZHILONG62&amp;#39;&lt;/span&gt;); 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;12016&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;235&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;343&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; sql1(&lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-08 11:00:00&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-04 08:31:43&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;LIUZHILONG62&amp;#39;&lt;/span&gt;); 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;12016&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;234&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;110&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; sql1(&lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-08 11:00:00&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-04 08:31:43&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;LIUZHILONG62&amp;#39;&lt;/span&gt;); 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;12016&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;233&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;570&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; sql1(&lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-08 11:00:00&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-04 08:31:43&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;LIUZHILONG62&amp;#39;&lt;/span&gt;); 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;12016&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;70678&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;344&lt;/span&gt; ms (&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;678&lt;/span&gt;) &lt;span style="color:#75715e"&gt;-- 6th execution is significantly slower
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_prepared_statements&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;gx &lt;span style="color:#75715e"&gt;-- the generic_plans and custom_plans columns were added in PG14
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;generic_plans &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;custom_plans &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The first five executions were planned with the actual values (custom plans) and all ran quickly. The sixth execution switched to the generic plan, which uses the created_date index. This is exactly the plan that failed in production, and it is extremely slow.&lt;/p&gt;
&lt;p&gt;So although the optimization suggestion to use the created_date index was somewhat problematic, substituting actual values for the bind variables and running EXPLAIN still produced the correct plan. In production, however, the application used bind variables, the generic plan kicked in, and that caused the failure.&lt;/p&gt;
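&lt;p&gt;To compare the two plan types side by side without executing a statement six times, &lt;code&gt;plan_cache_mode&lt;/code&gt; (PostgreSQL 12 and later) can force either kind for the current session. The following is a minimal sketch on a throwaway table, not the production query: &lt;code&gt;demo_events&lt;/code&gt; and &lt;code&gt;demo_q&lt;/code&gt; are hypothetical names, and &lt;code&gt;BETWEEN&lt;/code&gt; stands in for the pair of range comparisons on created_date.&lt;/p&gt;

```sql
-- Hypothetical demo table: 100,000 rows, one per minute starting 2025-01-01.
CREATE TEMP TABLE demo_events AS
SELECT g AS id,
       timestamp '2025-01-01' + g * interval '1 minute' AS created_date
FROM generate_series(1, 100000) AS g;

CREATE INDEX ON demo_events (created_date);
ANALYZE demo_events;

-- BETWEEN expands to a lower-bound and an upper-bound comparison.
PREPARE demo_q(timestamp, timestamp) AS
SELECT count(*)
FROM demo_events
WHERE created_date BETWEEN $1 AND $2;

-- Custom plan: planned with the literal values, so column statistics are used.
SET plan_cache_mode = force_custom_plan;
EXPLAIN EXECUTE demo_q('2025-01-01', '2025-03-01');

-- Generic plan: planned without the values, so a default selectivity is used.
SET plan_cache_mode = force_generic_plan;
EXPLAIN EXECUTE demo_q('2025-01-01', '2025-03-01');

-- Clean up the session state.
RESET plan_cache_mode;
DEALLOCATE demo_q;
```

&lt;p&gt;In the custom plan the conditions should show the literal timestamps, while the generic plan shows &lt;code&gt;$1&lt;/code&gt; and &lt;code&gt;$2&lt;/code&gt;, as in the production plan above, and its row estimate no longer reflects how wide the range really is.&lt;/p&gt;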

&lt;h3 class="relative group"&gt;Why Is the Estimated Row Count Small But the Actual Execution Time Very Long?
 &lt;div id="why-is-the-estimated-row-count-small-but-the-actual-execution-time-very-long" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-is-the-estimated-row-count-small-but-the-actual-execution-time-very-long" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The failing execution plan has a problem: the estimated cost is too small, and the estimated rows are too few.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_lzltab_202501_created_date &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202501 cc_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1450&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8958&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;From a business logic perspective, this looks abnormal. The created_date condition spans multiple partitions, and since created_date is the partition key, the range &lt;code&gt;WHERE created_date &amp;gt; xx AND created_date &amp;lt; yy&lt;/code&gt; covers a contiguous run of partitions. For every child partition that lies entirely inside the range, the selectivity should be 1, so the estimated rows should equal the partition row count: roughly 1.5 million according to the sequential-scan plan above, not a few thousand.&lt;/p&gt;
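&lt;p&gt;One way to see what the estimate should be close to is to list the row counts the planner has on record for each partition. This is a sketch that assumes all partitions follow the &lt;code&gt;lzltab_YYYYMM&lt;/code&gt; naming seen in the plans; the pattern needs adjusting otherwise.&lt;/p&gt;

```sql
-- reltuples is the planner's stored estimate (maintained by VACUUM/ANALYZE),
-- not an exact count.
SELECT c.relname,
       c.reltuples::bigint AS planner_rows
FROM pg_class AS c
WHERE c.relkind = 'r'
  AND c.relname LIKE 'lzltab_2025%'
ORDER BY c.relname;
```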
&lt;p&gt;At first I thought it was a statistics issue, but the statistics were fairly accurate — the historical partition data for 202501 hadn&amp;rsquo;t changed.&lt;/p&gt;
&lt;p&gt;Since this is a generic plan issue, we need to read the source code to see how the generic plan is estimated. Cost estimation is complex, but row estimation is easier to follow and to locate.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;double&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;calc_rangesel&lt;/span&gt;(TypeCacheEntry &lt;span style="color:#f92672"&gt;*&lt;/span&gt;typcache, VariableStatData &lt;span style="color:#f92672"&gt;*&lt;/span&gt;vardata,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			 &lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; RangeType &lt;span style="color:#f92672"&gt;*&lt;/span&gt;constval, Oid operator)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* with any other operator, empty Op non-empty matches nothing */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			selec &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1.0&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt; empty_frac) &lt;span style="color:#f92672"&gt;*&lt;/span&gt; hist_selec;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* all range operators are strict */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	selec &lt;span style="color:#f92672"&gt;*=&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1.0&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt; null_frac);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In the branch shown, &lt;code&gt;selec = (1 - null_frac) * (1 - empty_frac) * hist_selec&lt;/code&gt;. The histogram selectivity looks at the histogram buckets hit by the range plus any matching MCV entries. However, we don&amp;rsquo;t need to compute all this for this case.&lt;/p&gt;
&lt;p&gt;The reason is that the generic plan never looks at the histogram:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * rangesel -- restriction selectivity for range operators
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Datum
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;rangesel&lt;/span&gt;(PG_FUNCTION_ARGS)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * If we got a valid constant on one side of the operator, proceed to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * estimate using statistics. Otherwise punt and return a default constant
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * estimate. Note that calc_rangesel need not handle
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * OID_RANGE_ELEM_CONTAINED_OP.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (constrange)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		selec &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;calc_rangesel&lt;/span&gt;(typcache, &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;vardata, constrange, operator);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		selec &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;default_range_selectivity&lt;/span&gt;(operator);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
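&lt;p&gt;&lt;code&gt;calc_rangesel&lt;/code&gt; is the selectivity function that takes a constant value (shown above). With a bind variable there is no constant at planning time, so the &lt;code&gt;else&lt;/code&gt; branch calls &lt;code&gt;default_range_selectivity&lt;/code&gt;, which receives only the operator. Strictly speaking, these functions handle range-type operators; for a plain scalar pair such as &lt;code&gt;created_date &amp;gt; $1 AND created_date &amp;lt; $2&lt;/code&gt;, the range-pair handling in &lt;code&gt;clausesel.c&lt;/code&gt; falls back to the same &lt;code&gt;DEFAULT_RANGE_INEQ_SEL&lt;/code&gt; constant when neither bound is a known constant.&lt;/p&gt;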
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Returns a default selectivity estimate for given operator, when we don&amp;#39;t
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * have statistics or cannot use them for some reason.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;double&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;default_range_selectivity&lt;/span&gt;(Oid operator)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;switch&lt;/span&gt; (operator)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; OID_RANGE_CONTAINS_ELEM_OP:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; OID_RANGE_ELEM_CONTAINED_OP:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * &amp;#34;range @&amp;gt; elem&amp;#34; is more or less identical to a scalar
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * inequality &amp;#34;A &amp;gt;= b AND A &amp;lt;= c&amp;#34;.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; DEFAULT_RANGE_INEQ_SEL;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The default range selectivity is defined as:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* default selectivity estimate for range inequalities &amp;#34;A &amp;gt; b AND A &amp;lt; c&amp;#34; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define DEFAULT_RANGE_INEQ_SEL	0.005&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Let&amp;rsquo;s verify this against the production row estimate:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; reltuples::bigint&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;005&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;lzltab_202501&amp;#39;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;gx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;?&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;column&lt;/span&gt;&lt;span style="color:#f92672"&gt;?&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8958&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;350&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This matches the 8958 rows estimated in the actual plan:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;idx_lzltab_202501_created_date &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202501 cc_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1450&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8958&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;So the new execution plan&amp;rsquo;s row estimate is inaccurate because the generic plan falls back to the default range selectivity of 0.005.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Why Does the Generic Plan Exist, and the Problem with Soft Parsing
 &lt;div id="why-does-the-generic-plan-exist-and-the-problem-with-soft-parsing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-does-the-generic-plan-exist-and-the-problem-with-soft-parsing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;It&amp;rsquo;s easier to think of the generic plan as a &amp;ldquo;DEFAULT estimate plan.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Why does the generic plan always seem to have problems?&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s trace the reasoning chain:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The generic plan exists to reduce hard parsing, i.e., to enable soft parsing.&lt;/li&gt;
&lt;li&gt;If we don&amp;rsquo;t hard-parse every execution, we can reuse an execution plan without passing specific parameter values.&lt;/li&gt;
&lt;li&gt;If we don&amp;rsquo;t pass parameters and directly use an execution plan, that plan must be generated in advance.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Ways to generate an execution plan in advance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A parameter-less execution plan (the generic plan)&lt;/li&gt;
&lt;li&gt;Reuse an execution plan generated from the first few executions with parameters (PostgreSQL doesn&amp;rsquo;t have this)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If we use a generic plan, it can be inaccurate, for example:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Data skew: a most common value (MCV) has a very high frequency, for example a query with &lt;code&gt;WHERE a = 1&lt;/code&gt; when &lt;code&gt;a = 1&lt;/code&gt; appears extremely often. The right plan depends heavily on the actual parameter value, but the generic plan receives no parameters, so it cannot be accurate.&lt;/li&gt;
&lt;li&gt;Evenly distributed data where selectivity still cannot be calculated accurately, for example &lt;code&gt;WHERE a &amp;gt; $1 AND a &amp;lt; $2&lt;/code&gt;. Without knowing the range, no one can compute the selectivity. The generic plan receives no parameters, so it cannot be accurate.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If we reused plans from the first few parameterized executions (which PostgreSQL doesn&amp;rsquo;t do), they could also be inaccurate:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data skew: the first few parameter values may not be representative, and they would heavily influence what the subsequent fixed plan looks like.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Categories of Generic Plan Estimation Problems
 &lt;div id="categories-of-generic-plan-estimation-problems" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#categories-of-generic-plan-estimation-problems" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Because PostgreSQL compares costs only after it has built 5 custom plans, generic plan problems fall into two categories:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The first 5 executions are not representative. Whether this happens depends on data skew and on whether the first 5 parameter values are typical, because the custom plans built for them are what the generic plan is compared against.&lt;/li&gt;
&lt;li&gt;The generic plan itself is problematic. Due to data skew or the inability to accurately compute selectivity for evenly distributed data, the generic plan itself is inefficient.&lt;/li&gt;
&lt;/ol&gt;
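&lt;p&gt;To tell which category a statement falls into, it helps to check whether the session has already switched to the generic plan. The following is a minimal sketch, not taken from the production case: it assumes PostgreSQL 14 or later (where &lt;code&gt;pg_prepared_statements&lt;/code&gt; has the &lt;code&gt;generic_plans&lt;/code&gt; and &lt;code&gt;custom_plans&lt;/code&gt; columns), the default &lt;code&gt;plan_cache_mode = auto&lt;/code&gt;, and a hypothetical table &lt;code&gt;lzl&lt;/code&gt; with a text column &lt;code&gt;a&lt;/code&gt;. The statement name &lt;code&gt;q_demo&lt;/code&gt; is also made up for illustration.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- hypothetical statement and table, for illustration only
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PREPARE q_demo(text) AS
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT count(*) FROM lzl WHERE a = $1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- run this at least 6 times so the planner gets past the 5 custom plans
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXECUTE q_demo(&amp;#39;zzz&amp;#39;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- how many times each plan type has been chosen in this session (PostgreSQL 14+)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT name, generic_plans, custom_plans
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  FROM pg_prepared_statements
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; WHERE name = &amp;#39;q_demo&amp;#39;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If &lt;code&gt;generic_plans&lt;/code&gt; is still 0 after the sixth execution, the planner most likely judged the generic plan more expensive than the average custom plan and kept replanning. Once the counter starts to grow, later executions are reusing the generic plan, and that plan is the one to inspect.&lt;/p&gt;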

&lt;h3 class="relative group"&gt;Optimization Recommendations
 &lt;div id="optimization-recommendations" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#optimization-recommendations" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Based on this case, generic plan issues can appear on partitioned tables. The partition key is contiguous, and selectivity when scanning all partitions should be 1, but the generic plan uses 0.005, which can easily lead to a &amp;ldquo;full index scan&amp;rdquo; scenario.&lt;/p&gt;
&lt;p&gt;So during optimization, keep a few more points in mind:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Avoid creating too many indexes that confuse the optimizer.&lt;/li&gt;
&lt;li&gt;Eliminate generic plan interference. Use &lt;code&gt;EXECUTE&lt;/code&gt; to truly run the query 6 times.&lt;/li&gt;
&lt;li&gt;At the session level, set &lt;code&gt;plan_cache_mode&lt;/code&gt; to &lt;code&gt;'force_generic_plan'&lt;/code&gt; or &lt;code&gt;'force_custom_plan'&lt;/code&gt; to compare execution plans. Or, on PostgreSQL 16+, use &lt;code&gt;EXPLAIN (GENERIC_PLAN)&lt;/code&gt; to compare.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Syntax reference:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--prepare/excute
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; sql1(text) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COUNT&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; LZL &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; sql1(&lt;span style="color:#e6db74"&gt;&amp;#39;zzz&amp;#39;&lt;/span&gt;); &lt;span style="color:#75715e"&gt;-- run 6 times first
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;EXPLAIN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; sql1(&lt;span style="color:#e6db74"&gt;&amp;#39;zzz&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_prepared_statements &lt;span style="color:#75715e"&gt;-- view prepared statement info, current session only
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Compare execution plans by setting session parameters before EXPLAIN EXECUTE
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; plan_cache_mode&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;force_generic_plan&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; plan_cache_mode&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;force_custom_plan&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Directly view generic plan, pg16+
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (GENERIC_PLAN) xx &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;</content:encoded></item><item><title>Case: GRANT Authorization Causes Walsender to Hang</title><link>https://lastdba.com/en/2025/06/26/case-grant-authorization-causes-walsender-to-hang/</link><pubDate>Thu, 26 Jun 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/06/26/case-grant-authorization-causes-walsender-to-hang/</guid><description>&lt;h2 class="relative group"&gt;Symptoms
 &lt;div id="symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The walsender&amp;rsquo;s LSN stopped advancing. The stack trace showed it was stuck in pathman&amp;rsquo;s &lt;code&gt;invalidate_psin_entries_using_relid&lt;/code&gt;, with the relid constantly changing and the walsender CPU pegged at 100%.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pstack &lt;span style="color:#ae81ff"&gt;121327&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#0 hash_seq_search (status=status@entry=0x7fffaadf8330) at dynahash.c:1441
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#1 0x00002ba3b40ec728 in invalidate_psin_entries_using_relid (relid=relid@entry=42319501) at src/relation_info.c:251
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#2 0x00002ba3b40ecb3d in forget_status_of_relation (relid=relid@entry=42319501) at src/relation_info.c:232
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#3 0x00002ba3b40fcc96 in pathman_relcache_hook (arg=&amp;lt;optimized out&amp;gt;, relid=42319501) at src/hooks.c:934
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#4 0x000000000087168a in LocalExecuteInvalidationMessage (msg=0x3a391c8) at inval.c:595
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#5 0x000000000071d50e in ReorderBufferExecuteInvalidations (rb=0x1b63ff8, txn=0x1be5f58, txn=0x1be5f58) at reorderbuffer.c:2238
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#6 ReorderBufferCommit (rb=0x1b63ff8, xid=xid@entry=4285897514, commit_lsn=405674661986920, end_lsn=&amp;lt;optimized out&amp;gt;, commit_time=commit_time@entry=799377897828299, origin_id=origin_id@entry=0, origin_lsn=origin_lsn@entry=0) at reorderbuffer.c:1819
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#7 0x0000000000712d18 in DecodeCommit (xid=4285897514, parsed=0x7fffaadf8630, buf=0x7fffaadf87f0, ctx=0x1a359e8) at decode.c:637
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#8 DecodeXactOp (ctx=0x1a359e8, buf=buf@entry=0x7fffaadf87f0) at decode.c:245
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#9 0x00000000007130b2 in LogicalDecodingProcessRecord (ctx=0x1a359e8, record=0x1a35c80) at decode.c:114
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#10 0x0000000000733662 in XLogSendLogical () at walsender.c:2885
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#11 0x0000000000735942 in WalSndLoop (send_data=send_data@entry=0x733620 &amp;lt;XLogSendLogical&amp;gt;) at walsender.c:2287
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#12 0x0000000000736692 in StartLogicalReplication (cmd=0x1846c68) at walsender.c:1213
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#13 exec_replication_command (cmd_string=cmd_string@entry=0x181a288 &amp;#34;START_REPLICATION SLOT \&amp;#34;lzl_logical_rep\&amp;#34; LOGICAL 170F5/7C3EAE78 (\&amp;#34;proto_version\&amp;#34; &amp;#39;1&amp;#39;, \&amp;#34;publication_names\&amp;#34; &amp;#39;lzl_logical_rep&amp;#39;)&amp;#34;) at walsender.c:1640
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#14 0x0000000000774e91 in PostgresMain (argc=&amp;lt;optimized out&amp;gt;, argv=argv@entry=0x1866478, dbname=0x18662b8 &amp;#34;lzldb&amp;#34;, username=&amp;lt;optimized out&amp;gt;) at postgres.c:4325
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#15 0x0000000000485989 in BackendRun (port=&amp;lt;optimized out&amp;gt;, port=&amp;lt;optimized out&amp;gt;) at postmaster.c:4526
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#16 BackendStartup (port=0x18635b0) at postmaster.c:4210
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#17 ServerLoop () at postmaster.c:1739
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#18 0x0000000000702f08 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1814da0) at postmaster.c:1412
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#19 0x000000000048660a in main (argc=3, argv=0x1814da0) at main.c:210
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## Second execution, same stack, different relid
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pstack &lt;span style="color:#ae81ff"&gt;121327&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#0 hash_seq_search (status=status@entry=0x7fffaadf8330) at dynahash.c:1441
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#1 0x00002ba3b40ec728 in invalidate_psin_entries_using_relid (relid=relid@entry=26560221) at src/relation_info.c:251
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#2 0x00002ba3b40ecb3d in forget_status_of_relation (relid=relid@entry=26560221) at src/relation_info.c:232
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#3 0x00002ba3b40fcc96 in pathman_relcache_hook (arg=&amp;lt;optimized out&amp;gt;, relid=26560221) at src/hooks.c:934
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#4 0x000000000087168a in LocalExecuteInvalidationMessage (msg=0x39f1f68) at inval.c:595
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Analysis
 &lt;div id="analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The changing relid showed that the walsender was still running, not dead. The LSN was not advancing, so we analyzed the LSN position to see what the transaction was doing.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Symptoms
 &lt;div id="symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The walsender&amp;rsquo;s LSN stopped advancing. The stack trace showed it was stuck in pathman&amp;rsquo;s &lt;code&gt;invalidate_psin_entries_using_relid&lt;/code&gt;, with the relid constantly changing and the walsender CPU pegged at 100%.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pstack &lt;span style="color:#ae81ff"&gt;121327&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#0 hash_seq_search (status=status@entry=0x7fffaadf8330) at dynahash.c:1441
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#1 0x00002ba3b40ec728 in invalidate_psin_entries_using_relid (relid=relid@entry=42319501) at src/relation_info.c:251
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#2 0x00002ba3b40ecb3d in forget_status_of_relation (relid=relid@entry=42319501) at src/relation_info.c:232
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#3 0x00002ba3b40fcc96 in pathman_relcache_hook (arg=&amp;lt;optimized out&amp;gt;, relid=42319501) at src/hooks.c:934
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#4 0x000000000087168a in LocalExecuteInvalidationMessage (msg=0x3a391c8) at inval.c:595
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#5 0x000000000071d50e in ReorderBufferExecuteInvalidations (rb=0x1b63ff8, txn=0x1be5f58, txn=0x1be5f58) at reorderbuffer.c:2238
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#6 ReorderBufferCommit (rb=0x1b63ff8, xid=xid@entry=4285897514, commit_lsn=405674661986920, end_lsn=&amp;lt;optimized out&amp;gt;, commit_time=commit_time@entry=799377897828299, origin_id=origin_id@entry=0, origin_lsn=origin_lsn@entry=0) at reorderbuffer.c:1819
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#7 0x0000000000712d18 in DecodeCommit (xid=4285897514, parsed=0x7fffaadf8630, buf=0x7fffaadf87f0, ctx=0x1a359e8) at decode.c:637
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#8 DecodeXactOp (ctx=0x1a359e8, buf=buf@entry=0x7fffaadf87f0) at decode.c:245
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#9 0x00000000007130b2 in LogicalDecodingProcessRecord (ctx=0x1a359e8, record=0x1a35c80) at decode.c:114
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#10 0x0000000000733662 in XLogSendLogical () at walsender.c:2885
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#11 0x0000000000735942 in WalSndLoop (send_data=send_data@entry=0x733620 &amp;lt;XLogSendLogical&amp;gt;) at walsender.c:2287
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#12 0x0000000000736692 in StartLogicalReplication (cmd=0x1846c68) at walsender.c:1213
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#13 exec_replication_command (cmd_string=cmd_string@entry=0x181a288 &amp;#34;START_REPLICATION SLOT \&amp;#34;lzl_logical_rep\&amp;#34; LOGICAL 170F5/7C3EAE78 (\&amp;#34;proto_version\&amp;#34; &amp;#39;1&amp;#39;, \&amp;#34;publication_names\&amp;#34; &amp;#39;lzl_logical_rep&amp;#39;)&amp;#34;) at walsender.c:1640
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#14 0x0000000000774e91 in PostgresMain (argc=&amp;lt;optimized out&amp;gt;, argv=argv@entry=0x1866478, dbname=0x18662b8 &amp;#34;lzldb&amp;#34;, username=&amp;lt;optimized out&amp;gt;) at postgres.c:4325
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#15 0x0000000000485989 in BackendRun (port=&amp;lt;optimized out&amp;gt;, port=&amp;lt;optimized out&amp;gt;) at postmaster.c:4526
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#16 BackendStartup (port=0x18635b0) at postmaster.c:4210
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#17 ServerLoop () at postmaster.c:1739
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#18 0x0000000000702f08 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1814da0) at postmaster.c:1412
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#19 0x000000000048660a in main (argc=3, argv=0x1814da0) at main.c:210
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## Second execution, same stack, different relid
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pstack &lt;span style="color:#ae81ff"&gt;121327&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#0 hash_seq_search (status=status@entry=0x7fffaadf8330) at dynahash.c:1441
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#1 0x00002ba3b40ec728 in invalidate_psin_entries_using_relid (relid=relid@entry=26560221) at src/relation_info.c:251
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#2 0x00002ba3b40ecb3d in forget_status_of_relation (relid=relid@entry=26560221) at src/relation_info.c:232
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#3 0x00002ba3b40fcc96 in pathman_relcache_hook (arg=&amp;lt;optimized out&amp;gt;, relid=26560221) at src/hooks.c:934
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#4 0x000000000087168a in LocalExecuteInvalidationMessage (msg=0x39f1f68) at inval.c:595
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Analysis
 &lt;div id="analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The changing relid showed that the walsender was still running, not dead. The LSN was not advancing, so we analyzed the LSN position to see what the transaction was doing.&lt;/p&gt;
&lt;p&gt;If the slot information is still available, the restart LSN in the replication slot view gives the WAL position. If not, the &lt;code&gt;commit_lsn&lt;/code&gt; from the stack trace can be used to identify the WAL segment.&lt;/p&gt;
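&lt;p&gt;Both lookups can be sketched in SQL. The slot name &lt;code&gt;lzl_logical_rep&lt;/code&gt; and the decimal &lt;code&gt;commit_lsn&lt;/code&gt; value come from the stack trace above. The queries themselves are an illustration of one way to do this, not the exact commands run during the incident.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- restart position of the slot, if the slot still exists
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT slot_name, restart_lsn, confirmed_flush_lsn
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  FROM pg_replication_slots
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; WHERE slot_name = &amp;#39;lzl_logical_rep&amp;#39;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- the stack trace prints commit_lsn as a decimal 64-bit number;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- split it into the high and low 32 bits to get the usual hex LSN form
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT upper(to_hex(405674661986920 / 4294967296)) || &amp;#39;/&amp;#39; ||
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       upper(to_hex(405674661986920 % 4294967296)) AS commit_lsn;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The second query yields &lt;code&gt;170F5/7E1F4268&lt;/code&gt;, which is the LSN of the COMMIT record for xid 4285897514 in the WAL dump. On a server where the timeline is known, &lt;code&gt;pg_walfile_name()&lt;/code&gt; can then map that LSN to a segment file name.&lt;/p&gt;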
&lt;p&gt;We used &lt;code&gt;pg_waldump&lt;/code&gt; to inspect the WAL records, filtering by xid:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Heap len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 961/ 961, tx: 4285897514, lsn: 170F5/7DFE3470, prev 170F5/7DFE3430, desc: UPDATE+INIT off &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; xmax &lt;span style="color:#ae81ff"&gt;4285897514&lt;/span&gt; flags 0x00 ; new off &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; xmax 0, blkref &lt;span style="color:#75715e"&gt;#0: rel 1663/17662/1259 blk 8443, blkref #1: rel 1663/17662/1259 blk 7327&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Transaction len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 1778325/1778325, tx: 4285897514, lsn: 170F5/7E1F4268, prev 170F5/7E1F4220, desc: COMMIT 2025-05-01 09:24:57.828299 CST; inval msgs: catcache &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;49&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;49&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;49&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;49&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;49&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relcache &lt;span style="color:#ae81ff"&gt;48813261&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;48813255&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;51030741&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;48813252&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;50737247&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;48813246&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;48813243&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;48813237&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;50737241&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;48813234&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;48813224&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;49379811&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;48813216&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;48813210&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;45452775&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The transaction that touched &lt;code&gt;rel 1663/17662/1259&lt;/code&gt; contained about 180,000 WAL records. Its last record, the COMMIT, carried the inval msgs: roughly 70,000 catcache entries and 30,000 relcache entries.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;rel 1663/17662/1259&lt;/code&gt; is &lt;code&gt;pg_class&lt;/code&gt;. Querying by xmin reveals the affected tables and commit time:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; xmin,xmax,pg_xact_commit_timestamp(xmin),relname &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; xmin&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;4285897514&amp;#39;&lt;/span&gt;::xid &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;desc&lt;/span&gt; ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_xact_commit_timestamp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+------+-------------------------------+---------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4285897514&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;24&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;828299&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; v$session
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4285897514&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;24&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;828299&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tmp_20230801_id_seq
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4285897514&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;24&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;828299&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tmp_20230801
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4285897514&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;24&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;828299&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; test_param
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4285897514&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;24&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;828299&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; test_20240105
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; xmin&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;4285897514&amp;#39;&lt;/span&gt;::xid ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;18523&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;139138&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Checking the PostgreSQL log around that timestamp:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2025-05-01 09:24:59.837 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;postgres&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,61418,&lt;span style="color:#e6db74"&gt;&amp;#34;[local]&amp;#34;&lt;/span&gt;,6812cd65.efea,3,&lt;span style="color:#e6db74"&gt;&amp;#34;DO&amp;#34;&lt;/span&gt;,2025-05-01 09:24:53 CST,549/0,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;duration: 6036.275 ms statement: 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; EXECUTE &amp;#39;GRANT SELECT ON ALL TABLES IN SCHEMA public TO r_lzldbdata_qry&amp;#39;;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; END;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; &lt;/span&gt;$$&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;psql&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;client backend&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This all but confirms that the GRANT was the culprit. GRANT updates &lt;code&gt;relacl&lt;/code&gt; in &lt;code&gt;pg_class&lt;/code&gt;, and at least 18,000 relations had their permissions updated in one transaction. Every update to &lt;code&gt;pg_class&lt;/code&gt; generates invalidation messages, and the walsender process was slow to work through that many of them.&lt;/p&gt;
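&lt;p&gt;Before running a schema-wide GRANT on a database that feeds logical decoding, it may help to estimate how many &lt;code&gt;pg_class&lt;/code&gt; rows the statement will rewrite. A minimal sketch, assuming the &lt;code&gt;public&lt;/code&gt; schema from the logged statement and assuming that &lt;code&gt;ALL TABLES&lt;/code&gt; covers ordinary and partitioned tables, views, materialized views, and foreign tables (worth verifying against your PostgreSQL version):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Count the relations a GRANT ... ON ALL TABLES IN SCHEMA public would touch
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Assumption: this relkind list mirrors what ALL TABLES includes
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select c.relkind, count(*)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;from pg_class c
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;join pg_namespace n on n.oid = c.relnamespace
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;where n.nspname = &amp;#39;public&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  and c.relkind in (&amp;#39;r&amp;#39;, &amp;#39;p&amp;#39;, &amp;#39;v&amp;#39;, &amp;#39;m&amp;#39;, &amp;#39;f&amp;#39;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;group by c.relkind
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;order by count(*) desc;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;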

&lt;h2 class="relative group"&gt;Reproduction
 &lt;div id="reproduction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#reproduction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create a logical replication slot, any kind will do
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_create_logical_replication_slot(&lt;span style="color:#e6db74"&gt;&amp;#39;logical_test&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;test_decoding&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_recvlogical &lt;span style="color:#f92672"&gt;-&lt;/span&gt;h &lt;span style="color:#ae81ff"&gt;127&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;p &lt;span style="color:#ae81ff"&gt;7997&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;d lzldb &lt;span style="color:#f92672"&gt;-&lt;/span&gt;U repuser &lt;span style="color:#75715e"&gt;--slot=logical_test --start -f recv.sql &amp;amp;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create many tables
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;DO&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$$&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; i &lt;span style="color:#66d9ef"&gt;IN&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;20000&lt;/span&gt; LOOP
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; format(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;CREATE TABLE IF NOT EXISTS table_%s ( 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; col1 varchar(10)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; )&amp;#39;&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lpad(i::text, &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;) &lt;span style="color:#75715e"&gt;-- Generate 5-digit numbered table names
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; );
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;END&lt;/span&gt; LOOP;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;END&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$$&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Single GRANT
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;all&lt;/span&gt; tables &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;schema&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; r_lzldb_qry;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Perfectly reproduced
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;lzlhost:&lt;span style="color:#f92672"&gt;~/&lt;/span&gt;lzl&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; pstack &lt;span style="color:#ae81ff"&gt;172862&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; hash_seq_search (status&lt;span style="color:#f92672"&gt;=&lt;/span&gt;status&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x7ffd664be280) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; dynahash.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1444&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x00002ad31235e728 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; invalidate_psin_entries_using_relid (relid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;relid&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1002857&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; src&lt;span style="color:#f92672"&gt;/&lt;/span&gt;relation_info.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;251&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x00002ad31235eb3d &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; forget_status_of_relation (relid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;relid&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1002857&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; src&lt;span style="color:#f92672"&gt;/&lt;/span&gt;relation_info.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;232&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x00002ad31236ec96 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; pathman_relcache_hook (arg&lt;span style="color:#f92672"&gt;=&amp;lt;&lt;/span&gt;optimized &lt;span style="color:#66d9ef"&gt;out&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;, relid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1002857&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; src&lt;span style="color:#f92672"&gt;/&lt;/span&gt;hooks.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;934&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x000000000087168a &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; LocalExecuteInvalidationMessage (msg&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x2ad3c3f61a88) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; inval.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;595&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x000000000071d50e &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; ReorderBufferExecuteInvalidations (rb&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x17e5698, txn&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x180d698, txn&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x180d698) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; reorderbuffer.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2238&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;lzlhost:&lt;span style="color:#f92672"&gt;~/&lt;/span&gt;lzl&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; pstack &lt;span style="color:#ae81ff"&gt;172862&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0000000000891d0c &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; hash_seq_search (status&lt;span style="color:#f92672"&gt;=&lt;/span&gt;status&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x7ffd664be280) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; dynahash.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1441&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x00002ad31235e728 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; invalidate_psin_entries_using_relid (relid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;relid&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1011110&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; src&lt;span style="color:#f92672"&gt;/&lt;/span&gt;relation_info.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;251&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x00002ad31235eb3d &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; forget_status_of_relation (relid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;relid&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1011110&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; src&lt;span style="color:#f92672"&gt;/&lt;/span&gt;relation_info.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;232&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x00002ad31236ec96 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; pathman_relcache_hook (arg&lt;span style="color:#f92672"&gt;=&amp;lt;&lt;/span&gt;optimized &lt;span style="color:#66d9ef"&gt;out&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;, relid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1011110&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; src&lt;span style="color:#f92672"&gt;/&lt;/span&gt;hooks.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;934&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- relid keeps changing
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- CPU pegged at 100%:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ps &lt;span style="color:#f92672"&gt;-&lt;/span&gt;eo pid,&lt;span style="color:#f92672"&gt;%&lt;/span&gt;cpu,&lt;span style="color:#f92672"&gt;%&lt;/span&gt;mem&lt;span style="color:#f92672"&gt;|&lt;/span&gt;grep &lt;span style="color:#ae81ff"&gt;172862&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;172862&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;99&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Takes about 2 hours to catch up&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Accelerating Walsender by Removing Pathman
 &lt;div id="accelerating-walsender-by-removing-pathman" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#accelerating-walsender-by-removing-pathman" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Since the database wasn&amp;rsquo;t actually using pathman partitioned tables but had the extension installed, we tried bypassing the pathman hook to speed up walsender processing.&lt;/p&gt;
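&lt;p&gt;Whether the extension is really unused can be checked before dropping it. A sketch, assuming pg_pathman&amp;rsquo;s &lt;code&gt;pathman_config&lt;/code&gt; table is reachable through the current &lt;code&gt;search_path&lt;/code&gt; (otherwise qualify it with the schema reported by the first query):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Is the extension installed in this database, and in which schema?
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select e.extname, e.extversion, n.nspname
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;from pg_extension e
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;join pg_namespace n on n.oid = e.extnamespace
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;where e.extname = &amp;#39;pg_pathman&amp;#39;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- A count of 0 suggests no table is partitioned through pg_pathman
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select count(*) from pathman_config;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;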
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;drop&lt;/span&gt; extension pg_pathman;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;all&lt;/span&gt; tables &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;schema&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; r_lzldb_upd;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;lzlhost&lt;span style="color:#f92672"&gt;~/&lt;/span&gt;lzl&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; pstack &lt;span style="color:#ae81ff"&gt;133460&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; hash_seq_search (status&lt;span style="color:#f92672"&gt;=&lt;/span&gt;status&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x7ffe292d5c90) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; dynahash.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1418&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x000000000087f228 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; RelfilenodeMapInvalidateCallback (arg&lt;span style="color:#f92672"&gt;=&amp;lt;&lt;/span&gt;optimized &lt;span style="color:#66d9ef"&gt;out&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;, relid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1034036&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; relfilenodemap.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;64&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x000000000087168a &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; LocalExecuteInvalidationMessage (msg&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x2b9699795768) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; inval.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;595&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x000000000071d50e &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; ReorderBufferExecuteInvalidations (rb&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x195a358, txn&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x1a6ff38, txn&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x1a6ff38) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; reorderbuffer.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2238&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; ReorderBufferCommit (rb&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x195a358, xid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;xid&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;328684387&lt;/span&gt;, commit_lsn&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8016890875224&lt;/span&gt;, end_lsn&lt;span style="color:#f92672"&gt;=&amp;lt;&lt;/span&gt;optimized &lt;span style="color:#66d9ef"&gt;out&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;, commit_time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;commit_time&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;799851538975691&lt;/span&gt;, origin_id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;origin_id&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, origin_lsn&lt;span style="color:#f92672"&gt;=&lt;/span&gt;origin_lsn&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; reorderbuffer.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1819&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;##&lt;/span&gt; Completed within &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; seconds&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Even with &lt;code&gt;pg_pathman&lt;/code&gt; still listed in &lt;code&gt;shared_preload_libraries&lt;/code&gt;, the improvement was dramatic: the walsender went from about 2 hours to 20 seconds.&lt;/p&gt;
&lt;p&gt;This seemed odd at first, since the library is still preloaded and the hook should still run. Source analysis revealed the reason: early in the hook, pathman checks for its config table; if the table doesn&amp;rsquo;t exist, it skips pathman&amp;rsquo;s invalidation logic entirely, so execution completes quickly:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Invalidate PartRelationInfo cache entry if needed.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;pathman_relcache_hook&lt;/span&gt;(Datum arg, Oid relid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Oid pathman_config_relid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* See cook_partitioning_expression() */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;pathman_hooks_enabled)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;IsPathmanReady&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Invalidation event for PATHMAN_CONFIG table (probably DROP EXTENSION).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Digging catalogs here is expensive and probably illegal, so we take
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * cached relid. It is possible that we don&amp;#39;t know it atm (e.g. pathman
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * was disabled). However, in this case caches must have been cleaned
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * on disable, and there is no DROP-specific additional actions.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	pathman_config_relid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;get_pathman_config_relid&lt;/span&gt;(true);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (relid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; pathman_config_relid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;delay_pathman_shutdown&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Invalidation event for some user table */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (relid &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; FirstNormalObjectId)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Invalidate PartBoundInfo entry if needed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;forget_bounds_of_rel&lt;/span&gt;(relid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Invalidate PartStatusInfo entry if needed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;forget_status_of_relation&lt;/span&gt;(relid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Invalidate PartParentInfo entry if needed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;forget_parent_of_partition&lt;/span&gt;(relid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;get_pathman_config_relid&lt;/code&gt; returns the OID of the &lt;code&gt;pathman_config&lt;/code&gt; table. &lt;code&gt;DROP EXTENSION pg_pathman&lt;/code&gt; removes that table from the database, so the hook never reaches the &lt;code&gt;forget_*&lt;/code&gt; logic.&lt;/p&gt;
&lt;p&gt;There are other ways to speed up walsender processing. Setting &lt;code&gt;pg_pathman.enable=off&lt;/code&gt; makes &lt;code&gt;IsPathmanReady()&lt;/code&gt; return false, so the hook bails out immediately. Or, most directly, remove &lt;code&gt;pg_pathman&lt;/code&gt; from &lt;code&gt;shared_preload_libraries&lt;/code&gt; and restart the instance (note that this applies to the whole instance, not to a single database).&lt;/p&gt;
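&lt;p&gt;As a rough sketch of the &lt;code&gt;pg_pathman.enable&lt;/code&gt; route, the statements below disable pathman instance-wide and then watch slot lag. This assumes the setting can be changed through &lt;code&gt;ALTER SYSTEM&lt;/code&gt; plus a reload and that running walsenders pick it up; neither was verified in this case, so try it on a test instance first:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Assumption: pg_pathman.enable is accepted by ALTER SYSTEM and takes effect on reload.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Only consider this when no pathman-partitioned tables are in use.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ALTER SYSTEM SET pg_pathman.enable = off;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT pg_reload_conf();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Watch how far each logical slot lags behind the current WAL position
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT slot_name, active,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS lag
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM pg_replication_slots
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WHERE slot_type = &amp;#39;logical&amp;#39;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;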

&lt;h2 class="relative group"&gt;Improvements in PG14
 &lt;div id="improvements-in-pg14" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#improvements-in-pg14" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;PG14.0 release notes:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Allow logical decoding to more efficiently process cache invalidation messages (Dilip Kumar)
This allows logical decoding to work efficiently in presence of a large amount of DDL.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/release/14.0/" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/release/14.0/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Patch:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d7eb52d71" target="_blank" rel="noreferrer"&gt;https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d7eb52d71&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Comment from PG14&amp;rsquo;s &lt;code&gt;ReorderBufferAddInvalidations&lt;/code&gt;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;We require to record it in form of the change so that we can execute only the required invalidations instead of executing all the invalidations on each CommandId increment.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Compared with PG13, &lt;code&gt;ReorderBufferCommit&lt;/code&gt; was substantially rewritten in PG14.&lt;/p&gt;
&lt;p&gt;In PG13, the transaction processing logic lived directly in the &lt;code&gt;ReorderBufferCommit&lt;/code&gt; function:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(change&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;data.command_id &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; InvalidCommandId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (command_id &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; change&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;data.command_id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						command_id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; change&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;data.command_id;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;snapshot_now&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;copied)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							&lt;span style="color:#75715e"&gt;/* we don&amp;#39;t use the global one anymore */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							snapshot_now &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;ReorderBufferCopySnap&lt;/span&gt;(rb, snapshot_now,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;																txn, command_id);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						snapshot_now&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;curcid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; command_id;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						&lt;span style="color:#a6e22e"&gt;TeardownHistoricSnapshot&lt;/span&gt;(false);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						&lt;span style="color:#a6e22e"&gt;SetupHistoricSnapshot&lt;/span&gt;(snapshot_now, txn&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;tuplecid_hash);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;						 * Every time the CommandId is incremented, we could
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;						 * see new catalog contents, so execute all
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;						 * invalidations.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;						 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						&lt;span style="color:#a6e22e"&gt;ReorderBufferExecuteInvalidations&lt;/span&gt;(rb, txn);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In PG14, the main logic moved to &lt;code&gt;ReorderBufferReplay&lt;/code&gt; -&amp;gt; &lt;code&gt;ReorderBufferProcessTXN&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;ReorderBufferProcessTXN&lt;/code&gt; introduced a new &lt;code&gt;case REORDER_BUFFER_CHANGE_INVALIDATION&lt;/code&gt; branch to execute invalidations from the reorder buffer:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; REORDER_BUFFER_CHANGE_INVALIDATION:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#75715e"&gt;/* Execute the invalidation messages locally */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#a6e22e"&gt;ReorderBufferExecuteInvalidations&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;												 change&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;data.inval.ninvalidations,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;												 change&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;data.inval.invalidations);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The logic downstream of &lt;code&gt;ReorderBufferExecuteInvalidations&lt;/code&gt; is largely the same. The main differences between the PG13 and PG14 versions of &lt;code&gt;ReorderBufferCommit&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ReorderBufferCommit&lt;/code&gt; is no longer the primary transaction processing function; the call stack is deeper&lt;/li&gt;
&lt;li&gt;A new &lt;code&gt;case REORDER_BUFFER_CHANGE_INVALIDATION&lt;/code&gt; branch was added, separated from &lt;code&gt;REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID&lt;/code&gt;, to handle invalidations independently&lt;/li&gt;
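&lt;li&gt;The PG13 behavior of re-executing all of the transaction&amp;rsquo;s invalidations on every command ID increment was removed; PG14 executes only the invalidations recorded in each &lt;code&gt;REORDER_BUFFER_CHANGE_INVALIDATION&lt;/code&gt; change&lt;/li&gt;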
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Root Cause and Solutions
 &lt;div id="root-cause-and-solutions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#root-cause-and-solutions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The root cause of the walsender hang was a bulk GRANT that updated many rows in &lt;code&gt;pg_class&lt;/code&gt; and generated a massive number of invalidation messages. A statement like &lt;code&gt;GRANT privs ON ALL TABLES IN SCHEMA public TO role1&lt;/code&gt; executes as multiple commands within a single transaction in PostgreSQL. In PG13, logical decoding re-executes the transaction&amp;rsquo;s invalidation messages on every command ID increment, and each message invokes every registered invalidation hook. In this scenario, pathman&amp;rsquo;s hook was particularly slow at processing its hash tables, causing replication lag.&lt;/p&gt;
&lt;p&gt;Conditions for pathman-induced slowness (all must apply):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PG13 or earlier&lt;/li&gt;
&lt;li&gt;Bulk GRANT&lt;/li&gt;
&lt;li&gt;pathman extension installed (whether used or not)&lt;/li&gt;
&lt;li&gt;Logical replication slot active&lt;/li&gt;
&lt;/ul&gt;
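&lt;p&gt;A quick way to check an instance against these conditions is sketched below. The queries use only standard catalogs and settings; the interpretation in the comments follows the list above:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- 1. Major version: a value below 140000 means PG13 or earlier
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT current_setting(&amp;#39;server_version_num&amp;#39;)::int AS version_num;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- 2. Is pg_pathman installed in this database? (run in each database)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT extname, extversion FROM pg_extension WHERE extname = &amp;#39;pg_pathman&amp;#39;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- 3. Is the library preloaded for the whole instance?
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SHOW shared_preload_libraries;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- 4. Which logical replication slots exist, and are they active?
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT slot_name, plugin, database, active
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM pg_replication_slots
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WHERE slot_type = &amp;#39;logical&amp;#39;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;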
&lt;p&gt;Even after removing pathman, significant CPU time was still spent in functions like &lt;code&gt;RelfilenodeMapInvalidateCallback&lt;/code&gt;. In PG13 testing, processing took hours with pathman and minutes without it.&lt;/p&gt;
&lt;p&gt;Other scenarios mentioned by the community but not tested here (all conditions must apply):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PG13 or earlier&lt;/li&gt;
&lt;li&gt;Bulk DDL / TRUNCATE / DCL / DROP PUBLICATION&lt;/li&gt;
&lt;li&gt;Logical replication slot active&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Short-term fix: if pathman tables are not in use, drop the extension or unload the pathman shared library, then restart the replication slot.&lt;/p&gt;
&lt;p&gt;Long-term fix: upgrade to PG14 or later (tested: extremely fast, with no lag).&lt;/p&gt;

&lt;h3 class="relative group"&gt;
 &lt;div id="" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/message-id/flat/17716-1fe42e7b44fc2f25%40postgresql.org" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/message-id/flat/17716-1fe42e7b44fc2f25%40postgresql.org&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d7eb52d71" target="_blank" rel="noreferrer"&gt;https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d7eb52d71&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>PostgreSQL Ops Experience 2024</title><link>https://lastdba.com/en/2025/01/08/postgresql-ops-experience-2024/</link><pubDate>Wed, 08 Jan 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/01/08/postgresql-ops-experience-2024/</guid><description>&lt;p&gt;This article focuses on common PostgreSQL operations issues — rare edge cases that surface once every two or three years are out of scope.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s primarily a technical ops summary, aiming for clarity and quick applicability. Deep dives at the source-code level are deliberately avoided.&lt;/p&gt;

&lt;h2 class="relative group"&gt;SQL Performance &amp;amp; Execution Plans
 &lt;div id="sql-performance--execution-plans" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sql-performance--execution-plans" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Sudden Execution Plan Changes
 &lt;div id="sudden-execution-plan-changes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sudden-execution-plan-changes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL does not support optimizer hints natively, and the community has made it clear it never will.
The PG community&amp;rsquo;s stance is roughly: &amp;ldquo;Our optimizer is perfect. If the current plan isn&amp;rsquo;t good enough, it&amp;rsquo;s because the developer doesn&amp;rsquo;t understand optimization.&amp;rdquo;&lt;/p&gt;</description><content:encoded>&lt;p&gt;This article focuses on common PostgreSQL operations issues — rare edge cases that surface once every two or three years are out of scope.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s primarily a technical ops summary, aiming for clarity and quick applicability. Deep dives at the source-code level are deliberately avoided.&lt;/p&gt;

&lt;h2 class="relative group"&gt;SQL Performance &amp;amp; Execution Plans
 &lt;div id="sql-performance--execution-plans" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sql-performance--execution-plans" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Sudden Execution Plan Changes
 &lt;div id="sudden-execution-plan-changes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sudden-execution-plan-changes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL does not support optimizer hints natively, and the community has made it clear it never will.
The PG community&amp;rsquo;s stance is roughly: &amp;ldquo;Our optimizer is perfect. If the current plan isn&amp;rsquo;t good enough, it&amp;rsquo;s because the developer doesn&amp;rsquo;t understand optimization.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Regardless of what the PG community thinks, sudden execution plan regressions happen all the time in production, and PostgreSQL lacks the rich, native plan-binding mechanisms that Oracle provides. This is a real challenge for production operations. For example: one morning a sensitive query suddenly changes its plan, its runtime jumps from 0.1s to 1s, and under concurrent load the database CPU is saturated. The business notices immediately. Without plan-binding tools, there are only two rapid recovery options: 1) collect statistics, or 2) scale up CPU.&lt;/p&gt;
&lt;p&gt;A question about rapid recovery: does collecting statistics always help? A good DBA can identify where the optimizer went wrong, but can&amp;rsquo;t instantly conjure up a complete correct plan — especially for complex queries. Collecting statistics essentially hands the optimization problem back to the optimizer, trusting it to get it right. While this sounds a bit shaky, in PostgreSQL it actually works most of the time. (For scenarios where collecting stats is known to be useless, see the &amp;ldquo;ORDER BY LIMIT Problem&amp;rdquo; section.)&lt;/p&gt;
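&lt;p&gt;As a minimal sketch of the first recovery option, assume the regressed query reads a hypothetical table &lt;code&gt;orders&lt;/code&gt;. The table, column, and query are illustrative and not taken from a real incident:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Refresh planner statistics for the table behind the regressed query
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ANALYZE VERBOSE orders;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Confirm that statistics were just collected
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT relname, last_analyze, last_autoanalyze
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM pg_stat_all_tables
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WHERE relname = &amp;#39;orders&amp;#39;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Re-check the plan (plain EXPLAIN does not execute the query)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXPLAIN SELECT * FROM orders WHERE customer_id = 42;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;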
&lt;p&gt;Why do execution plans suddenly change and regress?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Plans are cost-based, costs rely on statistics, and statistics are always lagging&lt;/li&gt;
&lt;li&gt;Sufficiently complex SQL has a huge number of possible execution paths, and the optimizer picks the lowest-cost one&lt;/li&gt;
&lt;li&gt;PG exposes many optimizer parameters to tune for local hardware (e.g., &lt;code&gt;seq_page_cost&lt;/code&gt;, &lt;code&gt;effective_cache_size&lt;/code&gt;). These can nudge the optimizer&amp;rsquo;s preferences but are very low-level. While there&amp;rsquo;s theoretical tuning headroom, changing them has system-wide effects. After go-live, adjusting these is extremely high-risk. The very existence of these parameters hints that no plan can be 100% perfect, because the optimizer&amp;rsquo;s reasoning depends on its environment&lt;/li&gt;
&lt;/ul&gt;
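&lt;p&gt;To see how one of these parameters shifts a plan without touching the rest of the system, one cautious approach is to change it only inside a transaction. The table &lt;code&gt;orders&lt;/code&gt; and the value &lt;code&gt;1.1&lt;/code&gt; below are illustrative assumptions:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Current values of a few cost-related settings
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT name, setting, unit
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM pg_settings
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WHERE name IN (&amp;#39;seq_page_cost&amp;#39;, &amp;#39;random_page_cost&amp;#39;, &amp;#39;effective_cache_size&amp;#39;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;BEGIN;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- SET LOCAL lasts only until the end of this transaction
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SET LOCAL random_page_cost = 1.1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ROLLBACK;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Because the change is scoped to one transaction, other sessions keep planning with the instance-wide values.&lt;/p&gt;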
&lt;p&gt;Even mighty Oracle, with its arsenal of plan-stabilization features, can&amp;rsquo;t guarantee 100% problem-free SQL — because SQL, data, statistics, bind variables, etc. are all dynamic.&lt;/p&gt;
&lt;p&gt;For PG users, we&amp;rsquo;re not there yet, but we can work on making plans more stable:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Don&amp;rsquo;t join too many tables. More tables mean more possible plans. Once a query reaches &lt;code&gt;geqo_threshold&lt;/code&gt; FROM items (12 by default), &lt;a href="https://www.postgresql.org/docs/16/geqo-pg-intro.html" target="_blank" rel="noreferrer"&gt;PG GEQO&lt;/a&gt; takes over and stops enumerating all plans, reducing the chance of finding the optimal one&lt;/li&gt;
&lt;li&gt;Don&amp;rsquo;t write overly complex SQL. Keep in mind SQL may come from ORM frameworks rather than hand-written queries. Framework-generated SQL is often optimized for a goal with little regard for brevity or readability, making it very hard to tune&lt;/li&gt;
&lt;li&gt;Don&amp;rsquo;t create indexes indiscriminately — have a clear goal. Random indexes confuse the optimizer&lt;/li&gt;
&lt;li&gt;Tune per-table statistics collection thresholds via &lt;code&gt;autovacuum_analyze_scale_factor&lt;/code&gt; (see &amp;ldquo;Delayed Statistics Collection&amp;rdquo;)&lt;/li&gt;
&lt;li&gt;Use pg_hint_plan to give the optimizer hints&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;pg_hint_plan
 &lt;div id="pg_hint_plan" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg_hint_plan" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://github.com/ossc-db/pg_hint_plan" target="_blank" rel="noreferrer"&gt;pg_hint_plan&lt;/a&gt; is a third-party extension that uses hints to guide the optimizer toward the correct plan.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What pg_hint_plan supports:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Specifying scan methods (e.g., index scan), join methods (NL/HASH/MERGE), join order, memoize, estimated row counts, parallelism, and GUC parameters&lt;/li&gt;
&lt;li&gt;Binding hints to SQL via &lt;code&gt;hint_plan.hints&lt;/code&gt; without modifying the application SQL text&lt;/li&gt;
&lt;/ul&gt;
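&lt;p&gt;A minimal sketch of inline hints, assuming the extension is installed and two hypothetical tables &lt;code&gt;orders&lt;/code&gt; and &lt;code&gt;customers&lt;/code&gt; with an index &lt;code&gt;orders_customer_id_idx&lt;/code&gt; (all three names are made up for illustration). Hints go in a special comment at the start of the statement:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Load the extension for this session (it can also be preloaded via configuration)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LOAD &amp;#39;pg_hint_plan&amp;#39;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Ask for: customers first, nested loop join, index scan on orders
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;/*+ Leading(c o) NestLoop(c o) IndexScan(o orders_customer_id_idx) */
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXPLAIN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT o.id, c.name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM orders o
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;JOIN customers c ON c.id = o.customer_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WHERE c.id = 42;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Hints refer to tables by their alias when the query uses one. If a hint seems to be ignored, the extension&amp;rsquo;s &lt;code&gt;pg_hint_plan.debug_print&lt;/code&gt; setting can help show whether it was picked up.&lt;/p&gt;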
&lt;p&gt;&lt;strong&gt;pg_hint_plan limitations:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Usage restrictions with subqueries, foreign tables, CTEs, views, PL/pgSQL, etc.&lt;/li&gt;
&lt;li&gt;Query ID computation (&lt;code&gt;compute_query_id&lt;/code&gt;) treats hints as comments and ignores them, so the same statement with different hints gets the same query ID&lt;/li&gt;
&lt;li&gt;Unknown bugs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;While this extension is actively maintained, I haven&amp;rsquo;t found large-scale production deployment cases yet. We&amp;rsquo;ve also encountered issues in limited production use where hints don&amp;rsquo;t take effect — possibly related to JDBC plan caching — but it&amp;rsquo;s hard to draw firm conclusions.&lt;/p&gt;
&lt;p&gt;In short: pg_hint_plan is a good tool, but large-scale production deployment is still TBD. I recommend waiting and watching. You can trial it, but don&amp;rsquo;t become dependent on it.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Delayed Statistics Collection
 &lt;div id="delayed-statistics-collection" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#delayed-statistics-collection" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Statistics are the foundation of SQL optimization. PG statistics aren&amp;rsquo;t particularly complex, but many people still don&amp;rsquo;t fully understand them.&lt;/p&gt;
&lt;p&gt;The three key catalogs and views for PG statistics: &lt;code&gt;pg_class&lt;/code&gt;, &lt;code&gt;pg_stat_all_tables&lt;/code&gt;, &lt;code&gt;pg_stats&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- pg_class: pages and tuples
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; relname,relpages,reltuples::bigint &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;lzlpg&amp;#39;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;gx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relpages &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;187501&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;reltuples &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6000032&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- pg_stat_all_tables: live tuples, dead tuples, last analyze time
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; relname,n_live_tup,n_dead_tup,last_analyze,last_autoanalyze &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_all_tables &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;lzlpg&amp;#39;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;gx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;----+------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_live_tup &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6000032&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_dead_tup &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;last_analyze &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;54&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;553057&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;last_autoanalyze &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- pg_stats: per-column statistics — understand every field
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stats &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; tablename&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;lzlpg&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; attname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;gx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;----------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;schemaname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tablename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;attname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;inherited &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;null_frac &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;avg_width &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;70&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_distinct &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;most_common_vals &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;most_common_freqs &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;histogram_bounds &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;correlation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;most_common_elems &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;most_common_elem_freqs &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;elem_count_histogram &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Stale statistics are very likely to cause execution plan changes and SQL performance issues.
Check &lt;code&gt;last_autovacuum&lt;/code&gt; and &lt;code&gt;last_autoanalyze&lt;/code&gt; in &lt;code&gt;pg_stat_all_tables&lt;/code&gt; to determine if collection is lagging.&lt;/p&gt;
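&lt;p&gt;A minimal sketch of such a check, assuming the tables of interest live in the &lt;code&gt;public&lt;/code&gt; schema (the schema filter and the &lt;code&gt;limit&lt;/code&gt; are illustrative choices). &lt;code&gt;n_mod_since_analyze&lt;/code&gt; counts rows changed since the last analyze, so a large value next to an old &lt;code&gt;last_autoanalyze&lt;/code&gt; suggests collection is lagging:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Tables with the most row changes since their last analyze
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select relname, n_live_tup, n_mod_since_analyze,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       last_analyze, last_autoanalyze
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;from pg_stat_all_tables
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;where schemaname = &amp;#39;public&amp;#39;   -- assumption: adjust to your schema
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;order by n_mod_since_analyze desc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;limit 20;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;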
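&lt;p&gt;Why tune it? Because the default &lt;code&gt;autovacuum_analyze_scale_factor&lt;/code&gt; is 0.1, meaning autovacuum only re-collects statistics once the rows changed since the last analyze exceed roughly 10% of the table (plus &lt;code&gt;autovacuum_analyze_threshold&lt;/code&gt;, 50 rows by default). For a 1-billion-row table, that&amp;rsquo;s 100 million changed rows, which is possibly far too infrequent.&lt;/p&gt;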
&lt;p&gt;Evaluate whether to tune per-table &lt;code&gt;autovacuum_vacuum_scale_factor&lt;/code&gt; and &lt;code&gt;autovacuum_analyze_scale_factor&lt;/code&gt; based on: whether it&amp;rsquo;s a core business table, number of joins, query complexity, access frequency, month-boundary issues, data skew, etc. The goal: increase collection frequency to reduce plan-regression risk without wasting resources on excessive vacuuming.&lt;/p&gt;
&lt;p&gt;What value should you set? An example:&lt;/p&gt;
&lt;p&gt;For a monthly table (or monthly partition) with queries hitting the current day&amp;rsquo;s data, assume roughly even daily inserts: on day n, one day&amp;rsquo;s inserts are about 1/n of the table. With &lt;code&gt;autovacuum_analyze_scale_factor = 0.1&lt;/code&gt;, a single day&amp;rsquo;s inserts cross the threshold only while n is at most about 10, so the table gets analyzed almost daily for the first ~10 days, but may skip analysis around day 12. At that point the newest data can fall outside what the statistics cover, and plans may degrade. To ensure analysis continues through days 10–31 of the month, the factor must stay below 1/31 ≈ 0.032, so set &lt;code&gt;autovacuum_analyze_scale_factor&lt;/code&gt; below &lt;code&gt;0.03&lt;/code&gt;. I recommend &lt;code&gt;autovacuum_analyze_scale_factor = 0.02&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Parameter tuning reference (consider your table&amp;rsquo;s data model!):&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Parameter&lt;/th&gt;
 &lt;th&gt;Default&lt;/th&gt;
 &lt;th&gt;Recommended&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;autovacuum_vacuum_scale_factor&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;0.2&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;0.04&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;autovacuum_analyze_scale_factor&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;0.1&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;0.02&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
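&lt;p&gt;Applied per table, the recommended values might look like the sketch below. It reuses the &lt;code&gt;lzlpg&lt;/code&gt; table from the earlier examples; the values come from the table above and should still be checked against your own data model:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Per-table override; the instance-wide defaults stay untouched
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;alter table lzlpg set (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  autovacuum_vacuum_scale_factor = 0.04,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  autovacuum_analyze_scale_factor = 0.02
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Confirm the storage parameters were recorded
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select relname, reloptions from pg_class where relname = &amp;#39;lzlpg&amp;#39;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;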

&lt;h3 class="relative group"&gt;The Optimizer May Choose a Non-Primary-Key Index
 &lt;div id="the-optimizer-may-choose-a-non-primary-key-index" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-optimizer-may-choose-a-non-primary-key-index" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Intuitively, a primary key should have the best selectivity, but the optimizer may still choose something else.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Reproduction commands
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; t1(a char(&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;,b char(&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; t1 &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; md5(&lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;::text),md5(&lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; generate_series(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idxa &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; t1(a);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idxb &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; t1(b);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; t1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; b&lt;span style="color:#f92672"&gt;||&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;&amp;#39;&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Columns a and b have identical selectivity, but the optimizer picks the regular index, not the PK
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idxb &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; t1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;41&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2008&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;045&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;046&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (b &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt;::bpchar)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt;::bpchar)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Force the PK path (b||&amp;#39;&amp;#39; keeps idxb from being used); its cost is only marginally higher
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; b&lt;span style="color:#f92672"&gt;||&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;&amp;#39;&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idxa &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; t1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;41&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2008&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;079&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;079&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt;::bpchar)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((b)::text &lt;span style="color:#f92672"&gt;||&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared &lt;span style="color:#66d9ef"&gt;read&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Even though columns a and b have the same type and selectivity, the optimizer picks the regular index over the PK. The estimated cost of the PK path is only 0.01 higher (5.44 vs. 5.43).&lt;/p&gt;
&lt;p&gt;Why does this matter?&lt;/p&gt;
&lt;p&gt;With the current data distribution, picking the regular index is harmless. But once data changes, the two index plans can diverge dramatically:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; t1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; (autovacuum_enabled &lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;off&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; t1 &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; md5(&lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;::text),&lt;span style="color:#e6db74"&gt;&amp;#39;repeat&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; generate_series(&lt;span style="color:#ae81ff"&gt;20001&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;30000&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- b=&amp;#39;repeat&amp;#39; has terrible selectivity, but the b index is still chosen
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;repeat&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idxb &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; t1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;41&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2008&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;823&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;824&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (b &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;repeat&amp;#39;&lt;/span&gt;::bpchar)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt;::bpchar)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Rows&lt;/span&gt; Removed &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; Filter: &lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2511&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Compare with the PK plan
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; b&lt;span style="color:#f92672"&gt;||&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;&amp;#39;&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;repeat&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idxa &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; t1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;41&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2008&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;041&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;041&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt;::bpchar)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((b)::text &lt;span style="color:#f92672"&gt;||&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;repeat&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
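&lt;p&gt;Autovacuum is disabled on the table, so the statistics still predate the insert and the planner keeps estimating rows=1 for &lt;code&gt;b=&amp;#39;repeat&amp;#39;&lt;/code&gt;. It therefore sticks with the regular index even though real selectivity is poor, and efficiency is far worse (shared hit=2511 vs. shared hit=3, with 10000 rows removed by the filter). For latency-sensitive queries or larger data volumes, this becomes a real production problem.&lt;/p&gt;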
&lt;p&gt;Solutions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Manually collect statistics; increase collection frequency&lt;/li&gt;
&lt;li&gt;Use pg_hint_plan&lt;/li&gt;
&lt;li&gt;Rewrite the SQL to prevent it from using the regular index&lt;/li&gt;
&lt;/ul&gt;
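&lt;p&gt;A rough sketch of the first two options against the &lt;code&gt;t1&lt;/code&gt; reproduction table. The hint assumes the pg_hint_plan extension is installed and loaded in the session; &lt;code&gt;idxa&lt;/code&gt; is the index on the PK column created in the reproduction commands. The third option is the &lt;code&gt;b||&amp;#39;&amp;#39;&lt;/code&gt; rewrite already shown above.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Option 1: refresh the statistics so the planner sees the new distribution of b
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;analyze t1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Option 2: pin the index with a pg_hint_plan hint (assumes the extension is loaded)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;/*+ IndexScan(t1 idxa) */
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;explain (analyze,buffers) select * from t1 where a=&amp;#39;qwer&amp;#39; and b=&amp;#39;repeat&amp;#39;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;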

&lt;h3 class="relative group"&gt;The ORDER BY LIMIT Problem
 &lt;div id="the-order-by-limit-problem" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-order-by-limit-problem" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;ORDER BY with LIMIT is a well-known issue with plenty of write-ups and case studies online (see my post &lt;a href="https://github.com/liuzhilong62/blogs/blob/main/PostgreSQL%E6%A1%88%E4%BE%8B/ORDER%20BY%20limit%2010%E6%AF%94ORDER%20BY%20limit%20100%E6%9B%B4%E6%85%A2.md" target="_blank" rel="noreferrer"&gt;ORDER BY LIMIT 10 Is Slower Than ORDER BY LIMIT 100&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;The root cause: the optimizer currently can&amp;rsquo;t estimate where the matching rows sit relative to the index order, so it assumes they are spread evenly. If they happen to be clustered near the end of that order, the scan reads far more data than expected before it has returned the LIMIT rows. Note this isn&amp;rsquo;t limited to ORDER BY + LIMIT; any operation involving sorted output + LIMIT can hit it: GROUP BY + LIMIT, DISTINCT + LIMIT, merge joins, etc.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Solutions:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Rewrite the SQL: add an expression to prevent using the sort-column index (including PK), e.g., &lt;code&gt;order by ''||col1 limit xxx&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Create a composite index: a composite index on (sort_column + index_column) may be chosen by the optimizer and is generally more efficient than an index on the sort column alone. This approach doesn&amp;rsquo;t require changing the SQL&lt;/li&gt;
&lt;/ul&gt;
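&lt;p&gt;To illustrate the composite-index option, here is a sketch on a hypothetical &lt;code&gt;orders&lt;/code&gt; table; the table, columns, and index name are invented for this example. The column order shown (equality-filter column first, sort column second) is one common arrangement and an assumption on my part, since it lets the index return one user&amp;rsquo;s rows already in sort order. Verify the plan with &lt;code&gt;explain&lt;/code&gt; on your own data.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Hypothetical table, for illustration only
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;create table orders(id bigint primary key, user_id bigint, created_at timestamptz);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Composite index covering both the filter column and the sort column
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;create index idx_orders_user_created on orders (user_id, created_at);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- The SQL itself stays unchanged
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;explain select * from orders where user_id = 42 order by created_at desc limit 10;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;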

&lt;h2 class="relative group"&gt;Table Bloat
 &lt;div id="table-bloat" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#table-bloat" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Something Blocking Dead Tuple Cleanup
 &lt;div id="something-blocking-dead-tuple-cleanup" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#something-blocking-dead-tuple-cleanup" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Putting aside autovacuum configuration issues and edge cases, the common blockers are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Long-running transactions. Note: a long transaction on a &lt;em&gt;different&lt;/em&gt; table also blocks dead-tuple reclamation. Read-only queries cause this too.&lt;/li&gt;
&lt;li&gt;Replication slots. Lagging or defunct replication slots cause this.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Both are relatively easy to resolve: 1) terminate the session holding the long transaction; 2) drop the replication slot, or have the consumer&amp;rsquo;s owner work out why consumption is so slow.&lt;/p&gt;
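&lt;p&gt;A sketch of how both blockers might be located before acting on them. The &lt;code&gt;limit&lt;/code&gt;, the 60-character query preview, the pid, and the slot name are illustrative placeholders:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Oldest open transactions, regardless of which table they touch
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select pid, state, xact_start, now() - xact_start as xact_age,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       backend_xmin, left(query, 60) as query
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;from pg_stat_activity
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;where xact_start is not null
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;order by xact_start
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;limit 10;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Replication slots and the xmin each one holds back
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select slot_name, slot_type, active, xmin, catalog_xmin, restart_lsn
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;from pg_replication_slots;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Remediation, once confirmed (12345 and my_slot are placeholders):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- select pg_terminate_backend(12345);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- select pg_drop_replication_slot(&amp;#39;my_slot&amp;#39;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;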

&lt;h3 class="relative group"&gt;High-Concurrency UPDATE Causing Table Bloat
 &lt;div id="high-concurrency-update-causing-table-bloat" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#high-concurrency-update-causing-table-bloat" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Unlike something blocking vacuum, this is about dead tuples being generated faster than vacuum can clean them up. Typically, such tables show high &lt;code&gt;pg_stat_all_tables.n_tup_upd&lt;/code&gt;. If table bloat requires repack, assess whether write volume is high enough to make repeated manual repack a losing game. In that case, tune the table/index &lt;code&gt;fillfactor&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;For the underlying principles, see this post &lt;a href="https://github.com/liuzhilong62/blogs/blob/main/PostgreSQL%E6%BA%90%E7%A0%81%E8%A7%A3%E6%9E%90/%E4%BB%8E%E5%BE%88%E6%85%A2%E7%9A%84%E5%94%AF%E4%B8%80%E7%B4%A2%E5%BC%95%E6%89%AB%E6%8F%8F%E5%88%B0%E7%B4%A2%E5%BC%95%E8%86%A8%E8%83%80.md" target="_blank" rel="noreferrer"&gt;From Painfully Slow Unique Index Scans to Index Bloat&lt;/a&gt;. I&amp;rsquo;ll summarize the conclusions here:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;fillfactor basics:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;fillfactor acts as a high-water mark for tables or indexes. During INSERT, once a page reaches its fillfactor line, new rows go to the next page. The purpose is to reserve space for UPDATEs so they don&amp;rsquo;t constantly seek new pages.&lt;/p&gt;
&lt;p&gt;While both tables and indexes have fillfactor with the same goal (accommodating UPDATEs), the details differ significantly:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tables: If a page still has free space, an UPDATE can stay within the same page, so no new page is needed and there is no need to find another page with space. More importantly, thanks to PG&amp;rsquo;s HOT (Heap-Only Tuple) feature, in-page updates that change no indexed column don&amp;rsquo;t touch indexes, naturally slowing index bloat&lt;/li&gt;
&lt;li&gt;Indexes: Different rows or out-of-page updates of the same row generate new index entries. Reserving space in index pages via fillfactor greatly reduces index page splits&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of course, fillfactor settings are tightly coupled with the workload. If data is append-only like logs with zero updates, fillfactor=100 for both tables and indexes is perfectly fine. But most business tables see updates, so fillfactor shouldn&amp;rsquo;t be 100. With frequent UPDATEs, it should be even lower.&lt;/p&gt;
&lt;p&gt;Yet PG&amp;rsquo;s defaults are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Table default: fillfactor=100&lt;/li&gt;
&lt;li&gt;Index default: fillfactor=90&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Recommended settings:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpg &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; (fillfactor&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;60&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; lzlpg_pkey &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; (fillfactor&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;70&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- These commands only affect new pages; existing pages need repack
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Repack:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;. &lt;span style="color:#66d9ef"&gt;Check&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; long transactions; resolve them &lt;span style="color:#66d9ef"&gt;first&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;. nohup pg_repack &lt;span style="color:#f92672"&gt;-&lt;/span&gt;d lzldb &lt;span style="color:#75715e"&gt;--table lzlpg -p 6666 -no-kill-backend &amp;gt; pgrepack_lzlpg_log.log 2&amp;gt;&amp;amp;1 &amp;amp;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Long Transaction Problems
 &lt;div id="long-transaction-problems" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#long-transaction-problems" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Long transactions don&amp;rsquo;t have a huge amount of theory behind them — monitor and handle promptly — but they absolutely deserve their own section.&lt;/p&gt;
&lt;p&gt;Long transactions cause many problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Unreleased locks → application blocking&lt;/li&gt;
&lt;li&gt;WAL not recycled → disk alerts&lt;/li&gt;
&lt;li&gt;Dead tuples not cleaned → SQL performance degradation&lt;/li&gt;
&lt;li&gt;Various other bizarre performance issues linked to long transactions&lt;/li&gt;
&lt;li&gt;&amp;hellip;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Long transactions in PostgreSQL are far more damaging than in Oracle or MySQL. They must be strictly managed.&lt;/p&gt;
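&lt;p&gt;Monitoring can start from something as simple as the query below, a sketch over &lt;code&gt;pg_stat_activity&lt;/code&gt; that lists the oldest open transactions first. The 80-character query preview and the &lt;code&gt;LIMIT&lt;/code&gt; are arbitrary; alert thresholds are left to the reader.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Oldest open transactions first (xact_start is NULL when no transaction is open)
SELECT pid,
       usename,
       application_name,
       state,
       now() - xact_start AS xact_age,
       backend_xid,
       backend_xmin,
       left(query, 80) AS last_query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND pid &amp;lt;&amp;gt; pg_backend_pid()
ORDER BY xact_start
LIMIT 20;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Sessions in state &lt;code&gt;idle in transaction&lt;/code&gt; with a large &lt;code&gt;xact_age&lt;/code&gt; are the usual suspects.&lt;/p&gt;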

&lt;h2 class="relative group"&gt;Subtransaction Problems
 &lt;div id="subtransaction-problems" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#subtransaction-problems" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;blockquote&gt;&lt;p&gt;&amp;ldquo;Subtransactions are basically cursed. Rip em out.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Subtransactions cause many problems and are a frequent pain point in the industry.&lt;/p&gt;
&lt;p&gt;Industry experience reports:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://pganalyze.com/blog/5mins-postgres-17-configurable-slru-cache" target="_blank" rel="noreferrer"&gt;Waiting for Postgres 17: Configurable SLRU cache sizes for increased performance&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://knowledge.enterprisedb.com/hc/en-us/articles/13523268146972-Subtransactions-overflow-and-the-performance-cliff" target="_blank" rel="noreferrer"&gt;Subtransactions-overflow-and-the-performance-cliff&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://about.gitlab.com/blog/2021/09/29/why-we-spent-the-last-month-eliminating-postgresql-subtransactions/" target="_blank" rel="noreferrer"&gt;Why we spent the last month eliminating PostgreSQL subtransactions&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Where subtransactions come from:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;PL/pgSQL&lt;/code&gt; functions containing a block with an &lt;strong&gt;exception&lt;/strong&gt; clause&lt;/li&gt;
&lt;li&gt;&lt;code&gt;savepoints&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;JDBC + &lt;a href="https://jdbc.postgresql.org/documentation/use/" target="_blank" rel="noreferrer"&gt;autosave=always&lt;/a&gt; (default &lt;code&gt;autosave=never&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;ODBC&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note: OGG uses an ODBC driver, and ODBC cannot disable subtransactions.&lt;/p&gt;
&lt;p&gt;GaussDB&amp;rsquo;s ODBC can disable subtransactions via &lt;a href="https://support.huaweicloud.com/intl/en-us/centralized-devg-v8-gaussdb/gaussdb-42-0098.html" target="_blank" rel="noreferrer"&gt;ForExtensionConnector&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;So we can advise applications to keep subtransactions under 64, but we can&amp;rsquo;t easily advise against using OGG, since migrating off Oracle often depends on OGG-based data sync tools.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Subtransaction problem scenarios and symptoms:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;1(+) long transaction + subtransaction overflow + high concurrency → severe performance drop&lt;/li&gt;
&lt;li&gt;Subtransaction overflow (64+) → noticeable performance dip&lt;/li&gt;
&lt;li&gt;Subtransaction overflow (64+) + multixact → severe performance drop&lt;/li&gt;
&lt;li&gt;1(+) long transaction + 1(+) subtransaction → severe query performance drop on read replicas&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Improvements in PG17:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;SLRU manages transaction relationships for clog, multixact, subtrans, etc. in shared memory. Relevant source definitions:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* Number of SLRU buffers to use for subtrans */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define NUM_SUBTRANS_BUFFERS	32 &lt;/span&gt;&lt;span style="color:#75715e"&gt;// 32 SLRU pages in shared memory
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Each backend advertises up to PGPROC_MAX_CACHED_SUBXIDS TransactionIds
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * for non-aborted subtransactions of its current top transaction. These
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * have to be treated as running XIDs by other backends.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * We also keep track of whether the cache overflowed (ie, the transaction has
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * generated at least one subtransaction that didn&amp;#39;t fit in the cache).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * If none of the caches have overflowed, we can assume that an XID that&amp;#39;s not
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * listed anywhere in the PGPROC array is not a running transaction. Else we
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * have to look at pg_subtrans.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define PGPROC_MAX_CACHED_SUBXIDS 64	&lt;/span&gt;&lt;span style="color:#75715e"&gt;// Overflow at 64+, per backend
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
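&lt;p&gt;PG17 SLRU improvements:
New GUC parameters (such as &lt;code&gt;subtransaction_buffers&lt;/code&gt;, &lt;code&gt;transaction_buffers&lt;/code&gt;, and &lt;code&gt;multixact_member_buffers&lt;/code&gt;) make the SLRU cache sizes configurable, and the single centralized SLRU lock is split into multiple bank locks.&lt;/p&gt;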
&lt;p&gt;Improvement effect:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d712858437e4.png" alt="image.png" /&gt;
(&lt;a href="https://www.pgevents.ca/events/pgconfdev2024/sessions/session/53/slides/27/SLRU%20Performance%20Issues.pdf" target="_blank" rel="noreferrer"&gt;https://www.pgevents.ca/events/pgconfdev2024/sessions/session/53/slides/27/SLRU%20Performance%20Issues.pdf&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Subtransaction handling strategies:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Dev standards: Don&amp;rsquo;t use &lt;code&gt;savepoints&lt;/code&gt;; consider &lt;code&gt;ON CONFLICT&lt;/code&gt; for write conflicts&lt;/li&gt;
&lt;li&gt;Dev standards: Don&amp;rsquo;t use &lt;code&gt;exception&lt;/code&gt; blocks&lt;/li&gt;
&lt;li&gt;Dev standards: Ensure JDBC does &lt;em&gt;not&lt;/em&gt; have &lt;code&gt;autosave=always&lt;/code&gt; enabled&lt;/li&gt;
&lt;li&gt;Monitoring: Targeted monitoring of &lt;code&gt;pg_stat_slru&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Monitoring: Targeted monitoring of &lt;code&gt;SAVEPOINT&lt;/code&gt; and &lt;code&gt;EXCEPTION&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;CDC standards: Use ODBC (and OGG or other ODBC-based tools) with care; split transactions, cap subtransactions per large transaction at 50K&lt;/li&gt;
&lt;li&gt;Upgrade: Move to PG17&lt;/li&gt;
&lt;/ul&gt;
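&lt;p&gt;As an illustration of the first two dev standards, the sketch below replaces an insert-or-update written with an &lt;code&gt;EXCEPTION&lt;/code&gt; block by a single &lt;code&gt;ON CONFLICT&lt;/code&gt; statement. The table &lt;code&gt;account_balance&lt;/code&gt; and its columns are hypothetical and exist only for this example.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical table used only for illustration
CREATE TABLE IF NOT EXISTS account_balance (
    account_id bigint PRIMARY KEY,
    balance    numeric NOT NULL
);

-- Pattern to avoid inside PL/pgSQL: entering a block that has an
-- EXCEPTION clause starts a subtransaction every time it runs.
--   BEGIN
--       INSERT INTO account_balance VALUES (1, 100);
--   EXCEPTION WHEN unique_violation THEN
--       UPDATE account_balance SET balance = 100 WHERE account_id = 1;
--   END;

-- Subtransaction-free equivalent
INSERT INTO account_balance (account_id, balance)
VALUES (1, 100)
ON CONFLICT (account_id)
DO UPDATE SET balance = EXCLUDED.balance;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;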

&lt;h2 class="relative group"&gt;Concurrency &amp;amp; Performance
 &lt;div id="concurrency--performance" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#concurrency--performance" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Snapshot and Concurrency Parameter Tuning
 &lt;div id="snapshot-and-concurrency-parameter-tuning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#snapshot-and-concurrency-parameter-tuning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Parameter&lt;/th&gt;
 &lt;th&gt;Type&lt;/th&gt;
 &lt;th&gt;Default&lt;/th&gt;
 &lt;th&gt;Recommended&lt;/th&gt;
 &lt;th&gt;Requires Restart&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;old_snapshot_threshold&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;cpu&lt;/td&gt;
 &lt;td&gt;-1 (community)&lt;/td&gt;
 &lt;td&gt;-1&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;max_parallel_workers_per_gather&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;cpu&lt;/td&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;td&gt;0&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;code&gt;old_snapshot_threshold&lt;/code&gt; easily causes performance problems when enabled, and there&amp;rsquo;s plenty of material online about it. Even though changing it requires a restart, I strongly recommend keeping it disabled (-1). The community removed the parameter entirely in PG17.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;max_parallel_workers_per_gather&lt;/code&gt; auto-enables parallelism for large queries, but parallelism of 2 won&amp;rsquo;t give a proportional 2x speedup. This parameter is best used in specific scenarios, like explicitly setting parallel workers for batch jobs. Since no restart is needed, it&amp;rsquo;s a quick change.&lt;/p&gt;
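&lt;p&gt;A sketch of that split, assuming the instance-wide default is turned off and a batch job opts back in for a single transaction. The value 4 is a placeholder, not a recommendation.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Instance-wide: disable automatic parallel query (a reload is enough)
ALTER SYSTEM SET max_parallel_workers_per_gather = 0;
SELECT pg_reload_conf();

-- Batch session only: opt back in for one transaction
BEGIN;
SET LOCAL max_parallel_workers_per_gather = 4;  -- placeholder value
-- ... run the large batch query here ...
COMMIT;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;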
&lt;p&gt;Will disabling &lt;code&gt;old_snapshot_threshold&lt;/code&gt; cause problems?&lt;/p&gt;
&lt;p&gt;No. This parameter exists to limit long transactions — which do damage performance in PG — but the parameter itself causes performance issues, defeating the purpose.&lt;/p&gt;
&lt;p&gt;Long transactions can be handled via several mechanisms:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Long transaction monitoring. This is the most important, and monitoring is fairly mature.&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;statement_timeout&lt;/code&gt; (default 0)&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;transaction_timeout&lt;/code&gt; (default 0, available in PG17+)&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;lock_timeout&lt;/code&gt; (default 0; recommended at session level for DDL)&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;idle_in_transaction_session_timeout&lt;/code&gt; (default 0; we set it to 2h)&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;idle_session_timeout&lt;/code&gt; (default 0, available in PG14+; not relevant here)&lt;/li&gt;
&lt;/ol&gt;
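&lt;p&gt;The timeout settings above can be applied at different scopes. The following is a sketch only: the role name &lt;code&gt;app_user&lt;/code&gt; and every value except the 2h idle-in-transaction limit mentioned in the list are assumptions.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Instance-wide guard against sessions sitting idle inside a transaction
ALTER SYSTEM SET idle_in_transaction_session_timeout = &amp;#39;2h&amp;#39;;
SELECT pg_reload_conf();

-- Per-role statement limit; app_user is a hypothetical role name
ALTER ROLE app_user SET statement_timeout = &amp;#39;5min&amp;#39;;

-- Session-level guard around DDL, so the DDL gives up instead of
-- queueing behind a long transaction and blocking everyone after it
SET lock_timeout = &amp;#39;5s&amp;#39;;
-- ALTER TABLE ... ;
RESET lock_timeout;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;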

&lt;h3 class="relative group"&gt;High-Concurrency Commits Causing LWLOCK:WALWrite
 &lt;div id="high-concurrency-commits-causing-lwlockwalwrite" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#high-concurrency-commits-causing-lwlockwalwrite" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://github.com/liuzhilong62/blogs/blob/main/PostgreSQL%E6%A1%88%E4%BE%8B/%E6%A1%88%E4%BE%8B-insert%20value%E5%81%B6%E5%8F%91%E6%85%A2%E5%88%86%E6%9E%90.md" target="_blank" rel="noreferrer"&gt;Case Study: Intermittent Slow INSERT &amp;hellip; VALUES&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Key takeaways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;There&amp;rsquo;s only one IO:WALWrite, but there can be dozens of LWLOCK:WALWrite waiters&lt;/li&gt;
&lt;li&gt;You can&amp;rsquo;t directly see the LWLOCK blocking chain, but from the source code we know LWLOCK:WALWrite is waiting on IO:WALWrite&lt;/li&gt;
&lt;li&gt;In high-concurrency small-transaction scenarios, increasing WAL buffer size theoretically doesn&amp;rsquo;t help much&lt;/li&gt;
&lt;/ul&gt;
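&lt;p&gt;The first takeaway can be checked live with a wait-event snapshot. This is a sketch that assumes PG13 or later, where the lightweight lock is reported as &lt;code&gt;WALWrite&lt;/code&gt;; older releases report it as &lt;code&gt;WALWriteLock&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Sessions currently waiting on WALWrite, split by wait type.
-- Under this contention pattern the IO row counts one session
-- while the LWLock row counts many.
SELECT wait_event_type, wait_event, count(*) AS sessions
FROM pg_stat_activity
WHERE wait_event = &amp;#39;WALWrite&amp;#39;
GROUP BY wait_event_type, wait_event
ORDER BY sessions DESC;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A single snapshot is noisy, so sample it repeatedly during the slow period.&lt;/p&gt;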
&lt;p&gt;What problems does this cause?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Concurrent writes block, write latency increases, active sessions may spike&lt;/li&gt;
&lt;li&gt;High-concurrency small transactions can&amp;rsquo;t saturate disk IO&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Solutions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Distribute concurrent writes across time&lt;/li&gt;
&lt;li&gt;Batch commits at the application level&lt;/li&gt;
&lt;li&gt;Analyze and try to reduce FPI (see FPI section)&lt;/li&gt;
&lt;li&gt;Group commit (&lt;a href="https://www.postgresql.org/docs/17/runtime-config-wal.html#GUC-COMMIT-DELAY" target="_blank" rel="noreferrer"&gt;TBD&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;WAL &amp;amp; Latency
 &lt;div id="wal--latency" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#wal--latency" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;FPI and Checkpoint Parameters
 &lt;div id="fpi-and-checkpoint-parameters" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#fpi-and-checkpoint-parameters" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PG writes a WAL FPI (Full Page Image) the first time a page is modified after a checkpoint (with &lt;code&gt;full_page_writes&lt;/code&gt; on, which is the default). So more frequent checkpoints → higher probability of FPI.&lt;/p&gt;
&lt;p&gt;Checkpoint frequency is mainly controlled by two parameters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;checkpoint_timeout&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;max_wal_size&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Principle:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/0197be136174.png" alt="image.png" /&gt;
(Egor Rogov, PostgreSQL 14 Internals)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;max_wal_size&lt;/code&gt; defaults to 1GB, which is too small for high-load databases. Generally, you should increase this parameter to reduce FPI.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;checkpoint_timeout&lt;/code&gt; defaults to 5 minutes, which seems reasonable.&lt;/p&gt;
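&lt;p&gt;Before and after changing &lt;code&gt;max_wal_size&lt;/code&gt;, it helps to measure how much of the WAL is full page images. The sketch below assumes PG14 or later, where &lt;code&gt;pg_stat_wal&lt;/code&gt; exists; the 16GB value is a placeholder to be sized against available disk and acceptable crash-recovery time.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Share of WAL records that carried a full page image since stats_reset
SELECT wal_records,
       wal_fpi,
       round(100.0 * wal_fpi / NULLIF(wal_records, 0), 2) AS fpi_pct,
       pg_size_pretty(wal_bytes) AS wal_written,
       stats_reset
FROM pg_stat_wal;

-- Raise max_wal_size (a reload is enough); 16GB is a placeholder
ALTER SYSTEM SET max_wal_size = &amp;#39;16GB&amp;#39;;
SELECT pg_reload_conf();
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;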

&lt;h3 class="relative group"&gt;FPI and Random Writes
 &lt;div id="fpi-and-random-writes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#fpi-and-random-writes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
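&lt;p&gt;Even with longer checkpoint intervals, FPI problems may persist. Check whether the workload involves UUID-based random writes: random keys scatter inserts across many index pages, so far more pages take their first post-checkpoint modification. You may need to switch to sequences or to a time-ordered UUID scheme.&lt;/p&gt;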
&lt;p&gt;Finding the specific index:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Check if FPI is severe&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;code&gt;--stats=record&lt;/code&gt; is handy&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_waldump -z --stats&lt;span style="color:#f92672"&gt;=&lt;/span&gt;record 00000001000001860000001B&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Rank relations by their number of FPWs&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_waldump 00000001000001860000001B|grep FPW|awk -F &lt;span style="color:#e6db74"&gt;&amp;#39;:&amp;#39;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;{print $7}&amp;#39;&lt;/span&gt;|awk &lt;span style="color:#e6db74"&gt;&amp;#39;{print $2}&amp;#39;&lt;/span&gt;|sort -n|uniq -c |sort -r|head -10&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Logical Replication &amp;amp; Replication Slots
 &lt;div id="logical-replication--replication-slots" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#logical-replication--replication-slots" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Logical replication has many issues and is a key optimization area for the community — nearly every major version brings significant improvements.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/liuzhilong62/blogs/blob/main/PostgreSQL%E5%86%85%E5%8A%9F%E4%BF%AE%E7%82%BC/PostgreSQL%E9%80%BB%E8%BE%91%E5%A4%8D%E5%88%B6.md" target="_blank" rel="noreferrer"&gt;Logical Replication and Replication Slots Basics&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Spill Problem
 &lt;div id="spill-problem" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#spill-problem" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://github.com/liuzhilong62/blogs/blob/main/PostgreSQL%E6%A1%88%E4%BE%8B/PG%E8%B5%B7%E5%BA%93%E9%80%BB%E8%BE%91%E5%92%8Cspill%E5%AF%BC%E8%87%B4%E8%B5%B7%E5%BA%93%E6%85%A2%E9%97%AE%E9%A2%98%E5%88%86%E6%9E%90.md" target="_blank" rel="noreferrer"&gt;Analysis of PG Startup Logic and Spill-Induced Slow Startup&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Spill key takeaways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Spill occurs when logical decoding can&amp;rsquo;t fit transaction data in memory, so it writes to disk. Spill files contain transaction information&lt;/li&gt;
&lt;li&gt;Each walsender has independent decoding, so each logical replication subscriber has its own spill&lt;/li&gt;
&lt;li&gt;Large transactions produce large spill files, typically few in number&lt;/li&gt;
&lt;li&gt;Subtransaction spill produces one spill file per subtransaction&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Versions:&lt;/p&gt;
&lt;ul&gt;
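&lt;li&gt;PG12 and earlier: the spill threshold is hard-coded at 4096 changes per transaction&lt;/li&gt;
&lt;li&gt;PG13 added &lt;code&gt;logical_decoding_work_mem&lt;/code&gt; to size the decoding memory and reduce spill probability&lt;/li&gt;
&lt;li&gt;PG14+ supports streaming of in-progress transactions to the subscriber (the subscription&amp;rsquo;s &lt;code&gt;streaming&lt;/code&gt; option), not to be confused with physical streaming replication&lt;/li&gt;
&lt;li&gt;Streaming only triggers under certain conditions, so spilling can still occur even with streaming enabled&lt;/li&gt;
&lt;li&gt;PG17 added the developer option &lt;code&gt;debug_logical_replication_streaming&lt;/code&gt; (called &lt;code&gt;logical_replication_mode&lt;/code&gt; in PG16) to force streaming&lt;/li&gt;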
&lt;/ul&gt;
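&lt;p&gt;Whether a slot is spilling or streaming can be read from the slot statistics. The sketch below assumes PG14 or later for &lt;code&gt;pg_stat_replication_slots&lt;/code&gt;; the 256MB value and the subscription name &lt;code&gt;my_sub&lt;/code&gt; are placeholders.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- On the publisher: spill vs. stream counters per logical slot
SELECT slot_name,
       spill_txns,
       spill_count,
       pg_size_pretty(spill_bytes)  AS spilled,
       stream_txns,
       stream_count,
       pg_size_pretty(stream_bytes) AS streamed
FROM pg_stat_replication_slots;

-- On the publisher (PG13+): more decoding memory before spilling starts
ALTER SYSTEM SET logical_decoding_work_mem = &amp;#39;256MB&amp;#39;;  -- placeholder
SELECT pg_reload_conf();

-- On the subscriber (PG14+): accept large in-progress transactions
ALTER SUBSCRIPTION my_sub SET (streaming = on);  -- my_sub is hypothetical
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Keep in mind that each walsender gets its own &lt;code&gt;logical_decoding_work_mem&lt;/code&gt; budget, so the total memory scales with the number of logical subscribers.&lt;/p&gt;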

&lt;h3 class="relative group"&gt;WALSender Blocking Shutdown
 &lt;div id="walsender-blocking-shutdown" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#walsender-blocking-shutdown" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://github.com/liuzhilong62/blogs/blob/main/PostgreSQL%E6%A1%88%E4%BE%8B/PG%E5%81%9C%E5%BA%93%E9%80%BB%E8%BE%91%E5%92%8Cwalsender%E9%98%BB%E6%AD%A2%E5%81%9C%E5%BA%93%E9%97%AE%E9%A2%98%E5%88%86%E6%9E%90.md" target="_blank" rel="noreferrer"&gt;PG Shutdown Logic and WALSender Blocking Shutdown Analysis&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In reality, &lt;em&gt;any&lt;/em&gt; process that doesn&amp;rsquo;t exit can block shutdown. The question is which ones are most likely to cause trouble. From the shutdown code flow, archiver and walsender are frequent blockers because during shutdown they attempt a final archive or log transmission.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://camo.githubusercontent.com/45e44c384cdf1c41caf9d2018076cf420cd48c9d49be1b2078262b4303be2627/68747470733a2f2f6f73732d656d637370726f642d7075626c69632e6d6f64622e70726f2f696d6167652f656469746f722f32303235303130342d313837353338333238343037393830343431365f343538372e706e67" alt="" /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If shutdown is stuck on walsender, try &lt;code&gt;kill&lt;/code&gt; (not &lt;code&gt;kill -9&lt;/code&gt;) — the checkpoint hasn&amp;rsquo;t finished yet, and a forced shutdown leaves an inconsistent state. Even for forced shutdown, prefer &lt;code&gt;pg_ctl stop -D $PGDATA -m i&lt;/code&gt; over raw &lt;code&gt;kill -9&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;If shutdown is stuck on archiver, &lt;code&gt;kill -9&lt;/code&gt; is fine — the checkpoint is already complete and the database is in a consistent state&lt;/li&gt;
&lt;/ul&gt;
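&lt;p&gt;Both blockers can be checked before a planned shutdown. The queries below are a sketch to run on the primary; what counts as too much unsent WAL or too many failed archives is left to the reader.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Walsenders that still have WAL left to send
SELECT pid,
       application_name,
       client_addr,
       state,
       sent_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn)) AS unsent
FROM pg_stat_replication
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) DESC;

-- Archiver health: a recent failure means the final archive may hang too
SELECT archived_count,
       last_archived_wal,
       last_archived_time,
       failed_count,
       last_failed_wal,
       last_failed_time
FROM pg_stat_archiver;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;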

&lt;h2 class="relative group"&gt;Partitioned Tables
 &lt;div id="partitioned-tables" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#partitioned-tables" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://github.com/liuzhilong62/blogs/blob/main/PostgreSQL%E5%86%85%E5%8A%9F%E4%BF%AE%E7%82%BC/PostgreSQL%E5%88%86%E5%8C%BA%E8%A1%A8.md" target="_blank" rel="noreferrer"&gt;Partitioned Table Basics&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;PG&amp;rsquo;s partitioned tables have unique characteristics that developers generally don&amp;rsquo;t fully understand without study, leading to many pitfalls.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Index Mismatch Between Parent and Child Partitions
 &lt;div id="index-mismatch-between-parent-and-child-partitions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#index-mismatch-between-parent-and-child-partitions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Due to non-standard partition creation, many indexes are created directly on child tables (which should not be done), and the &amp;ldquo;create index on all children + attach&amp;rdquo; workflow is skipped. The result: the parent table has no index or no effective index. Since the parent has no data, this doesn&amp;rsquo;t directly impact queries — but when new partitions are created, they only inherit the parent&amp;rsquo;s indexes, so new child tables end up missing indexes.&lt;/p&gt;
&lt;p&gt;Fixing parent-table missing indexes is fairly straightforward: see &lt;a href="https://github.com/liuzhilong62/blogs/blob/main/PostgreSQL%E5%86%85%E5%8A%9F%E4%BF%AE%E7%82%BC/PostgreSQL%E5%88%86%E5%8C%BA%E8%A1%A8.md#%E5%88%9B%E5%BB%BA%E5%88%86%E5%8C%BA%E7%B4%A2%E5%BC%95%E7%9A%84%E6%AD%A3%E7%A1%AE%E5%A7%BF%E5%8A%BF" target="_blank" rel="noreferrer"&gt;The Correct Way to Create Partition Indexes&lt;/a&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create an invalid index ONLY on the parent. Fast, but blocks subsequent DML — watch for long transactions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; IDX_DATECREATED &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ONLY&lt;/span&gt; lzlpartition1(date_created);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create the index CONCURRENTLY on each child partition. Slow, but doesn&amp;#39;t block DML — watch for long DML transactions that could cause the operation to fail
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; concurrently idx_datecreated_202302 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302(date_created);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Attach all indexes. Fast, no business blocking
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; idx_datecreated ATTACH PARTITION idx_datecreated_202302;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Fixing a missing primary key on the parent is harder: see &lt;a href="https://github.com/liuzhilong62/blogs/blob/main/PostgreSQL%E5%86%85%E5%8A%9F%E4%BF%AE%E7%82%BC/PostgreSQL%E5%88%86%E5%8C%BA%E8%A1%A8.md#%E5%88%86%E5%8C%BA%E8%A1%A8%E6%B7%BB%E5%8A%A0%E4%B8%BB%E9%94%AE%E5%92%8C%E5%94%AF%E4%B8%80%E7%B4%A2%E5%BC%95" target="_blank" rel="noreferrer"&gt;Adding Primary Keys and Unique Indexes to Partitioned Tables&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Adding a primary key on the parent acquires &lt;code&gt;AccessExclusiveLock&lt;/code&gt;, blocking everything. Creating an index on a partitioned table is slow, and the PK then causes further blocking. There&amp;rsquo;s currently no low-impact way to add a PK on a partitioned table. Workarounds: &amp;ldquo;attach a unique index + NOT NULL constraint&amp;rdquo;, schedule extended downtime for the partition table while the index builds, or use a third-party sync tool to populate a new table that already has the PK.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Abusing the DEFAULT Partition
 &lt;div id="abusing-the-default-partition" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#abusing-the-default-partition" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://github.com/liuzhilong62/blogs/blob/main/PostgreSQL%E6%A1%88%E4%BE%8B/%E6%B2%A1%E6%9C%89%E9%98%BB%E5%A1%9E%E4%B8%BA%E4%BB%80%E4%B9%88partition%20of%E5%88%9B%E5%BB%BA%E5%AD%90%E5%88%86%E5%8C%BA%E5%BE%88%E6%85%A2%EF%BC%9F.md" target="_blank" rel="noreferrer"&gt;Default Partition Overgrowth Causing Prolonged Blocking During &lt;code&gt;CREATE TABLE ... PARTITION OF&lt;/code&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The root cause is simple: when adding a new partition, the DDL must validate that no row in the DEFAULT partition belongs in the new partition&amp;rsquo;s range. That means scanning a large amount of data in the DEFAULT partition while holding the DDL&amp;rsquo;s locks, so creating the new partition takes a very long time. Blocking then cascades, and business queries and writes stall.&lt;/p&gt;
&lt;p&gt;DEFAULT partition abuse is a widespread problem! The community PG doesn&amp;rsquo;t provide interval partitioning. If a developer forgets to create a partition, data silently lands in DEFAULT with no error or alert. Day after day, the DEFAULT partition grows enormous — and then the next schema change causes an outage.&lt;/p&gt;
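&lt;p&gt;Because rows land in DEFAULT silently, it is worth monitoring the DEFAULT partitions themselves. The query below is a sketch over &lt;code&gt;pg_partitioned_table&lt;/code&gt;; the row count is the planner&amp;rsquo;s estimate, not an exact figure.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Partitioned tables that have a DEFAULT partition, largest first
SELECT pt.partrelid::regclass AS parent_table,
       pt.partdefid::regclass AS default_partition,
       pg_size_pretty(pg_total_relation_size(pt.partdefid::regclass)) AS default_size,
       c.reltuples::bigint    AS estimated_rows
FROM pg_partitioned_table pt
JOIN pg_class c ON c.oid = pt.partdefid
WHERE pt.partdefid &amp;lt;&amp;gt; 0
ORDER BY pg_total_relation_size(pt.partdefid::regclass) DESC;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Ideally every DEFAULT partition stays empty, so any non-trivial size here is worth an alert.&lt;/p&gt;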
&lt;p&gt;You can&amp;rsquo;t leave an oversized DEFAULT partition as-is forever. Even though ATTACH can avoid the blocking problem, you still need to defuse this bomb eventually.&lt;/p&gt;
&lt;p&gt;DEFAULT partition data handling — Plan 1:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;DETACH the default partition, create proper partitions, then re-insert DEFAULT data into the partitioned table&lt;/li&gt;
&lt;li&gt;If needed, after detach and creating proper partitions, create an empty DEFAULT partition to maintain business continuity&lt;/li&gt;
&lt;li&gt;Note: DETACH (unlike ATTACH) requires an AccessExclusiveLock on the parent. PG14 added &lt;code&gt;DETACH PARTITION ... CONCURRENTLY&lt;/code&gt;, but it is not allowed when the partitioned table has a DEFAULT partition&lt;/li&gt;
&lt;/ol&gt;
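&lt;p&gt;A minimal sketch of Plan 1. The parent &lt;code&gt;orders&lt;/code&gt; (range-partitioned on &lt;code&gt;created_at&lt;/code&gt;), the partition names, and the date bounds are all hypothetical and only illustrate the order of operations.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical objects: orders (parent), orders_default (oversized DEFAULT partition).
BEGIN;
-- Takes ACCESS EXCLUSIVE on the parent, so keep this transaction short.
ALTER TABLE orders DETACH PARTITION orders_default;
-- With no DEFAULT partition attached, there is nothing to scan here.
CREATE TABLE orders_2026_03 PARTITION OF orders
    FOR VALUES FROM (&amp;#39;2026-03-01&amp;#39;) TO (&amp;#39;2026-04-01&amp;#39;);
-- Optional: a fresh, empty DEFAULT partition so stray rows still land somewhere.
CREATE TABLE orders_default_new PARTITION OF orders DEFAULT;
COMMIT;

-- Outside the DDL transaction: route the old rows to their proper partitions.
INSERT INTO orders SELECT * FROM orders_default;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;On a large table the final INSERT would probably be split into batches by key range, so that each transaction stays short.&lt;/p&gt;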
&lt;p&gt;DEFAULT partition data handling — Plan 2:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;DETACH the default partition, create proper partitions, then ATTACH the detached DEFAULT table as a regular child partition — careful with range boundaries&lt;/li&gt;
&lt;li&gt;If needed, after detach and creating proper partitions, create an empty DEFAULT partition to maintain business continuity&lt;/li&gt;
&lt;li&gt;Note: DETACH (unlike ATTACH) requires an AccessExclusiveLock on the parent. PG14 added &lt;code&gt;DETACH PARTITION ... CONCURRENTLY&lt;/code&gt;, but it is not allowed when the partitioned table has a DEFAULT partition&lt;/li&gt;
&lt;/ol&gt;
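&lt;p&gt;A sketch of the re-attach step in Plan 2, assuming the detached table is called &lt;code&gt;orders_default&lt;/code&gt;, the parent is &lt;code&gt;orders&lt;/code&gt;, and all of its rows fall inside 2025. These names and bounds are hypothetical. The validated CHECK constraint is what lets ATTACH skip its own validation scan.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical: orders_default is already detached and holds only 2025 data.
-- Add the constraint without a scan, then validate it under a weaker lock.
ALTER TABLE orders_default
    ADD CONSTRAINT orders_default_2025_chk
    CHECK (created_at IS NOT NULL
           AND created_at &amp;gt;= &amp;#39;2025-01-01&amp;#39;
           AND created_at &amp;lt; &amp;#39;2026-01-01&amp;#39;) NOT VALID;
ALTER TABLE orders_default VALIDATE CONSTRAINT orders_default_2025_chk;

-- Bounds must match the CHECK exactly and must not overlap existing partitions.
ALTER TABLE orders ATTACH PARTITION orders_default
    FOR VALUES FROM (&amp;#39;2025-01-01&amp;#39;) TO (&amp;#39;2026-01-01&amp;#39;);

-- The helper constraint is redundant once the table is attached.
ALTER TABLE orders_default DROP CONSTRAINT orders_default_2025_chk;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If VALIDATE CONSTRAINT fails, some rows fall outside the intended range, and those rows have to be moved out first or Plan 1 used instead.&lt;/p&gt;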
&lt;p&gt;DEFAULT partition data handling — Plan 3:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Create a new table, sync all data via DTS&lt;/li&gt;
&lt;li&gt;Rename tables&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Plan 3 looks the crudest, but it&amp;rsquo;s the one I personally recommend most. If you have 5 instances to fix, a surgical approach is fine. If you have 200 instances, the labor cost makes DTS the practical winner.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Missing SELECT Privileges on Partitions Causing Abnormal Plans
 &lt;div id="missing-select-privileges-on-partitions-causing-abnormal-plans" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#missing-select-privileges-on-partitions-causing-abnormal-plans" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;If a user lacks SELECT privilege on a child partition, the planner won&amp;rsquo;t use that partition&amp;rsquo;s statistics for the user&amp;rsquo;s queries, which leads to bad execution plans. Partitions created via &lt;code&gt;CREATE TABLE ... PARTITION OF&lt;/code&gt; normally don&amp;rsquo;t carry the parent&amp;rsquo;s SELECT grants, yet the data stays readable through the parent, so nothing fails visibly and the issue is widespread.&lt;/p&gt;
&lt;p&gt;Solutions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Have the cloud platform handle it automatically&lt;/li&gt;
&lt;li&gt;Enforce dev standards requiring SELECT grants on child partitions&lt;/li&gt;
&lt;/ul&gt;
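&lt;p&gt;One way to audit this, sketched with a hypothetical parent table &lt;code&gt;orders&lt;/code&gt; and a hypothetical application role &lt;code&gt;app_user&lt;/code&gt;. The query only covers direct children of the parent.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- List direct child partitions on which app_user lacks SELECT.
SELECT c.oid::regclass AS partition_name
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
WHERE i.inhparent = &amp;#39;orders&amp;#39;::regclass
  AND NOT has_table_privilege(&amp;#39;app_user&amp;#39;, c.oid, &amp;#39;SELECT&amp;#39;);

-- Then grant on each partition reported above, for example:
GRANT SELECT ON orders_2026_03 TO app_user;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;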

&lt;h3 class="relative group"&gt;High-Concurrency Full Partition Scans and LWLock:lockmanager
 &lt;div id="high-concurrency-full-partition-scans-and-lwlocklockmanager" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#high-concurrency-full-partition-scans-and-lwlocklockmanager" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;This is another very common problem!&lt;/p&gt;
&lt;p&gt;I recommend reading the AWS documentation, which explains it clearly: &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/wait-event.lw-lock-manager.html" target="_blank" rel="noreferrer"&gt;https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/wait-event.lw-lock-manager.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Symptoms:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Spiking active sessions&lt;/li&gt;
&lt;li&gt;Severe LWLock:lockmanager wait events&lt;/li&gt;
&lt;li&gt;Database performance cliff&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Trigger conditions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Query scans multiple partitions&lt;/li&gt;
&lt;li&gt;That query has high concurrency&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Key takeaways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The fastpath lock mechanism is designed for quick access to &amp;ldquo;weak locks&amp;rdquo;, improving database concurrency&lt;/li&gt;
&lt;li&gt;fastpath applies only to the three weakest table lock modes (levels 1 to 3, everything below &lt;code&gt;ShareUpdateExclusiveLock&lt;/code&gt;): &lt;code&gt;AccessShareLock&lt;/code&gt; for SELECT, &lt;code&gt;RowShareLock&lt;/code&gt; for SELECT FOR xxx, and &lt;code&gt;RowExclusiveLock&lt;/code&gt; for DML. It&amp;rsquo;s meant to benefit normal operations&lt;/li&gt;
&lt;li&gt;&lt;code&gt;FP_LOCK_SLOTS_PER_BACKEND&lt;/code&gt; (a compile-time constant through PG17): a backend holds at most 16 fastpath locks; beyond that, it must acquire locks in the shared-memory lock table, producing &lt;code&gt;LWLock:lockmanager&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Not just tables — every accessed index also acquires a lock&lt;/li&gt;
&lt;li&gt;This problem isn&amp;rsquo;t tightly coupled to partition count — even a modest number of partitions can trigger &lt;code&gt;LWLock:lockmanager&lt;/code&gt; and degrade performance&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let&amp;rsquo;s calculate: for a partitioned table with 1 primary key and 2 regular indexes, how many partitions fit in the fastpath?&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; slots &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; parent) &lt;span style="color:#f92672"&gt;/&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; indexes &lt;span style="color:#f92672"&gt;+&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3.75&lt;/span&gt;, so at most &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; child partitions fit&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Yes: a full scan across 3 partitions already uses 13 of the 16 slots, and a fourth partition pushes the backend past the limit and into LWLock:lockmanager waits.&lt;/p&gt;
&lt;p&gt;For a regular table, 16 indexes would similarly exhaust fastpath (1 table + 16 indexes = 17 locks).&lt;/p&gt;
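&lt;p&gt;To see whether sessions are actually spilling out of the fastpath, the &lt;code&gt;fastpath&lt;/code&gt; column of &lt;code&gt;pg_locks&lt;/code&gt; can be aggregated per backend. This is a rough diagnostic sketch; it only shows locks held at the instant the query runs, so it needs to be sampled while the suspect statements are executing.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Per backend: relation locks held in fastpath slots vs. the shared lock table.
SELECT pid,
       count(*) FILTER (WHERE fastpath)     AS fastpath_locks,
       count(*) FILTER (WHERE NOT fastpath) AS shared_table_locks
FROM pg_locks
WHERE locktype = &amp;#39;relation&amp;#39;
GROUP BY pid
ORDER BY shared_table_locks DESC;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Backends that sit at 16 fastpath locks with a non-zero second column are the ones going through the shared lock table.&lt;/p&gt;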
&lt;p&gt;Solutions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For not-too-large tables, merge partitions into a regular table&lt;/li&gt;
&lt;li&gt;Add partition key filter conditions to queries&lt;/li&gt;
&lt;li&gt;Reduce indexes (not very practical, since partition count alone can exceed 16)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The hard part:&lt;/p&gt;
&lt;p&gt;In Oracle-to-PG migrations, Oracle supports global indexes, so primary keys and unique indexes don&amp;rsquo;t need to include the partition key. In PG, they must include the partition key.&lt;/p&gt;
&lt;p&gt;PK example:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;idxlzl(primarykey) &lt;span style="color:#75715e"&gt;--oracle
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;idxlzl(primarykey,partitionkey) &lt;span style="color:#75715e"&gt;--pg&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A common query pattern:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; col &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tlzl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; primarykey&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12345&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Should you push the application to add a partition filter here? It&amp;rsquo;s a tough sell. The resistance is: &amp;ldquo;I already passed the primary key — what more do you want? If I knew everything, why would I query the database?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;In this case, the only recommendation is to convert the partitioned table to a regular table. I haven&amp;rsquo;t found a better solution.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Memory
 &lt;div id="memory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Excessive Objects Leading to Oversized relcache
 &lt;div id="excessive-objects-leading-to-oversized-relcache" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#excessive-objects-leading-to-oversized-relcache" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Key takeaways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;relcache stores relation metadata: OID, pg_class info, partitions, subtransactions, row-level security policies, statistics, index metadata, access methods, etc.&lt;/li&gt;
&lt;li&gt;Each session has its own (rel)cache for system catalog data (metadata, etc.)&lt;/li&gt;
&lt;li&gt;Normally this cache is small. When the catalog is huge and a session accesses all of it, the cache can become very large&lt;/li&gt;
&lt;li&gt;Cache management is simple: no eviction mechanism, no limit (though there are invalidation messages)&lt;/li&gt;
&lt;li&gt;Closing the session releases the cache&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Solutions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reduce the number of objects — especially check whether partition child tables are excessive&lt;/li&gt;
&lt;li&gt;Set aggressive connection-pool disconnection parameters so business connections recycle more frequently&lt;/li&gt;
&lt;/ul&gt;
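&lt;p&gt;A quick way to check how much memory the caches take in a session, assuming PG14 or later, where the &lt;code&gt;pg_backend_memory_contexts&lt;/code&gt; view exists. It reports only the session that runs it; relcache and catcache entries live under &lt;code&gt;CacheMemoryContext&lt;/code&gt; and its children.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- PG14+: memory contexts of the current session, largest first.
SELECT name,
       count(*)                         AS contexts,
       pg_size_pretty(sum(total_bytes)) AS total,
       pg_size_pretty(sum(used_bytes))  AS used
FROM pg_backend_memory_contexts
GROUP BY name
ORDER BY sum(total_bytes) DESC
LIMIT 10;

-- How many catalog objects a session could end up caching.
SELECT relkind, count(*) AS objects
FROM pg_class
GROUP BY relkind
ORDER BY objects DESC;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For a different backend, &lt;code&gt;pg_log_backend_memory_contexts(pid)&lt;/code&gt; (also PG14+) writes the same information to the server log.&lt;/p&gt;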

&lt;h3 class="relative group"&gt;Memory Fragmentation
 &lt;div id="memory-fragmentation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-fragmentation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Recommended commands:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo|grep whatyouneed
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/buddyinfo
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## cgroup memory&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;/opt/cgtools/cginfo -t perf -s mem
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Pay attention to pgscand/s (direct memory reclaim) — values in the tens of thousands indicate a problem&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sar -B -s &lt;span style="color:#e6db74"&gt;&amp;#34;08:00:00&amp;#34;&lt;/span&gt; -e &lt;span style="color:#e6db74"&gt;&amp;#34;09:00:00&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# min_free_kbytes setting:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/sys/vm/min_free_kbytes
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Total PSS of all processes, in kB (match only the Pss: lines, not SwapPss:):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;grep -h &lt;span style="color:#e6db74"&gt;&amp;#39;^Pss:&amp;#39;&lt;/span&gt; /proc/&lt;span style="color:#f92672"&gt;[&lt;/span&gt;1-9&lt;span style="color:#f92672"&gt;]&lt;/span&gt;*/smaps | awk &lt;span style="color:#e6db74"&gt;&amp;#39;{total+=$2}; END {printf &amp;#34;%d kB\n&amp;#34;, total }&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# PSS memory for a specific process, in MB:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;grep &lt;span style="color:#e6db74"&gt;&amp;#39;^Pss:&amp;#39;&lt;/span&gt; /proc/90875/smaps | awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum+=$2 };END {print sum/1024}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# RSS memory for a specific process:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/68729/smaps |grep Rss |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum+=$2 };END {print sum/1024}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Private memory for a specific process:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/90875/smaps|sed &lt;span style="color:#e6db74"&gt;&amp;#39;/zero/,/VmFlags/d&amp;#39;&lt;/span&gt; |grep Private |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum+=$2 };END {print sum/1024}&amp;#39;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;min_free_kbytes:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://camo.githubusercontent.com/ec10b5b4434febdb6675545e2beaa60646be264db9fb8259cd787cdd4771054b/68747470733a2f2f692d626c6f672e6373646e696d672e636e2f626c6f675f6d6967726174652f35653435303466323634303231633438386438613637623962333665666265322e706e67" alt="" /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;a href="https://vivani.net/2022/06/14/linux-kernel-tuning-page-allocation-failure/" target="_blank" rel="noreferrer"&gt;https://vivani.net/2022/06/14/linux-kernel-tuning-page-allocation-failure/&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;When free memory is low, the kswapd daemon is woken to free pages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;pages_low: when free pages fall below pages_low, the buddy allocator wakes kswapd and the kernel begins reclaiming pages in the background (dropping clean cache, writing back or swapping out the rest)&lt;/li&gt;
&lt;li&gt;pages_min: when free pages reach pages_min, reclamation pressure is high and the zone urgently needs free pages. The allocating process then does kswapd&amp;rsquo;s work itself, synchronously; this is called direct reclaim&lt;/li&gt;
&lt;li&gt;pages_high: once kswapd is awake and freeing pages, the kernel considers the zone &amp;ldquo;balanced&amp;rdquo; only when free pages reach pages_high. At pages_high, kswapd goes back to sleep. Free pages above pages_high means the zone is in an ideal state&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;vm.min_free_kbytes&lt;/code&gt; (the pages_min watermark) is an extremely important OS parameter. Too low a value prevents effective memory reclamation, potentially causing system crashes and service interruptions. Too high a value increases reclaim activity, causing allocation delays that can immediately trigger OOM.&lt;/p&gt;
&lt;p&gt;Optimization results:&lt;/p&gt;
&lt;p&gt;After increasing &lt;code&gt;min_free_kbytes&lt;/code&gt; + deploying off-peak drop-cache jobs, problems have decreased significantly.&lt;/p&gt;
&lt;p&gt;Why increase min_free_kbytes?&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;This is used to force the Linux VM to keep a minimum number of kilobytes free. The VM uses this number to compute a watermark[WMARK_MIN] value for each lowmem zone in the system. Each lowmem zone gets a number of reserved free pages based &lt;strong&gt;proportionally&lt;/strong&gt; on its size.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;&lt;a href="https://www.kernel.org/doc/html/latest/admin-guide/sysctl/vm.html#min-free-kbytes" target="_blank" rel="noreferrer"&gt;Source: kernel.org docs&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The point of raising min_free_kbytes isn&amp;rsquo;t to raise the min watermark and trigger direct reclaim more often. It&amp;rsquo;s that on older kernels (the RHEL/CentOS 7 generation) the low watermark couldn&amp;rsquo;t be tuned on its own: low and high are derived from min, so the only way to raise low was to raise min. A higher low watermark makes asynchronous reclaim start earlier and leaves a buffer window before direct reclaim kicks in.&lt;/p&gt;
&lt;p&gt;Red Hat 8 added two memory parameters to improve reclaim; one of them, &lt;code&gt;watermark_scale_factor&lt;/code&gt;, widens the gap between the watermarks without touching &lt;code&gt;min_free_kbytes&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Recommend enabling huge pages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Huge pages perform better when PG requests contiguous memory&lt;/li&gt;
&lt;li&gt;Huge pages also help reduce page table size&lt;/li&gt;
&lt;li&gt;shared_buffers can use huge pages; requires &lt;code&gt;huge_pages = on&lt;/code&gt; (or &lt;code&gt;try&lt;/code&gt;) in PostgreSQL and huge pages reserved at the OS level&lt;/li&gt;
&lt;li&gt;Instances with huge pages enabled in production show better performance and fewer problems&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Managing.html#AuroraPostgreSQL.Managing.HugePages" target="_blank" rel="noreferrer"&gt;AWS huge pages standard&lt;/a&gt;: enabled by default for all instance classes except certain test tiers, and cannot be disabled&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;code&gt;Huge_pages&lt;/code&gt; parameter is turned on by default for all DB instance classes other than t3.medium, db.t3.large, db.t4g.medium, db.t4g.large instance classes. You can&amp;rsquo;t change the &lt;code&gt;huge_pages&lt;/code&gt; parameter value or turn off this feature in the supported instance classes of Aurora PostgreSQL.&lt;/p&gt;
&lt;/blockquote&gt;
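&lt;p&gt;On a self-managed instance, the settings below give a quick read on the huge pages configuration. This is a sketch for checking, not a tuning recipe; the last two parameters only exist on the versions noted in the comments.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;SHOW huge_pages;                        -- configured value: on, try, or off
SHOW shared_memory_size_in_huge_pages;  -- PG15+: huge pages needed for the shared memory area
SHOW huge_pages_status;                 -- PG17+: whether huge pages are actually in use&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The second value is a reasonable starting point for sizing the OS-level huge page reservation.&lt;/p&gt;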
&lt;h3 class="relative group"&gt;cgroup and Host Memory Mismatch
 &lt;div id="cgroup-and-host-memory-mismatch" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cgroup-and-host-memory-mismatch" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When cgroup memory hits its limit, reclaim targets pages within that cgroup. With cloud VM instance types and cgroup configurations, the host may still have free memory above its watermarks while the cgroup is under pressure, so the host-level pages_low never triggers asynchronous reclaim for either host or cgroup memory. Eventually, direct reclaim fires to satisfy the cgroup&amp;rsquo;s DB memory demand.&lt;/p&gt;
&lt;p&gt;The root cause: cgroups lack independent free-page memory management.&lt;/p&gt;
&lt;p&gt;The only fix: increase the cgroup memory limit, overcommitting the host more aggressively so the host reaches pages_low sooner.&lt;/p&gt;

&lt;h3 class="relative group"&gt;shared_buffers and pagecache
 &lt;div id="shared_buffer-and-pagecache" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#shared_buffer-and-pagecache" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PG uses a double-buffer mechanism — no direct IO yet.&lt;/p&gt;
&lt;p&gt;Double buffer: DB shared_buffers (one layer of shared memory) + OS pagecache (another layer). In real deployments, pagecache is typically far larger than shared_buffers. And pagecache counts against cgroup mem but isn&amp;rsquo;t reflected in cgroup memory monitoring&amp;hellip;&lt;/p&gt;
&lt;p&gt;Bottom line: leave plenty of memory for pagecache. Don&amp;rsquo;t make shared_buffers excessively large (20GB seems sufficient for most cases). Only increase it if you clearly observe buffer-mapping-related wait events.&lt;/p&gt;
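&lt;p&gt;A simple way to look for those wait events is to sample &lt;code&gt;pg_stat_activity&lt;/code&gt;. This is a single-snapshot sketch; in practice it would be run repeatedly during the busy period. The buffer-mapping LWLock shows up as &lt;code&gt;BufferMapping&lt;/code&gt; on PG13+ and &lt;code&gt;buffer_mapping&lt;/code&gt; on older versions.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Snapshot of what active sessions are waiting on right now.
SELECT wait_event_type, wait_event, count(*) AS sessions
FROM pg_stat_activity
WHERE state = &amp;#39;active&amp;#39;
  AND wait_event IS NOT NULL
GROUP BY wait_event_type, wait_event
ORDER BY sessions DESC;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;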

&lt;h3 class="relative group"&gt;work_mem Cannot Cap Hash Join / Hash Aggregate Memory
 &lt;div id="work_mem-cannot-cap-hash-join--hash-aggregate-memory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#work_mem-cannot-cap-hash-join--hash-aggregate-memory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;hash_mem_multiplier&lt;/strong&gt; limits memory for hash-based operations (hash join, hash agg, etc.), capping at &lt;code&gt;hash_mem_multiplier * work_mem&lt;/code&gt;. The default is 2 since PG15 (it was 1 in PG13 and PG14).&lt;/p&gt;
&lt;p&gt;Before PG13, &lt;code&gt;work_mem&lt;/code&gt; was tunable, but a hash aggregate could not spill to disk: if the planner underestimated the number of groups, the hash table simply kept growing past &lt;code&gt;work_mem&lt;/code&gt;. PG13 added disk spilling for hash aggregation together with this multiplier. In other words, pre-13, it was very hard to cap hash-table memory.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;In a production environment running PG12 or earlier, I found a single session consuming 300GB of memory. The culprit was the lack of hash-table limits combined with a plan that incorrectly used hash tables.&lt;/em&gt;&lt;/p&gt;
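&lt;p&gt;A small session-level experiment, assuming PG13 or later, that shows how the two settings combine. The values and the &lt;code&gt;generate_series&lt;/code&gt; test query are illustrative only, and the planner may pick a different aggregate strategy than the one the comment describes.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;SHOW work_mem;
SHOW hash_mem_multiplier;   -- PG13+

-- Illustrative values: the hash-table cap becomes 4MB * 2.0 = 8MB per hash node.
SET work_mem = &amp;#39;4MB&amp;#39;;
SET hash_mem_multiplier = 2.0;

-- If the plan uses HashAggregate, its EXPLAIN ANALYZE line reports
-- batches, memory usage, and disk usage once the cap is exceeded.
EXPLAIN (ANALYZE, COSTS OFF)
SELECT g % 200000 AS k, count(*)
FROM generate_series(1, 1000000) AS g
GROUP BY 1;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;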

&lt;h2 class="relative group"&gt;Other Issues
 &lt;div id="other-issues" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#other-issues" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Exclusive Backup and Startup Issues
 &lt;div id="exclusive-backup-and-startup-issues" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#exclusive-backup-and-startup-issues" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Normally, when the database stops and restarts, recovery starts from the checkpoint LSN recorded in the control file (the one &lt;code&gt;pg_controldata&lt;/code&gt; prints). But if there&amp;rsquo;s a &lt;code&gt;backup_label&lt;/code&gt; file in PGDATA, the startup LSN is read from &lt;code&gt;backup_label&lt;/code&gt; instead.&lt;/p&gt;
&lt;p&gt;What problems does this cause?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Disk snapshots taken directly on the data directory may include the label file. If the database is large and the backup took a long time, restart can be very slow, because all WAL generated since the backup started has to be replayed&lt;/li&gt;
&lt;li&gt;Bigger problem: if production goes down while an exclusive backup is in progress, the leftover label file makes the restart take forever. The root cause is the same: the startup LSN comes from the backup rather than controldata&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Version changes:&lt;/p&gt;
&lt;p&gt;PG13:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pg_start_backup()&lt;/code&gt;
&lt;code&gt;pg_stop_backup()&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Supports exclusive and non-exclusive modes; exclusive is the default (this holds through PG14). Exclusive mode creates &lt;code&gt;backup_label&lt;/code&gt; in the data directory at start and removes it at stop. Non-exclusive mode doesn&amp;rsquo;t create the label at start; it returns the label contents at stop.&lt;/p&gt;
&lt;p&gt;PG15:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pg_backup_start()&lt;/code&gt;
&lt;code&gt;pg_backup_stop()&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Function names changed, and &lt;strong&gt;exclusive backup mode was removed&lt;/strong&gt;. No &lt;code&gt;backup_label&lt;/code&gt; is written to the data directory at backup start; instead, &lt;code&gt;pg_backup_stop()&lt;/code&gt; returns the label contents at backup stop, and the backup tool writes them into the backup area.&lt;/p&gt;
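&lt;p&gt;For reference, a sketch of the PG15+ call sequence. The label &lt;code&gt;nightly&lt;/code&gt; is a made-up example, and the file copy in the middle is done by whatever external tool takes the backup.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- PG15+: both calls must run in the same session, which has to stay connected.
SELECT pg_backup_start(label =&amp;gt; &amp;#39;nightly&amp;#39;, fast =&amp;gt; false);

-- ... copy the data directory with an external tool ...

-- Returns the stop LSN plus the contents to store as backup_label
-- (and tablespace_map, if any) inside the backup, not inside PGDATA.
SELECT lsn, labelfile, spcmapfile
FROM pg_backup_stop(wait_for_archive =&amp;gt; true);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Because the label never lands in the live data directory, a crash during the backup no longer changes where the server starts recovery.&lt;/p&gt;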

&lt;h3 class="relative group"&gt;pg_stat_activity Unqueryable
 &lt;div id="pg_stat_activity-unqueryable" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg_stat_activity-unqueryable" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Symptom:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pg_stat_activity&lt;/code&gt; hangs and can&amp;rsquo;t be queried.&lt;/p&gt;
&lt;p&gt;pstack at the time:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; pgstat_read_current_status () &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; pgstat.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3642&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0000000000727181 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; pgstat_read_current_status () &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; pgstat.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2788&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; pgstat_fetch_stat_numbackends () &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; pgstat.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2789&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x000000000083f2ee &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; pg_stat_get_activity (fcinfo&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x25c2d98) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; pgstatfuncs.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;575&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x000000000065058f &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; ExecMakeTableFunctionResult (setexpr&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x25b1d28, econtext&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x25b1c48, argContext&lt;span style="color:#f92672"&gt;=&amp;lt;&lt;/span&gt;optimized &lt;span style="color:#66d9ef"&gt;out&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;, expectedDesc&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x2545218, randomAccess&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;false&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; execSRF.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;234&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x00000000006609dc &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; FunctionNext (node&lt;span style="color:#f92672"&gt;=&lt;/span&gt;node&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x25b1b38) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; nodeFunctionscan.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;94&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x000000000065110c &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; ExecScanFetch (recheckMtd&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x660700 &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;FunctionRecheck&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;, accessMtd&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x660720 &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;FunctionNext&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;, node&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x25b1b38) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; execScan.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;133&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Analysis:&lt;/p&gt;
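&lt;p&gt;The code location is clear: the reader is stuck in an infinite loop after &lt;code&gt;st_changecount&lt;/code&gt; becomes odd. A backend increments the counter before and after updating its status entry, and readers retry until they see an even, unchanged value, so a backend that dies in the middle of an update leaves the counter odd forever.&lt;/p&gt;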
&lt;p&gt;Triggers: OOM (reproducible), abnormal backend exit (possible), terminate (maybe). None of these guarantee the issue, though.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/message-id/5979.1557543440%40sss.pgh.pa.us" target="_blank" rel="noreferrer"&gt;Community thread&lt;/a&gt; didn&amp;rsquo;t reach a conclusion. Currently the trigger probability appears low.&lt;/p&gt;
&lt;p&gt;Solution: restart the database.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Connection and Connection Pooling Issues
 &lt;div id="connection-and-connection-pooling-issues" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#connection-and-connection-pooling-issues" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;IO Error Messages
 &lt;div id="io-error-messages" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#io-error-messages" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;IO errors typically mean the application is using a connection that&amp;rsquo;s already been closed. This happens often, and diagnosing it is difficult because the entire chain involves many components and broad domain knowledge. Here&amp;rsquo;s a brief summary.&lt;/p&gt;
&lt;p&gt;Known active-disconnection scenarios:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;hikari &lt;code&gt;maxLifetime&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Symptom: session lifetime matches the parameter. Possible cause: the application holds an explicit transaction with an uncommitted SELECT, the pool closes the session, and the app gets &lt;code&gt;io error; could not rollback&lt;/code&gt; or similar.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg.datasouce.maxLifetime&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Druid timeout&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Symptom: the connection drops once a SQL statement runs longer than 20s, which matches the 20000 ms &lt;code&gt;socketTimeout&lt;/code&gt; below.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;spring.datasource.dynamic.druid.socketTimeout=20000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;spring.datasource.dynamic.druid.connectTimeout=20000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Change to:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;spring.datasource.socketTimeout=3600000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;spring.datasource.connectTimeout=3600000&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Application Horizontal Scaling vs. Database Connection Limits
 &lt;div id="application-horizontal-scaling-vs-database-connection-limits" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#application-horizontal-scaling-vs-database-connection-limits" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Horizontal application scaling meets PG connection bottlenecks:&lt;/p&gt;
&lt;p&gt;HikariCP is now Spring Boot&amp;rsquo;s default connection pool, and with the spread of Spring Boot and microservices it is in wide use. Every pod added during scale-out increases the database connection count: &lt;code&gt;maximumPoolSize&lt;/code&gt; stays the same per pod, but more pods mean more total connections. From the existing node count, the number of nodes being added, and the current total connections, you can estimate proportionally how many idle connections the scale-out will add.&lt;/p&gt;
&lt;p&gt;Stateless applications can scale horizontally, but databases cannot. PG&amp;rsquo;s connection limit is &lt;code&gt;max_connections&lt;/code&gt;, and unchecked application scaling can exhaust it with idle connections. Raising &lt;code&gt;max_connections&lt;/code&gt; is painful because it requires a database restart.&lt;/p&gt;
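&lt;p&gt;The proportional estimate can be sketched as follows. This is a minimal illustration that assumes every pod holds roughly the same number of connections; the function name, the sample numbers, and the &lt;code&gt;max_connections&lt;/code&gt; value of 1000 are hypothetical.&lt;/p&gt;

```python
def projected_connections(existing_nodes, added_nodes, current_connections):
    """Estimate total connections after scale-out, assuming each pod
    holds about the same number of connections as today's average."""
    per_node = current_connections / existing_nodes
    return round(per_node * (existing_nodes + added_nodes))


# Hypothetical numbers: 10 pods currently hold 200 connections in total,
# and 5 more pods are about to be added.
total = projected_connections(10, 5, 200)
added = total - 200
headroom = 1000 - total  # assumes max_connections = 1000 for illustration
print(total, added, headroom)
```

&lt;p&gt;Running the estimate before a scale-out shows whether the remaining headroom under &lt;code&gt;max_connections&lt;/code&gt; is enough, or whether the pool size per pod needs to shrink first.&lt;/p&gt;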
&lt;p&gt;PG connection upper limit:&lt;/p&gt;
&lt;p&gt;Also, &lt;code&gt;max_connections&lt;/code&gt; should scale with the instance class, but even if the application can scale out without limit, the database has a real ceiling. In any database, performance degrades as the number of idle connections grows.&lt;/p&gt;
&lt;p&gt;Refer to &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Managing.html#AuroraPostgreSQL.Managing.MaxConnections" target="_blank" rel="noreferrer"&gt;AWS&amp;rsquo;s approach&lt;/a&gt;:
&lt;code&gt;max_connections&lt;/code&gt; is tied to instance class, with a maximum of &lt;code&gt;5000, LEAST({DBInstanceClassMemory/9531392}, 5000)&lt;/code&gt;. This reduces manual connection ops and provides a reasonable ceiling.&lt;/p&gt;</content:encoded></item><item><title>PG Shutdown Logic and Walsender Blocking Shutdown Analysis</title><link>https://lastdba.com/en/2025/01/04/pg-shutdown-logic-and-walsender-blocking-shutdown-analysis/</link><pubDate>Sat, 04 Jan 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/01/04/pg-shutdown-logic-and-walsender-blocking-shutdown-analysis/</guid><description>&lt;h2 class="relative group"&gt;Walsender Blocking Shutdown Symptoms
 &lt;div id="walsender-blocking-shutdown-symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#walsender-blocking-shutdown-symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Production shutdown log output:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:00:02.036 CST,,,447560,,65693cde.6d448,1320,,2023-12-01 09:54:38 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;received fast shutdown request&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:00:02.295 CST,,,447560,,65693cde.6d448,1322,,2023-12-01 09:54:38 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;background worker &amp;#34;&amp;#34;logical replication launcher&amp;#34;&amp;#34; (PID 448996) exited with exit code 1&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:00:10.627 CST,,,448990,,65693ce0.6d9de,213833,,2023-12-01 09:54:40 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;checkpoint complete: wrote 426844 buffers (5.1%); 0 WAL file(s) added, 0 removed, 5 recycled; write=91.427 s, sync=0.055 s, total=91.508 s; sync files=761, longest=0.028 s, average=0.001 s; distance=2197531 kB, estimate=2680783 kB&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;checkpointer&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:00:10.628 CST,,,448990,,65693ce0.6d9de,213834,,2023-12-01 09:54:40 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;shutting down&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;checkpointer&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--checkpointer finished checkpoint and is in shutting down state, pm has not exited
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--160s later pm receives immediate shutdown, triggered by health check script
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.348 CST,,,447560,,65693cde.6d448,1323,,2023-12-01 09:54:38 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;received immediate shutdown request&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.370 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;logicaluser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,283840,&lt;span style="color:#e6db74"&gt;&amp;#34;10.33.77.159:39865&amp;#34;&lt;/span&gt;,6751a2dc.454c0,7,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-12-05 20:55:56 CST,89/847309655,0,WARNING,57P02,&lt;span style="color:#e6db74"&gt;&amp;#34;terminating connection because of crash of another server process&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;In a moment you should be able to reconnect to the database and repeat your command.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;Debezium Streaming&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;walsender&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.370 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;logicaluser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,157641,&lt;span style="color:#e6db74"&gt;&amp;#34;10.33.77.159:39407&amp;#34;&lt;/span&gt;,67408354.267c9,7,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-11-22 21:12:52 CST,9/3193590104,0,WARNING,57P02,&lt;span style="color:#e6db74"&gt;&amp;#34;terminating connection because of crash of another server process&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;In a moment you should be able to reconnect to the database and repeat your command.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;Debezium Streaming&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;walsender&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.370 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;logicaluser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,157916,&lt;span style="color:#e6db74"&gt;&amp;#34;10.33.77.159:57038&amp;#34;&lt;/span&gt;,67408356.268dc,7,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-11-22 21:12:54 CST,115/3293293502,0,WARNING,57P02,&lt;span style="color:#e6db74"&gt;&amp;#34;terminating connection because of crash of another server process&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;In a moment you should be able to reconnect to the database and repeat your command.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;Debezium Streaming&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;walsender&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.370 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;repuser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,164392,&lt;span style="color:#e6db74"&gt;&amp;#34;30.151.40.19:41641&amp;#34;&lt;/span&gt;,66b25869.28228,3,&lt;span style="color:#e6db74"&gt;&amp;#34;streaming 42D3B/1732C5F0&amp;#34;&lt;/span&gt;,2024-08-07 01:07:53 CST,296/0,0,WARNING,57P02,&lt;span style="color:#e6db74"&gt;&amp;#34;terminating connection because of crash of another server process&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;In a moment you should be able to reconnect to the database and repeat your command.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;standby_6666&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;walsender&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.371 CST,,,447560,,65693cde.6d448,1324,,2023-12-01 09:54:38 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;archiver process (PID 448994) exited with exit code 2&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.371 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;logicaluser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,57755,&lt;span style="color:#e6db74"&gt;&amp;#34;10.33.77.159:38918&amp;#34;&lt;/span&gt;,67125534.e19b,7,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-10-18 20:31:48 CST,243/902018192,0,WARNING,57P02,&lt;span style="color:#e6db74"&gt;&amp;#34;terminating connection because of crash of another server process&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;In a moment you should be able to reconnect to the database and repeat your command.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;Debezium Streaming&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;walsender&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.372 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;logicaluser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,157915,&lt;span style="color:#e6db74"&gt;&amp;#34;10.33.77.159:43433&amp;#34;&lt;/span&gt;,67408356.268db,7,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-11-22 21:12:54 CST,60/3248014863,0,WARNING,57P02,&lt;span style="color:#e6db74"&gt;&amp;#34;terminating connection because of crash of another server process&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;In a moment you should be able to reconnect to the database and repeat your command.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;Debezium Streaming&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;walsender&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--pm finished shutting down
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:57.534 CST,,,447560,,65693cde.6d448,1325,,2023-12-01 09:54:38 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;database system is shut down&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:03:49.536 CST,,,211844,,6752bdf3.33b84,1,,2024-12-06 17:03:47 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;ending log output to stderr&amp;#34;&lt;/span&gt;,,&lt;span style="color:#e6db74"&gt;&amp;#34;Future log output will go to log destination &amp;#34;&amp;#34;csvlog&amp;#34;&amp;#34;.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;17:00:02 postmaster receives fast shutdown&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Walsender Blocking Shutdown Symptoms
 &lt;div id="walsender-blocking-shutdown-symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#walsender-blocking-shutdown-symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Production shutdown log output:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:00:02.036 CST,,,447560,,65693cde.6d448,1320,,2023-12-01 09:54:38 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;received fast shutdown request&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:00:02.295 CST,,,447560,,65693cde.6d448,1322,,2023-12-01 09:54:38 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;background worker &amp;#34;&amp;#34;logical replication launcher&amp;#34;&amp;#34; (PID 448996) exited with exit code 1&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:00:10.627 CST,,,448990,,65693ce0.6d9de,213833,,2023-12-01 09:54:40 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;checkpoint complete: wrote 426844 buffers (5.1%); 0 WAL file(s) added, 0 removed, 5 recycled; write=91.427 s, sync=0.055 s, total=91.508 s; sync files=761, longest=0.028 s, average=0.001 s; distance=2197531 kB, estimate=2680783 kB&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;checkpointer&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:00:10.628 CST,,,448990,,65693ce0.6d9de,213834,,2023-12-01 09:54:40 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;shutting down&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;checkpointer&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--checkpointer finished checkpoint and is in shutting down state, pm has not exited
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--160s later pm receives immediate shutdown, triggered by health check script
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.348 CST,,,447560,,65693cde.6d448,1323,,2023-12-01 09:54:38 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;received immediate shutdown request&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.370 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;logicaluser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,283840,&lt;span style="color:#e6db74"&gt;&amp;#34;10.33.77.159:39865&amp;#34;&lt;/span&gt;,6751a2dc.454c0,7,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-12-05 20:55:56 CST,89/847309655,0,WARNING,57P02,&lt;span style="color:#e6db74"&gt;&amp;#34;terminating connection because of crash of another server process&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;In a moment you should be able to reconnect to the database and repeat your command.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;Debezium Streaming&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;walsender&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.370 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;logicaluser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,157641,&lt;span style="color:#e6db74"&gt;&amp;#34;10.33.77.159:39407&amp;#34;&lt;/span&gt;,67408354.267c9,7,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-11-22 21:12:52 CST,9/3193590104,0,WARNING,57P02,&lt;span style="color:#e6db74"&gt;&amp;#34;terminating connection because of crash of another server process&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;In a moment you should be able to reconnect to the database and repeat your command.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;Debezium Streaming&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;walsender&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.370 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;logicaluser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,157916,&lt;span style="color:#e6db74"&gt;&amp;#34;10.33.77.159:57038&amp;#34;&lt;/span&gt;,67408356.268dc,7,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-11-22 21:12:54 CST,115/3293293502,0,WARNING,57P02,&lt;span style="color:#e6db74"&gt;&amp;#34;terminating connection because of crash of another server process&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;In a moment you should be able to reconnect to the database and repeat your command.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;Debezium Streaming&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;walsender&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.370 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;repuser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,164392,&lt;span style="color:#e6db74"&gt;&amp;#34;30.151.40.19:41641&amp;#34;&lt;/span&gt;,66b25869.28228,3,&lt;span style="color:#e6db74"&gt;&amp;#34;streaming 42D3B/1732C5F0&amp;#34;&lt;/span&gt;,2024-08-07 01:07:53 CST,296/0,0,WARNING,57P02,&lt;span style="color:#e6db74"&gt;&amp;#34;terminating connection because of crash of another server process&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;In a moment you should be able to reconnect to the database and repeat your command.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;standby_6666&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;walsender&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.371 CST,,,447560,,65693cde.6d448,1324,,2023-12-01 09:54:38 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;archiver process (PID 448994) exited with exit code 2&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.371 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;logicaluser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,57755,&lt;span style="color:#e6db74"&gt;&amp;#34;10.33.77.159:38918&amp;#34;&lt;/span&gt;,67125534.e19b,7,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-10-18 20:31:48 CST,243/902018192,0,WARNING,57P02,&lt;span style="color:#e6db74"&gt;&amp;#34;terminating connection because of crash of another server process&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;In a moment you should be able to reconnect to the database and repeat your command.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;Debezium Streaming&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;walsender&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.372 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;logicaluser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,157915,&lt;span style="color:#e6db74"&gt;&amp;#34;10.33.77.159:43433&amp;#34;&lt;/span&gt;,67408356.268db,7,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-11-22 21:12:54 CST,60/3248014863,0,WARNING,57P02,&lt;span style="color:#e6db74"&gt;&amp;#34;terminating connection because of crash of another server process&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;In a moment you should be able to reconnect to the database and repeat your command.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;Debezium Streaming&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;walsender&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--pm finished shutting down
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:57.534 CST,,,447560,,65693cde.6d448,1325,,2023-12-01 09:54:38 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;database system is shut down&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:03:49.536 CST,,,211844,,6752bdf3.33b84,1,,2024-12-06 17:03:47 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;ending log output to stderr&amp;#34;&lt;/span&gt;,,&lt;span style="color:#e6db74"&gt;&amp;#34;Future log output will go to log destination &amp;#34;&amp;#34;csvlog&amp;#34;&amp;#34;.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;17:00:02 postmaster receives fast shutdown&lt;/p&gt;
&lt;p&gt;17:00:10 checkpoint completed, checkpointer stopped&lt;/p&gt;
&lt;p&gt;17:02:43 postmaster receives immediate shutdown&lt;/p&gt;
&lt;p&gt;17:02:43 1 physical and 5 logical replication walsenders stopped&lt;/p&gt;
&lt;p&gt;17:02:57 postmaster stopped&lt;/p&gt;
&lt;p&gt;17:03:49 a new postmaster process starts up&lt;/p&gt;
&lt;p&gt;From the above, the walsenders were still running until the immediate shutdown at 17:02:43, which shows that walsender was blocking the fast shutdown.&lt;/p&gt;
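&lt;p&gt;To make the gaps in this timeline explicit, the following minimal sketch computes the time elapsed since the fast shutdown request. The timestamps are copied from the log excerpt above; the event labels and variable names are only illustrative.&lt;/p&gt;

```python
from datetime import datetime

# Timestamps copied from the log excerpt above (all in CST).
events = [
    ("fast shutdown received", "2024-12-06 17:00:02.036"),
    ("checkpointer shutting down", "2024-12-06 17:00:10.628"),
    ("immediate shutdown received", "2024-12-06 17:02:43.348"),
    ("database system is shut down", "2024-12-06 17:02:57.534"),
]

FMT = "%Y-%m-%d %H:%M:%S.%f"
start = datetime.strptime(events[0][1], FMT)

# Seconds elapsed since the fast shutdown request, per event.
elapsed = {}
for name, stamp in events:
    elapsed[name] = (datetime.strptime(stamp, FMT) - start).total_seconds()
    print(f"{elapsed[name]:8.3f}s  {name}")
```

&lt;p&gt;The checkpointer finished within about 9 seconds, while the postmaster waited roughly 161 seconds in total before the immediate shutdown request arrived.&lt;/p&gt;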

&lt;h2 class="relative group"&gt;Shutdown and Signals
 &lt;div id="shutdown-and-signals" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#shutdown-and-signals" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Before diving into source code, we need to understand signals and signal registration in PG.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Common Signals in PG
 &lt;div id="common-signals-in-pg" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#common-signals-in-pg" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;OS signals:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ kill -l
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 1&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGHUP 2&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGINT 3&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGQUIT 4&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGILL 5&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGTRAP
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 6&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGABRT 7&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGBUS 8&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGFPE 9&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGKILL 10&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGUSR1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;11&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGSEGV 12&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGUSR2 13&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGPIPE 14&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGALRM 15&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGTERM
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;16&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGSTKFLT 17&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGCHLD 18&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGCONT 19&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGSTOP 20&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGTSTP
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Common signals used in PG:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;-1&lt;/code&gt; or &lt;code&gt;-SIGHUP&lt;/code&gt;: Hangup signal. In PG, typically tells the process to reload configuration.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;-2&lt;/code&gt; or &lt;code&gt;-SIGINT&lt;/code&gt;: Interrupt signal (usually &lt;code&gt;Ctrl+C&lt;/code&gt;). In PG, a backend treats it as a request to cancel the current query, while the postmaster treats it as a fast shutdown request.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;-3&lt;/code&gt; or &lt;code&gt;-SIGQUIT&lt;/code&gt;: In PG, means an immediate, forced exit without normal cleanup (&lt;code&gt;quickdie&lt;/code&gt; in backends).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;-9&lt;/code&gt; or &lt;code&gt;-SIGKILL&lt;/code&gt;: Unconditional termination signal.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;-15&lt;/code&gt; or &lt;code&gt;-SIGTERM&lt;/code&gt;: Termination signal, the signal used by &lt;code&gt;pg_terminate_backend&lt;/code&gt;. In PG, usually means graceful exit.&lt;/li&gt;
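&lt;li&gt;&lt;code&gt;-10&lt;/code&gt; or &lt;code&gt;-SIGUSR1&lt;/code&gt;: User-defined signal. Most PG processes use it for inter-process notifications (&lt;code&gt;procsignal_sigusr1_handler&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;-12&lt;/code&gt; or &lt;code&gt;-SIGUSR2&lt;/code&gt;: User-defined signal. Its meaning is specific to each process type, for example the shutdown request for checkpointer.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;-17&lt;/code&gt; or &lt;code&gt;-SIGCHLD&lt;/code&gt;: Signal used by the pm process. When a child process exits, pm receives this signal, which triggers child process reaping.&lt;/li&gt;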
&lt;/ul&gt;
&lt;p&gt;The specific meaning of signals registered by each type of PG process can be found by reading the respective process source code.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Shutdown Defined by pg_ctl
 &lt;div id="shutdown-defined-by-pg_ctl" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#shutdown-defined-by-pg_ctl" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;A PG database can be shut down in several ways, but underneath they all come down to sending a signal to the postmaster process.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Signal&lt;/th&gt;
 &lt;th&gt;pg_ctl shutdown mode&lt;/th&gt;
 &lt;th&gt;Meaning&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;SIGTERM&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;em&gt;Smart Shutdown&lt;/em&gt;&lt;/td&gt;
 &lt;td&gt;Disallow new connections, but allow existing sessions to finish their work normally. Only shuts down after all sessions terminate.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;SIGINT&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;em&gt;Fast Shutdown&lt;/em&gt;&lt;/td&gt;
 &lt;td&gt;Server disallows new connections and sends &lt;strong&gt;SIGTERM&lt;/strong&gt; to all existing child processes, aborting current transactions and exiting quickly. Waits for almost all child processes (some are not needed) to exit, then shuts down.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;SIGQUIT&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;em&gt;Immediate Shutdown&lt;/em&gt;&lt;/td&gt;
 &lt;td&gt;Sends &lt;strong&gt;SIGQUIT&lt;/strong&gt; to all child processes and waits for them to terminate. Any child process that has not terminated within 5 seconds is sent &lt;strong&gt;SIGKILL&lt;/strong&gt;.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Note: &lt;code&gt;pg_ctl stop&lt;/code&gt; has no shutdown mode that sends &lt;code&gt;SIGKILL&lt;/code&gt; (&lt;code&gt;kill -9&lt;/code&gt;). You can send &lt;code&gt;SIGKILL&lt;/code&gt; to pm directly, but this is strongly discouraged: &lt;code&gt;SIGKILL&lt;/code&gt; cannot be caught, so pm gets no chance to clean up child processes, shared memory, or semaphores. Because &lt;code&gt;SIGQUIT&lt;/code&gt; to pm already falls back to &lt;code&gt;SIGKILL&lt;/code&gt;-ing child processes that do not exit in time, &lt;code&gt;SIGQUIT&lt;/code&gt; is basically guaranteed to stop pm.&lt;/p&gt;
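&lt;p&gt;As a rough illustration of the table above, the mapping from shutdown mode to signal can be written as a small shell function. The function name &lt;code&gt;mode_to_signal&lt;/code&gt; is made up for this sketch; the mode names are the ones accepted by &lt;code&gt;pg_ctl stop -m&lt;/code&gt;.&lt;/p&gt;

```shell
# Hypothetical helper (not part of PostgreSQL): print the signal name
# that each pg_ctl shutdown mode delivers to the postmaster.
mode_to_signal() {
    case "$1" in
        smart)     echo TERM ;;
        fast)      echo INT ;;
        immediate) echo QUIT ;;
        *)         return 1 ;;   # unknown mode: print nothing and fail
    esac
}

for mode in smart fast immediate; do
    echo "pg_ctl stop -m $mode sends SIG$(mode_to_signal "$mode")"
done
```

&lt;p&gt;In other words, &lt;code&gt;pg_ctl stop -m fast&lt;/code&gt; should have the same effect as sending &lt;code&gt;SIGINT&lt;/code&gt; with &lt;code&gt;kill&lt;/code&gt; to the pm PID recorded on the first line of &lt;code&gt;postmaster.pid&lt;/code&gt;.&lt;/p&gt;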
&lt;p&gt;In the source code, apart from &lt;code&gt;NoShutdown&lt;/code&gt;, there are only 3 &lt;strong&gt;shutdown states&lt;/strong&gt;, one for each shutdown mode:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* Startup/shutdown state */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define			NoShutdown		0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define			SmartShutdown	1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define			FastShutdown	2
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define			ImmediateShutdown	3&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;These states appear frequently in the shutdown routines. Because the values are ordered by severity, the code usually compares the &lt;code&gt;Shutdown&lt;/code&gt; variable against a mode:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shutdown &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; FastShutdown&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
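&lt;p&gt;The ordering is what makes such a comparison useful. The sketch below assumes one common use of it: a shutdown request is ignored unless it is stronger than the shutdown already in progress. The numeric values mirror the defines above, while &lt;code&gt;request_shutdown&lt;/code&gt; is a name invented for this illustration and not a PG function.&lt;/p&gt;

```shell
# Numeric values mirror the #define lines above.
NoShutdown=0
SmartShutdown=1
FastShutdown=2
ImmediateShutdown=3

Shutdown=$NoShutdown

# Hypothetical helper: accept a shutdown request only if it is
# stronger than the one already in progress.
request_shutdown() {
    if [ "$Shutdown" -ge "$1" ]; then
        echo "ignored"
        return 0
    fi
    Shutdown=$1
    echo "accepted"
}

request_shutdown "$FastShutdown"        # nothing in progress yet
request_shutdown "$SmartShutdown"       # weaker than the fast shutdown in progress
request_shutdown "$ImmediateShutdown"   # stronger, so it escalates
echo "Shutdown=$Shutdown"
```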

&lt;h3 class="relative group"&gt;pm Signals
 &lt;div id="pm-signals" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pm-signals" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;pm registers a handler for each signal in &lt;code&gt;PostmasterMain&lt;/code&gt;, and the handler runs when the corresponding signal arrives:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;PostmasterMain&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; argc, &lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;argv[])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal_pm&lt;/span&gt;(SIGHUP, SIGHUP_handler);	&lt;span style="color:#75715e"&gt;/* reread config file and have
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;											 * children do same */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal_pm&lt;/span&gt;(SIGINT, pmdie); &lt;span style="color:#75715e"&gt;/* send SIGTERM and shut down */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal_pm&lt;/span&gt;(SIGQUIT, pmdie);	&lt;span style="color:#75715e"&gt;/* send SIGQUIT and die */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal_pm&lt;/span&gt;(SIGTERM, pmdie);	&lt;span style="color:#75715e"&gt;/* wait for children and shut down */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal_pm&lt;/span&gt;(SIGALRM, SIG_IGN);	&lt;span style="color:#75715e"&gt;/* ignored */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal_pm&lt;/span&gt;(SIGPIPE, SIG_IGN);	&lt;span style="color:#75715e"&gt;/* ignored */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal_pm&lt;/span&gt;(SIGUSR1, sigusr1_handler);	&lt;span style="color:#75715e"&gt;/* message from child process */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal_pm&lt;/span&gt;(SIGUSR2, dummy_handler);	&lt;span style="color:#75715e"&gt;/* unused, reserve for children */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal_pm&lt;/span&gt;(SIGCHLD, reaper);	&lt;span style="color:#75715e"&gt;/* handle child termination */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;pmdie&lt;/code&gt;: The three shutdown signals call the &lt;code&gt;pmdie&lt;/code&gt; function. &lt;code&gt;pmdie&lt;/code&gt; is the key shutdown function, analyzed in detail below.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;reaper&lt;/code&gt;: Cleans up exited child processes, including during shutdown. When a child process exits, the kernel delivers &lt;code&gt;SIGCHLD&lt;/code&gt; to pm, which enters &lt;code&gt;reaper&lt;/code&gt; to clean up the child. Each type of child process has its own cleanup logic. For instance, a normal exit of the checkpointer process checks whether archiver and walsender have completed their respective tasks.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sigusr1_handler&lt;/code&gt;, &lt;code&gt;dummy_handler&lt;/code&gt;: In pm, &lt;code&gt;sigusr1_handler&lt;/code&gt; processes messages from child processes (&lt;code&gt;SIGUSR1&lt;/code&gt;), and &lt;code&gt;SIGUSR2&lt;/code&gt; is unused. Child processes register their own handlers: each handles &lt;code&gt;SIGUSR1&lt;/code&gt; differently, and &lt;code&gt;SIGUSR2&lt;/code&gt; is entirely custom per child process; some child processes don&amp;rsquo;t even register this signal.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Walsender Signals
 &lt;div id="walsender-signals" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#walsender-signals" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;After a child process is forked, one of the first things it does is register its own signal handlers.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;WalSndSignals&lt;/code&gt; registers signals for the walsender process:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* Set up signal handlers */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;WalSndSignals&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Set up signal handlers */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGHUP, SignalHandlerForConfigReload);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGINT, StatementCancelHandler);	&lt;span style="color:#75715e"&gt;/* query cancel */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGTERM, die);		&lt;span style="color:#75715e"&gt;/* request shutdown */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGQUIT, quickdie);	&lt;span style="color:#75715e"&gt;/* hard crash time */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;InitializeTimeouts&lt;/span&gt;();		&lt;span style="color:#75715e"&gt;/* establishes SIGALRM handler */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGPIPE, SIG_IGN);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGUSR1, procsignal_sigusr1_handler);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGUSR2, WalSndLastCycleHandler);	&lt;span style="color:#75715e"&gt;/* request a last cycle and
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;												 * shutdown */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
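&lt;p&gt;Note &lt;code&gt;SIGUSR1&lt;/code&gt; and &lt;code&gt;SIGUSR2&lt;/code&gt;: &lt;code&gt;SIGUSR1&lt;/code&gt; goes to the generic &lt;code&gt;procsignal_sigusr1_handler&lt;/code&gt;, while &lt;code&gt;SIGUSR2&lt;/code&gt; asks walsender to run one last cycle and then shut down.&lt;/p&gt;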

&lt;h3 class="relative group"&gt;Checkpointer Signals
 &lt;div id="checkpointer-signals" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#checkpointer-signals" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;CheckpointerMain&lt;/code&gt; registers checkpointer signals:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;CheckpointerMain&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//checkpointer ignores SIGTERM; its actual stop signal is SIGUSR2
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGHUP, SignalHandlerForConfigReload);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGINT, ReqCheckpointHandler); &lt;span style="color:#75715e"&gt;/* request checkpoint */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGTERM, SIG_IGN); &lt;span style="color:#75715e"&gt;/* ignore SIGTERM */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGQUIT, SignalHandlerForCrashExit);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGALRM, SIG_IGN);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGPIPE, SIG_IGN);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGUSR1, procsignal_sigusr1_handler);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGUSR2, SignalHandlerForShutdownRequest);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
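&lt;p&gt;Note &lt;code&gt;SIGUSR1&lt;/code&gt; and &lt;code&gt;SIGUSR2&lt;/code&gt;, and also note that checkpointer explicitly ignores &lt;code&gt;SIGTERM&lt;/code&gt; (it is registered as &lt;code&gt;SIG_IGN&lt;/code&gt;); it shuts down on &lt;code&gt;SIGUSR2&lt;/code&gt; instead.&lt;/p&gt;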
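&lt;p&gt;The same disposition can be imitated in a few lines of shell, purely as an analogy. This sketch assumes a POSIX &lt;code&gt;sh&lt;/code&gt;; the &lt;code&gt;trap&lt;/code&gt; bodies are simplified stand-ins (the real &lt;code&gt;SignalHandlerForShutdownRequest&lt;/code&gt; does not exit inside the handler), and the variable &lt;code&gt;out&lt;/code&gt; exists only for this demo.&lt;/p&gt;

```shell
# Run the demo in a child shell so that $$ refers to the demo process itself.
out=$(sh -c '
    trap "" TERM                                  # ignore SIGTERM, like pqsignal(SIGTERM, SIG_IGN)
    trap "echo shutdown requested; exit 0" USR2   # simplified stand-in for the SIGUSR2 handler
    kill -TERM $$                                 # has no effect on this process
    echo "still running after SIGTERM"
    kill -USR2 $$                                 # runs the USR2 trap, which exits
    echo "not reached"
')
echo "$out"
```

&lt;p&gt;This is why a fast shutdown, which sends &lt;code&gt;SIGTERM&lt;/code&gt; to child processes, does not stop checkpointer by itself.&lt;/p&gt;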

&lt;h2 class="relative group"&gt;Shutdown Source Code Analysis
 &lt;div id="shutdown-source-code-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#shutdown-source-code-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;pm Signal Handling and State Machine
 &lt;div id="pm-signal-handling-and-state-machine" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pm-signal-handling-and-state-machine" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;pmdie&lt;/code&gt; function handles the three shutdown signals that pm receives, for example from &lt;code&gt;pg_ctl&lt;/code&gt;; &lt;code&gt;SIGCHLD&lt;/code&gt; from exiting child processes is handled by &lt;code&gt;reaper&lt;/code&gt; instead. The main logic of pm signal handling is to convert the signal into a &lt;code&gt;pmState&lt;/code&gt; transition and then enter &lt;code&gt;PostmasterStateMachine&lt;/code&gt; for processing.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pmdie&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * pmdie -- signal handler for processing various postmaster signals.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;pmdie&lt;/span&gt;(SIGNAL_ARGS)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			save_errno &lt;span style="color:#f92672"&gt;=&lt;/span&gt; errno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;switch&lt;/span&gt; (postgres_signal_arg)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; SIGTERM:&lt;span style="color:#75715e"&gt;//Smart Shutdown
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pmState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PM_RUN)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				connsAllowed &lt;span style="color:#f92672"&gt;=&lt;/span&gt; ALLOW_SUPERUSER_CONNS;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//smart shutdown does not process pmstate, hands directly to state machine
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;//at this point normal pmState = PM_RUN
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;PostmasterStateMachine&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; SIGINT:&lt;span style="color:#75715e"&gt;//Fast Shutdown
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pmState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PM_RUN &lt;span style="color:#f92672"&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					 pmState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PM_HOT_STANDBY)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/* Report that we&amp;#39;re about to zap live client sessions */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(LOG,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;aborting any active transactions&amp;#34;&lt;/span&gt;)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				pmState &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PM_STOP_BACKENDS;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;//Fast Shutdown transitions pmstate to PM_STOP_BACKENDS
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//then hands to state machine
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;PostmasterStateMachine&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; SIGQUIT:&lt;span style="color:#75715e"&gt;//Immediate Shutdown
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;TerminateChildren&lt;/span&gt;(SIGQUIT);&lt;span style="color:#75715e"&gt;//abort all children with SIGQUIT, wait for them to exit
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			pmState &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PM_WAIT_BACKENDS;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* set stopwatch for them to die */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			AbortStartTime &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;time&lt;/span&gt;(NULL);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Immediate Shutdown transitions pmstate to PM_WAIT_BACKENDS
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//process children before entering state machine
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//first interrupt children with SIGQUIT, wait for them to exit
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//then use SIGKILL on remaining children
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//finally non-consistent exit
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;PostmasterStateMachine&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
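&lt;p&gt;Condensed to its state transitions, and assuming pm is in &lt;code&gt;PM_RUN&lt;/code&gt; when the signal arrives, the excerpt above behaves roughly like the lookup below. &lt;code&gt;pm_state_after&lt;/code&gt; is a hypothetical name used only for this sketch.&lt;/p&gt;

```shell
# Hypothetical helper: print the pmState that the pmdie() excerpt above
# sets before calling PostmasterStateMachine(), starting from PM_RUN.
pm_state_after() {
    case "$1" in
        TERM) echo PM_RUN ;;             # smart: state unchanged, only connsAllowed changes
        INT)  echo PM_STOP_BACKENDS ;;   # fast
        QUIT) echo PM_WAIT_BACKENDS ;;   # immediate, after TerminateChildren(SIGQUIT)
        *)    return 1 ;;                # not a shutdown signal
    esac
}

for sig in TERM INT QUIT; do
    echo "SIG$sig from PM_RUN leaves pmState at $(pm_state_after "$sig")"
done
```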
&lt;p&gt;Before entering the state machine handler, let&amp;rsquo;s look at the postmaster states:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;enum&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PM_INIT,					&lt;span style="color:#75715e"&gt;/* postmaster starting */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PM_STARTUP,					&lt;span style="color:#75715e"&gt;/* waiting for startup subprocess */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PM_RECOVERY,				&lt;span style="color:#75715e"&gt;/* in archive recovery mode */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PM_HOT_STANDBY,				&lt;span style="color:#75715e"&gt;/* in hot standby mode */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PM_RUN,						&lt;span style="color:#75715e"&gt;/* normal &amp;#34;database is alive&amp;#34; state */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PM_STOP_BACKENDS,			&lt;span style="color:#75715e"&gt;/* need to stop remaining backends */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PM_WAIT_BACKENDS,			&lt;span style="color:#75715e"&gt;/* waiting for live backends to exit */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PM_SHUTDOWN,				&lt;span style="color:#75715e"&gt;/* waiting for checkpointer to do shutdown
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;								 * ckpt */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PM_SHUTDOWN_2,				&lt;span style="color:#75715e"&gt;/* waiting for archiver and walsenders to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;								 * finish */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PM_WAIT_DEAD_END,			&lt;span style="color:#75715e"&gt;/* waiting for dead_end children to exit */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PM_NO_CHILDREN				&lt;span style="color:#75715e"&gt;/* all important children have exited */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} PMState;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Since shutdown normally starts from the running state, we only need to focus on &lt;code&gt;PM_RUN&lt;/code&gt; and the states listed after it.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;PostmasterStateMachine&lt;/code&gt; checks the states one after another in order, so a single call can advance through several states:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Advance the postmaster&amp;#39;s state machine and take actions as appropriate
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * This is common code for pmdie(), reaper() and sigusr1_handler(), which
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * receive the signals that might mean we need to change state.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;PostmasterStateMachine&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;//smart shutdown, pmState should be PM_RUN at this point
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pmState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PM_RUN &lt;span style="color:#f92672"&gt;||&lt;/span&gt; pmState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PM_HOT_STANDBY)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (connsAllowed &lt;span style="color:#f92672"&gt;==&lt;/span&gt; ALLOW_NO_CONNS)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//After all normal backends exit, transition pmState to PM_STOP_BACKENDS
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;CountChildren&lt;/span&gt;(BACKEND_TYPE_NORMAL) &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				pmState &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PM_STOP_BACKENDS;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;//PM_STOP_BACKENDS stops some child processes while others keep running
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;//autovacuum, bgwriter, walwriter, startup, and walreceiver are stopped
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;//walsender, checkpointer, archiver, stats, and syslogger keep running for now
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;//smart shutdown reaches this block in its later phase; fast shutdown enters it directly
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pmState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PM_STOP_BACKENDS)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		...		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;//Note this line about walsender!
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Signal all backend children except walsenders */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;SignalSomeChildren&lt;/span&gt;(SIGTERM,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 BACKEND_TYPE_ALL &lt;span style="color:#f92672"&gt;-&lt;/span&gt; BACKEND_TYPE_WALSND);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* and the autovac launcher too */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (AutoVacPID &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;signal_child&lt;/span&gt;(AutoVacPID, SIGTERM);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* and the bgwriter too */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (BgWriterPID &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;signal_child&lt;/span&gt;(BgWriterPID, SIGTERM);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* and the walwriter too */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (WalWriterPID &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;signal_child&lt;/span&gt;(WalWriterPID, SIGTERM);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* If we&amp;#39;re in recovery, also stop startup and walreceiver procs */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (StartupPID &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;signal_child&lt;/span&gt;(StartupPID, SIGTERM);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (WalReceiverPID &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;signal_child&lt;/span&gt;(WalReceiverPID, SIGTERM);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* checkpointer, archiver, stats, and syslogger may continue for now */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;//Transition pmState from PM_STOP_BACKENDS to PM_WAIT_BACKENDS
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;//PM_WAIT_BACKENDS means waiting for backends to exit
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		pmState &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PM_WAIT_BACKENDS;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * If we are in a state-machine state that implies waiting for backends to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * exit, see if they&amp;#39;re all gone, and change state if so.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;//smart and fast shutdown reach this block in their later phase
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;//immediate shutdown enters this block directly on its first pass through the state machine
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pmState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PM_WAIT_BACKENDS)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;//In crash recovery or immediate shutdown the checkpointer must have exited too; otherwise it may still be running
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;//archiver, stats, and syslogger are ignored here because they are not attached to shared memory
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;//walsenders are ignored as well; like the archiver, they are terminated later, after the checkpoint record is written
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;CountChildren&lt;/span&gt;(BACKEND_TYPE_ALL &lt;span style="color:#f92672"&gt;-&lt;/span&gt; BACKEND_TYPE_WALSND) &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			StartupPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			WalReceiverPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			BgWriterPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			(CheckpointerPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			 (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;FatalError &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; Shutdown &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; ImmediateShutdown)) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			WalWriterPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			AutoVacPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (Shutdown &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; ImmediateShutdown &lt;span style="color:#f92672"&gt;||&lt;/span&gt; FatalError)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;//ImmediateShutdown waits for dead end processes to finish
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				pmState &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PM_WAIT_DEAD_END;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * We already SIGQUIT&amp;#39;d the archiver and stats processes, if
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * any, when we started immediate shutdown or entered
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * FatalError state.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//smart, fast shutdown goes here
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//regular child processes have all exited, now notify checkpointer to do shutdown checkpoint
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(Shutdown &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; NoShutdown);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;//If checkpointer process doesn&amp;#39;t exist, start one
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (CheckpointerPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					CheckpointerPID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;StartCheckpointer&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/* And tell it to shut down */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (CheckpointerPID &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Send SIGUSR2 to Checkpointer
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//pmState = PM_SHUTDOWN
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#a6e22e"&gt;signal_child&lt;/span&gt;(CheckpointerPID, SIGUSR2);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					pmState &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PM_SHUTDOWN;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Failing to start Checkpointer is a serious problem
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					FatalError &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					pmState &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PM_WAIT_DEAD_END;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#75715e"&gt;/* Kill the walsenders, archiver and stats collector too */&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Comment says kill walsender, but it actually doesn&amp;#39;t; at least not via SIGQUIT
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#a6e22e"&gt;SignalChildren&lt;/span&gt;(SIGQUIT);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (PgArchPID &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						&lt;span style="color:#a6e22e"&gt;signal_child&lt;/span&gt;(PgArchPID, SIGQUIT);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (PgStatPID &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						&lt;span style="color:#a6e22e"&gt;signal_child&lt;/span&gt;(PgStatPID, SIGQUIT);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;//Neither pmdie nor the state machine function sets PM_SHUTDOWN_2; only reaper does
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;//When reaper handles the checkpointer exit it sets pmState = PM_SHUTDOWN_2, then calls the state machine function at its end and lands here
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pmState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PM_SHUTDOWN_2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * PM_SHUTDOWN_2 state ends when there&amp;#39;s no other children than
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * dead_end children left. There shouldn&amp;#39;t be any regular backends
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * left by now anyway; what we&amp;#39;re really waiting for is walsenders and
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * archiver.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//PM_SHUTDOWN_2 essentially waits for walsender and archiver
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//only changes pmState
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (PgArchPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;CountChildren&lt;/span&gt;(BACKEND_TYPE_ALL) &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			pmState &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PM_WAIT_DEAD_END;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pmState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PM_WAIT_DEAD_END)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;//PM_WAIT_DEAD_END ends once BackendList is completely empty and archiver and stats are gone
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;dlist_is_empty&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;BackendList) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			PgArchPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; PgStatPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* These other guys should be dead already */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(StartupPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(WalReceiverPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(BgWriterPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(CheckpointerPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(WalWriterPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(AutoVacPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* syslogger is not considered here */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			pmState &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PM_NO_CHILDREN;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//PM_NO_CHILDREN is the last shutdown state, meaning normal shutdown can proceed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (Shutdown &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; NoShutdown &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; pmState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PM_NO_CHILDREN)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (FatalError)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(LOG, (&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;abnormal database system shutdown&amp;#34;&lt;/span&gt;)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Abnormal pm exit
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ExitPostmaster&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		 	&lt;span style="color:#75715e"&gt;//Normal pm exit
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ExitPostmaster&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;reaper&lt;/code&gt; is the postmaster&amp;rsquo;s child-reaping function. When a child process exits, the kernel delivers &lt;code&gt;SIGCHLD&lt;/code&gt; to the postmaster (pm), and pm cleans up after the child in &lt;code&gt;reaper&lt;/code&gt;. Each process type (backend, startup, checkpointer, and so on) has its own cleanup path.&lt;/p&gt;
&lt;p&gt;Here we look only at the checkpointer branch. Note that &lt;code&gt;reaper&lt;/code&gt; has no branch of its own for walsender:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; CheckpointerPID)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			CheckpointerPID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Checkpointer exited normally, and pmState is PM_SHUTDOWN: waiting for checkpoint completion
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;EXIT_STATUS_0&lt;/span&gt;(exitstatus) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; pmState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PM_SHUTDOWN)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * OK, we saw normal exit of the checkpointer after it&amp;#39;s been
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * told to shut down. We expect that it wrote a shutdown
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * checkpoint. (If for some reason it didn&amp;#39;t, recovery will
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * occur on next postmaster start.)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * At this point we should have no normal backend children
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * left (else we&amp;#39;d not be in PM_SHUTDOWN state) but we might
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * have dead_end children to wait for.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * If we have an archiver subprocess, tell it to do a last
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * archive cycle and quit. Likewise, if we have walsender
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * processes, tell them to send any remaining WAL and quit.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(Shutdown &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; NoShutdown);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;//Wake archiver for the last time
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (PgArchPID &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#a6e22e"&gt;signal_child&lt;/span&gt;(PgArchPID, SIGUSR2); &lt;span style="color:#75715e"&gt;//pgarch SIGUSR2=pgarch_waken_stop
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Wake walsender for the last time
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;SignalChildren&lt;/span&gt;(SIGUSR2);&lt;span style="color:#75715e"&gt;//walsender SIGUSR2=WalSndLastCycleHandler
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Here PM_SHUTDOWN_2 is set
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//At this point Checkpointer has exited normally; we should wait for pgarch and walsender to finish their last task
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//This is PM_SHUTDOWN_2 state
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pmState &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PM_SHUTDOWN_2;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//checkpointer abnormal exit is considered a crash
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;HandleChildCrash&lt;/span&gt;(pid, exitstatus,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 &lt;span style="color:#a6e22e"&gt;_&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;checkpointer process&amp;#34;&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;continue&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;//At the end, reaper always calls the state machine function
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;PostmasterStateMachine&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Checkpointer and Walsender Process Exit
 &lt;div id="checkpointer-and-walsender-process-exit" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#checkpointer-and-walsender-process-exit" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The checkpointer&amp;rsquo;s main loop processes pending sync requests and interrupts, including the shutdown request:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;CheckpointerMain&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Loop forever
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (;;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		do_checkpoint &lt;span style="color:#f92672"&gt;=&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			flags &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;pg_time_t&lt;/span&gt;	now;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			elapsed_secs;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			cur_timeout;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Clear any already-pending wakeups */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ResetLatch&lt;/span&gt;(MyLatch);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * Process any requests or signals received recently.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Process recent sync requests and signals
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;AbsorbSyncRequests&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;HandleCheckpointerInterrupts&lt;/span&gt;();&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The shutdown request is handled in &lt;code&gt;HandleCheckpointerInterrupts&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Process any new interrupts.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;HandleCheckpointerInterrupts&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (ShutdownRequestPending)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * From here on, elog(ERROR) should end with exit(1), not send control
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * back to the sigsetjmp block above
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		ExitOnAnyError &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ShutdownXLOG&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);&lt;span style="color:#75715e"&gt;//This writes the shutdown checkpoint
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;proc_exit&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);&lt;span style="color:#75715e"&gt;//Normal exit code 0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The checkpointer cannot exit until &lt;code&gt;ShutdownXLOG&lt;/code&gt; returns, because &lt;code&gt;proc_exit(0)&lt;/code&gt; is reached only afterwards.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;ShutdownXLOG&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * This must be called ONCE during postmaster or standalone-backend shutdown
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ShutdownXLOG&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; code, Datum arg)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;//This emits the checkpointer&amp;#39;s &amp;#34;shutting down&amp;#34; log line, which normally always appears
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(IsPostmasterEnvironment &lt;span style="color:#f92672"&gt;?&lt;/span&gt; LOG : NOTICE,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;shutting down&amp;#34;&lt;/span&gt;)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Signal walsenders to move to stopping state.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Initialize walsender stopping
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;WalSndInitStopping&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Wait for all walsenders to be in stopping state
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;WalSndWaitStopping&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;RecoveryInProgress&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;CreateRestartPoint&lt;/span&gt;(CHECKPOINT_IS_SHUTDOWN &lt;span style="color:#f92672"&gt;|&lt;/span&gt; CHECKPOINT_IMMEDIATE);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * If archiving is enabled, rotate the last XLOG file so that all the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * remaining records are archived (postmaster wakes up the archiver
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * process one more time at the end of shutdown). The checkpoint
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * record will go to the next XLOG file and won&amp;#39;t be archived (yet).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;XLogArchivingActive&lt;/span&gt;() &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;XLogArchiveCommandSet&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;RequestXLogSwitch&lt;/span&gt;(false);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;//This is the shutdown checkpoint creation function
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;CreateCheckPoint&lt;/span&gt;(CHECKPOINT_IS_SHUTDOWN &lt;span style="color:#f92672"&gt;|&lt;/span&gt; CHECKPOINT_IMMEDIATE);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;ShutdownCLOG&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;ShutdownCommitTs&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;ShutdownSUBTRANS&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;ShutdownMultiXact&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The checkpointer notifies every walsender to begin stopping through &lt;code&gt;WalSndInitStopping&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Signal all walsenders to move to stopping state.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * This will trigger walsenders to move to a state where no further WAL can be
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * generated. See this file&amp;#39;s header for details.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;WalSndInitStopping&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			i;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (i &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;; i &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; max_wal_senders; i&lt;span style="color:#f92672"&gt;++&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		WalSnd	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;walsnd &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;WalSndCtl&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;walsnds[i];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;pid_t&lt;/span&gt;		pid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;SpinLockAcquire&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;mutex);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		pid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;pid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;SpinLockRelease&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;mutex);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;continue&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;SendProcSignal&lt;/span&gt;(pid, PROCSIG_WALSND_INIT_STOPPING, InvalidBackendId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The checkpointer delivers this request through &lt;code&gt;SendProcSignal&lt;/code&gt;, which sets the reason flag in the target&amp;rsquo;s signal slot and then sends &lt;code&gt;SIGUSR1&lt;/code&gt; to the walsender:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * SendProcSignal
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *		Send a signal to a Postgres process
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Providing backendId is optional, but it will speed up the operation.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * On success (a signal was sent), zero is returned.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * On error, -1 is returned, and errno is set (typically to ESRCH or EPERM).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Not to be confused with ProcSendSignal
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;SendProcSignal&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;pid_t&lt;/span&gt; pid, ProcSignalReason reason, BackendId backendId)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * BackendId not provided, so search the array using pid. We search
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * the array back to front so as to reduce search overhead. Passing
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * InvalidBackendId means that the target is most likely an auxiliary
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * process, which will have a slot near the end of the array.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			i;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (i &lt;span style="color:#f92672"&gt;=&lt;/span&gt; NumProcSignalSlots &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;; i &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;; i&lt;span style="color:#f92672"&gt;--&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			slot &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;ProcSignal&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;psh_slot[i];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (slot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;pss_pid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; pid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/* the above note about race conditions applies here too */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/* Atomically set the proper flag */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				slot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;pss_signalFlags[reason] &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/* Send signal */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;kill&lt;/span&gt;(pid, SIGUSR1);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	errno &lt;span style="color:#f92672"&gt;=&lt;/span&gt; ESRCH;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Walsender registers its &lt;code&gt;SIGUSR1&lt;/code&gt; and &lt;code&gt;SIGUSR2&lt;/code&gt; handlers as follows:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGUSR1, procsignal_sigusr1_handler);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGUSR2, WalSndLastCycleHandler);	&lt;span style="color:#75715e"&gt;/* request a last cycle and
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;												 * shutdown */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;SIGUSR1&lt;/code&gt; handler dispatches on the signal reason:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * procsignal_sigusr1_handler - handle SIGUSR1 signal.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;procsignal_sigusr1_handler&lt;/span&gt;(SIGNAL_ARGS)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;CheckProcSignal&lt;/span&gt;(PROCSIG_WALSND_INIT_STOPPING))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;HandleWalSndInitStopping&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The handler for &lt;code&gt;PROCSIG_WALSND_INIT_STOPPING&lt;/code&gt; is &lt;code&gt;HandleWalSndInitStopping&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Handle PROCSIG_WALSND_INIT_STOPPING signal.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;HandleWalSndInitStopping&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(am_walsender);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * If replication has not yet started, die like with SIGTERM. If
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * replication is active, only set a flag and wake up the main loop. It
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * will send any outstanding WAL, wait for it to be replicated to the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * standby, and then exit gracefully.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;replication_active)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;kill&lt;/span&gt;(MyProcPid, SIGTERM);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		got_STOPPING &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;&lt;span style="color:#75715e"&gt;//If walsender is active, initstopping just sets a flag for the main loop to handle
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The &amp;ldquo;main loop&amp;rdquo; mentioned in the comment is somewhat ambiguous. Walsender has its own main loop, &lt;code&gt;WalSndLoop&lt;/code&gt; (&lt;code&gt;ServerLoop&lt;/code&gt; is the postmaster&amp;rsquo;s main loop), but on the logical replication path the checks for &lt;code&gt;got_STOPPING&lt;/code&gt; sit in the loop inside &lt;code&gt;WalSndWaitForWal&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;WalSndWaitForWal&lt;/code&gt; function is the loop in which walsender waits for new WAL records. WAL records are first generated in memory, and walwriter flushes them only under certain conditions rather than continuously. &lt;code&gt;WalSndWaitForWal&lt;/code&gt; compares the LSN sent so far with the flushed LSN to decide whether there is new WAL to send. In other words, unflushed WAL is not transmitted; only flushed WAL is passed downstream.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;WalSndWaitForWal&lt;/code&gt; code segment about stopping:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Wait till WAL &amp;lt; loc is flushed to disk so it can be safely sent to client.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Returns end LSN of flushed WAL. Normally this will be &amp;gt;= loc, but
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * if we detect a shutdown request (either from postmaster or client)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * we will return early, so caller must always check.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; XLogRecPtr
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;WalSndWaitForWal&lt;/span&gt;(XLogRecPtr loc)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (;;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//After receiving got_STOPPING, do one flush of WAL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//This is necessary! Because walwriter may have already shut down at this point, WAL may not be flushed yet
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (got_STOPPING)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;XLogBackgroundFlush&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Update our idea of the currently flushed position. */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;RecoveryInProgress&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			RecentFlushPtr &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;GetFlushRecPtr&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			RecentFlushPtr &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;GetXLogReplayRecPtr&lt;/span&gt;(NULL);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Break out of the for loop
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//After getting new RecentFlushPtr, still need to send
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (got_STOPPING)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* reactivate latch so WalSndLoop knows to continue */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;SetLatch&lt;/span&gt;(MyLatch);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; RecentFlushPtr;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Back to walsender main loop: &lt;code&gt;WalSndLoop(XLogSendLogical)&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* Main loop of walsender process that streams the WAL over Copy messages. */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;WalSndLoop&lt;/span&gt;(WalSndSendDataCallback send_data)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (;;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Clear any already-pending wakeups */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ResetLatch&lt;/span&gt;(MyLatch);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;//Process replies from downstream
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ProcessRepliesIfAny&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * If we have received CopyDone from the client, sent CopyDone
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * ourselves, and the output buffer is empty, it&amp;#39;s time to exit
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * streaming.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Exit loop when streaming is done
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (streamingDoneReceiving &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; streamingDoneSending &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;pq_is_send_pending&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//If the output buffer has no pending data, call send_data to load more; otherwise flush what is pending first
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;pq_is_send_pending&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;send_data&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			WalSndCaughtUp &lt;span style="color:#f92672"&gt;=&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Try to flush pending output to the client */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;pq_flush_if_writable&lt;/span&gt;() &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;WalSndShutdown&lt;/span&gt;();&lt;span style="color:#75715e"&gt;//Flush failed because the downstream connection is gone: normal walsender shutdown, exit code 0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* If nothing remains to be sent right now ... */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (WalSndCaughtUp &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;pq_is_send_pending&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * If we&amp;#39;re in catchup state, move to streaming. This is an
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * important state change for users to know about, since before
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * this point data loss might occur if the primary dies and we
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * need to failover to the standby. The state change is also
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * important for synchronous replication, since commits that
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * started to wait at that point might wait for some time.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Caught up with the upstream: switch the walsender state from catchup to streaming
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (MyWalSnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;state &lt;span style="color:#f92672"&gt;==&lt;/span&gt; WALSNDSTATE_CATCHUP)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(DEBUG1,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt; has now caught up with upstream server&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								application_name)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;WalSndSetState&lt;/span&gt;(WALSNDSTATE_STREAMING);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Received SIGUSR2, meaning shutdown checkpoint is done.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Send the shutdown checkpoint record, wait for completion, then exit
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (got_SIGUSR2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;WalSndDone&lt;/span&gt;(send_data);&lt;span style="color:#75715e"&gt;//exit code 0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Let&amp;rsquo;s return to the checkpointer&amp;rsquo;s &lt;code&gt;ShutdownXLOG&lt;/code&gt; logic. So far we have only analyzed &lt;code&gt;WalSndInitStopping()&lt;/code&gt;. Once that signal has been sent to the walsenders, the checkpointer calls &lt;code&gt;WalSndWaitStopping&lt;/code&gt; to wait for them.&lt;/p&gt;
&lt;p&gt;This loop returns only once every walsender has either exited (its pid is 0) or reached the stopping state; until then it keeps polling every 10 ms:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Wait that all the WAL senders have quit or reached the stopping state. This
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * is used by the checkpointer to control when the shutdown checkpoint can
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * safely be performed.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;WalSndWaitStopping&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (;;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			i;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		all_stopped &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (i &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;; i &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; max_wal_senders; i&lt;span style="color:#f92672"&gt;++&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			WalSnd	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;walsnd &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;WalSndCtl&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;walsnds[i];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;SpinLockAcquire&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;mutex);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;pid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;SpinLockRelease&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;mutex);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;continue&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;state &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; WALSNDSTATE_STOPPING)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				all_stopped &lt;span style="color:#f92672"&gt;=&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;SpinLockRelease&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;mutex);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;SpinLockRelease&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;mutex);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* safe to leave if confirmation is done for all WAL senders */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (all_stopped)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;pg_usleep&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;10000L&lt;/span&gt;);		&lt;span style="color:#75715e"&gt;/* wait for 10 msec */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
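&lt;p&gt;To make the exit condition explicit, here is a minimal sketch in plain C that models only the per-slot check above. The names &lt;code&gt;SlotState&lt;/code&gt;, &lt;code&gt;WalSndSlot&lt;/code&gt; and &lt;code&gt;all_walsenders_stopped&lt;/code&gt; are made up for illustration and are not PostgreSQL identifiers; the spinlock and the 10 ms sleep are left out on purpose.&lt;/p&gt;

```c
/* Simplified model of the check done inside WalSndWaitStopping.
 * All names here are hypothetical stand-ins, not PostgreSQL types. */
enum SlotState
{
	SLOT_STARTUP,
	SLOT_CATCHUP,
	SLOT_STREAMING,
	SLOT_STOPPING
};

struct WalSndSlot
{
	int				pid;	/* 0 means the slot is unused or the walsender exited */
	enum SlotState	state;
};

/* Returns 1 when the checkpointer could stop waiting, 0 otherwise. */
int
all_walsenders_stopped(const struct WalSndSlot slots[], int nslots)
{
	int			i;

	for (i = 0; i != nslots; i++)
	{
		if (slots[i].pid == 0)
			continue;		/* already gone: nothing to wait for */
		if (slots[i].state != SLOT_STOPPING)
			return 0;		/* still active: the caller must keep waiting */
	}
	return 1;
}
```

&lt;p&gt;In this model, a slot with a nonzero pid that stays in any state other than stopping keeps the result at 0, which in the real function means another 10 ms sleep and another pass over the slots.&lt;/p&gt;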
&lt;p&gt;Finally, combined with the comments in walsender.c:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; If the server is shut down, checkpointer sends us
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; PROCSIG_WALSND_INIT_STOPPING after all regular backends have exited. If
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; the backend is idle or runs an SQL query this causes the backend to
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; shutdown, &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; logical replication is in progress all existing WAL records
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; are processed followed by a shutdown. Otherwise this causes the walsender
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; to &lt;span style="color:#66d9ef"&gt;switch&lt;/span&gt; to the &lt;span style="color:#e6db74"&gt;&amp;#34;stopping&amp;#34;&lt;/span&gt; state. In this state, the walsender will reject
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; any further replication commands. The checkpointer begins the shutdown
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; checkpoint once all walsenders are confirmed as stopping. When the shutdown
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; checkpoint finishes, the postmaster sends us SIGUSR2. This instructs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; walsender to send any outstanding WAL, including the shutdown checkpoint
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; record, wait &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; it to be replicated to the standby, and then exit.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;After all regular backends have exited, checkpointer sends &lt;code&gt;PROCSIG_WALSND_INIT_STOPPING&lt;/code&gt; to the walsenders&lt;/li&gt;
&lt;li&gt;A walsender that is idle or running an SQL query shuts down; one doing logical replication processes all existing WAL records and then shuts down; otherwise it switches to the stopping state&lt;/li&gt;
&lt;li&gt;Checkpointer performs the shutdown checkpoint only after every walsender is confirmed as stopping or has exited&lt;/li&gt;
&lt;li&gt;After the shutdown checkpoint completes, postmaster sends &lt;code&gt;SIGUSR2&lt;/code&gt; to the walsenders, which send any remaining WAL including the shutdown checkpoint record itself, wait for it to be replicated to the standby, then exit&lt;/li&gt;
&lt;/ul&gt;
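&lt;p&gt;The ordering in this list can be sketched as a tiny state machine. This is only a toy model of the sequence described above, not how postmaster or checkpointer are actually structured; &lt;code&gt;ShutdownStep&lt;/code&gt; and &lt;code&gt;next_step&lt;/code&gt; are hypothetical names, and the three flags stand for conditions that the real processes observe in their own ways.&lt;/p&gt;

```c
/* Toy model of the shutdown ordering; every name here is hypothetical. */
enum ShutdownStep
{
	STEP_WAIT_BACKENDS,			/* regular backends are still exiting */
	STEP_INIT_STOPPING,			/* checkpointer sends PROCSIG_WALSND_INIT_STOPPING */
	STEP_WAIT_WALSENDERS,		/* checkpointer waits for stopping state or exit */
	STEP_SHUTDOWN_CHECKPOINT,	/* checkpointer writes the shutdown checkpoint */
	STEP_SEND_SIGUSR2,			/* postmaster sends SIGUSR2 to the walsenders */
	STEP_WAIT_WALSENDER_EXIT,	/* walsenders ship the last WAL, then exit */
	STEP_DONE
};

/* Returns the step that follows, or the same step if its condition is not met. */
enum ShutdownStep
next_step(enum ShutdownStep step, int backends_exited,
		  int walsenders_stopping, int walsenders_exited)
{
	switch (step)
	{
		case STEP_WAIT_BACKENDS:
			return backends_exited ? STEP_INIT_STOPPING : step;
		case STEP_INIT_STOPPING:
			return STEP_WAIT_WALSENDERS;
		case STEP_WAIT_WALSENDERS:
			return walsenders_stopping ? STEP_SHUTDOWN_CHECKPOINT : step;
		case STEP_SHUTDOWN_CHECKPOINT:
			return STEP_SEND_SIGUSR2;
		case STEP_SEND_SIGUSR2:
			return STEP_WAIT_WALSENDER_EXIT;
		case STEP_WAIT_WALSENDER_EXIT:
			return walsenders_exited ? STEP_DONE : step;
		case STEP_DONE:
			break;
	}
	return STEP_DONE;
}
```

&lt;p&gt;The two waiting steps are where the model can stall: calling &lt;code&gt;next_step&lt;/code&gt; on &lt;code&gt;STEP_WAIT_WALSENDERS&lt;/code&gt; with &lt;code&gt;walsenders_stopping&lt;/code&gt; set to 0 returns the same step again, which mirrors a shutdown that cannot move forward.&lt;/p&gt;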

&lt;h2 class="relative group"&gt;Shutdown Flow Diagram
 &lt;div id="shutdown-flow-diagram" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#shutdown-flow-diagram" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;After going through the source code, I felt I only half understood it, so I drew a shutdown flowchart to make things clear.&lt;/p&gt;
&lt;p&gt;Summary of the fast shutdown flow:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/464a8c3e13dd.png" alt="pg fast shutdown flow.png" /&gt;&lt;/p&gt;
&lt;p&gt;(High resolution: &lt;a href="https://www.processon.com/view/link/6778a73a04a8344b9502637a" target="_blank" rel="noreferrer"&gt;https://www.processon.com/view/link/6778a73a04a8344b9502637a&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
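&lt;li&gt;PG manages shutdown logic through signals, per-process main loops, the postmaster (PM) state machine, the &lt;code&gt;pmdie&lt;/code&gt; signal handler, and the &lt;code&gt;reaper&lt;/code&gt; function that reaps child processes&lt;/li&gt;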
&lt;li&gt;Also note: signals themselves are asynchronous. If you need to wait for the result of signal processing in a target process, you typically need other synchronization mechanisms (pipes, semaphores, shared memory, etc.). PG mainly relies on process dependencies and whether processes exit normally to determine if signals were properly handled.&lt;/li&gt;
&lt;li&gt;pgarch and walsender are treated as the same type of process, handled differently from others (walwriter, bgwriter). pgarch and walsender need to do an additional &amp;ldquo;&lt;strong&gt;last task&lt;/strong&gt;&amp;rdquo;. The signal for the &amp;ldquo;&lt;strong&gt;last task&lt;/strong&gt;&amp;rdquo; is typically defined as SIGUSR2.&lt;/li&gt;
&lt;li&gt;Checkpointer&amp;rsquo;s normal exit depends on pgarch and walsender exiting normally.&lt;/li&gt;
&lt;li&gt;pgarch&amp;rsquo;s last task is the final archive. So archiving can affect shutdown.&lt;/li&gt;
&lt;li&gt;Walsender&amp;rsquo;s second-to-last task is delivering the final WAL, and its last task is delivering the shutdown checkpoint record. Both tasks wait for reply messages from the downstream, so walsender can affect shutdown.&lt;/li&gt;
&lt;/ul&gt;
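&lt;p&gt;Why the downstream replies matter can be sketched with the exit test that, as far as I can tell from reading &lt;code&gt;WalSndDone&lt;/code&gt;, a walsender applies after &lt;code&gt;SIGUSR2&lt;/code&gt;. This is a simplified model with hypothetical names (&lt;code&gt;Lsn&lt;/code&gt;, &lt;code&gt;walsender_can_exit&lt;/code&gt;), and 0 stands in for an invalid LSN; treat it as an illustration rather than the actual source.&lt;/p&gt;

```c
/* Stand-in for XLogRecPtr; in this sketch 0 plays the role of "invalid". */
typedef unsigned long long Lsn;

/*
 * Hypothetical model of the final exit test of a walsender.
 * Returns 1 if the walsender may exit, 0 if it has to keep waiting.
 */
int
walsender_can_exit(int caught_up, Lsn sent_ptr, Lsn standby_write,
				   Lsn standby_flush, int send_pending)
{
	/* Use the flush position reported by the downstream, or the write
	 * position when no valid flush position has been reported. */
	Lsn			replicated_ptr = (standby_flush != 0) ? standby_flush : standby_write;

	if (!caught_up)
		return 0;			/* there is still WAL left to send */
	if (send_pending)
		return 0;			/* the output buffer is not empty yet */
	return sent_ptr == replicated_ptr;	/* downstream confirmed everything sent */
}
```

&lt;p&gt;If the downstream never reports a position equal to &lt;code&gt;sent_ptr&lt;/code&gt;, the function keeps returning 0. In this model that is exactly the situation where a walsender holds up the whole shutdown.&lt;/p&gt;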

&lt;h2 class="relative group"&gt;Test Reproduction
 &lt;div id="test-reproduction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-reproduction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Test: Reproducing Walsender Blocking Shutdown
 &lt;div id="test-reproducing-walsender-blocking-shutdown" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-reproducing-walsender-blocking-shutdown" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;During a fast shutdown, walsender can block the shutdown.&lt;/p&gt;
&lt;p&gt;I tested various scenarios to reproduce walsender blocking shutdown. So far, the following conditions combined make an abnormal shutdown easier to trigger:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One walsender for publication/subscription&lt;/li&gt;
&lt;li&gt;One walsender for DTS&lt;/li&gt;
&lt;li&gt;Large number of subtransactions causing replication slot spill&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This three-in-one scenario is not the only one that can trigger the problem; it is just the one that was easiest to reproduce after testing many.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Reproduction commands (the problem does not reproduce reliably every time)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#66d9ef"&gt;Create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--pg
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpg(id bigserial &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;,a char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;),b char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;),&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--oracle
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzl.lzloracle(id number &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt; ,a char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;),b char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;),&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;)) tablespace FADATA;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#66d9ef"&gt;Set&lt;/span&gt; up &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; logical replication links (&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; pub&lt;span style="color:#f92672"&gt;/&lt;/span&gt;sub, &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; DTS &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; oracle)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;.Reduce logical_decoding_work_mem
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;logical_decoding_work_mem&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;MB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;.&lt;span style="color:#66d9ef"&gt;Write&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;large&lt;/span&gt; amounts &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt; (recommended: subtransaction spill)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Insert one row at a time, each insert as a subtransaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo &lt;span style="color:#e6db74"&gt;&amp;#34;begin;&amp;#34;&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;subtx.&lt;span style="color:#66d9ef"&gt;sql&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; i &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;500000&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;do&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; echo &lt;span style="color:#e6db74"&gt;&amp;#34;savepoint p$i;&amp;#34;&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&amp;gt;&lt;/span&gt;subtx.&lt;span style="color:#66d9ef"&gt;sql&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; echo &lt;span style="color:#e6db74"&gt;&amp;#34;insert into lzlpg(a,b,c) select &amp;#39;a&amp;#39;,&amp;#39;b&amp;#39;,&amp;#39;c&amp;#39;;&amp;#34;&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&amp;gt;&lt;/span&gt;subtx.&lt;span style="color:#66d9ef"&gt;sql&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;done
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;nohup psql &lt;span style="color:#f92672"&gt;-&lt;/span&gt;d lzl &lt;span style="color:#f92672"&gt;-&lt;/span&gt;f subtx.&lt;span style="color:#66d9ef"&gt;sql&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.Stop the &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;before&lt;/span&gt; writing completes
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_ctl stop &lt;span style="color:#f92672"&gt;-&lt;/span&gt;D &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;PGDATA &lt;span style="color:#f92672"&gt;-&lt;/span&gt;m fast&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After the fast shutdown request, the database is stuck in an incomplete shutdown state:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;~/lzl/slot&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ps -axjf|grep &lt;span style="color:#ae81ff"&gt;110402&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;150696&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;64964&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;64961&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;146782&lt;/span&gt; pts/42 &lt;span style="color:#ae81ff"&gt;64961&lt;/span&gt; S+ &lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; 0:00 &lt;span style="color:#ae81ff"&gt;\_&lt;/span&gt; grep --color&lt;span style="color:#f92672"&gt;=&lt;/span&gt;auto &lt;span style="color:#ae81ff"&gt;110402&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;110402&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;110402&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;110402&lt;/span&gt; ? -1 Ss &lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; 0:00 /myhost/postgres/base/rasesql1.5.6/bin/postgres -D /myhost/pg8094/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;110402&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;110599&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;110599&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;110599&lt;/span&gt; ? -1 Ss &lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; 0:00 &lt;span style="color:#ae81ff"&gt;\_&lt;/span&gt; postgres: lzlpg: logger 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;110402&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;117803&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;117803&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;117803&lt;/span&gt; ? -1 Ss &lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; 0:00 &lt;span style="color:#ae81ff"&gt;\_&lt;/span&gt; postgres: lzlpg: checkpointer 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;110402&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;117807&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;117807&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;117807&lt;/span&gt; ? -1 Ss &lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; 0:00 &lt;span style="color:#ae81ff"&gt;\_&lt;/span&gt; postgres: lzlpg: stats collector 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;110402&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;118563&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;118563&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;118563&lt;/span&gt; ? -1 Rs &lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; 3:29 &lt;span style="color:#ae81ff"&gt;\_&lt;/span&gt; postgres: lzlpg: walsender lzl 127.0.0.1&lt;span style="color:#f92672"&gt;(&lt;/span&gt;62971&lt;span style="color:#f92672"&gt;)&lt;/span&gt; idle
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;110402&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;222918&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;222918&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;222918&lt;/span&gt; ? -1 Rs &lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; 2:59 &lt;span style="color:#ae81ff"&gt;\_&lt;/span&gt; postgres: lzlpg: walsender dtssync 30.181.46.203&lt;span style="color:#f92672"&gt;(&lt;/span&gt;57218&lt;span style="color:#f92672"&gt;)&lt;/span&gt; idle&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The walsenders, checkpointer, and postmaster are all still running; the logger and stats collector have not exited either.&lt;/p&gt;
&lt;p&gt;The control file still reports the cluster state as &lt;code&gt;in production&lt;/code&gt;, which indicates that the checkpointer never completed the shutdown checkpoint:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;~/lzl/slot&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ pg_controldata|grep -i state
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Database cluster state: in production&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Checkpointer stack:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pstack &lt;span style="color:#ae81ff"&gt;117803&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#0 0x00002b879fe0b983 in __select_nocancel () from /lib64/libc.so.6
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#1 0x00000000008fd04a in pg_usleep (microsec=microsec@entry=10000) at pgsleep.c:56
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#2 0x00000000007610c8 in WalSndWaitStopping () at walsender.c:3209
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#3 0x000000000051fa86 in ShutdownXLOG (code=code@entry=0, arg=arg@entry=0) at xlog.c:8596
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#4 0x00000000007215ff in HandleCheckpointerInterrupts () at checkpointer.c:566
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#5 CheckpointerMain () at checkpointer.c:343
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The checkpointer is stuck in &lt;code&gt;WalSndWaitStopping&lt;/code&gt;: it is waiting for every walsender process to either exit or reach the stopping state.&lt;/p&gt;
&lt;p&gt;Walsender stack at this point:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#0 0x00000000007484fb in ReorderBufferLargestTXN (rb=&amp;lt;optimized out&amp;gt;) at reorderbuffer.c:2345
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#1 ReorderBufferCheckMemoryLimit (rb=0x2b8808b94118) at reorderbuffer.c:2390
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#2 ReorderBufferQueueChange (rb=0x2b8808b94118, xid=&amp;lt;optimized out&amp;gt;, lsn=1676456602544, change=change@entry=0x2b87a229f408) at reorderbuffer.c:649
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#3 0x000000000073ec99 in DecodeTruncate (buf=&amp;lt;optimized out&amp;gt;, buf=&amp;lt;optimized out&amp;gt;, ctx=&amp;lt;optimized out&amp;gt;) at decode.c:872
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#4 DecodeHeapOp (buf=0x7ffda7d35180, ctx=0x2b87a224b118) at decode.c:455
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#5 LogicalDecodingProcessRecord (ctx=0x2b87a224b118, record=&amp;lt;optimized out&amp;gt;) at decode.c:126
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#6 0x000000000075f502 in XLogSendLogical () at walsender.c:2886
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#7 0x0000000000761822 in WalSndLoop (send_data=send_data@entry=0x75f4c0 &amp;lt;XLogSendLogical&amp;gt;) at walsender.c:2287
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
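&lt;p&gt;Walsender is busy in the reorder buffer memory-limit check that drives transaction spilling (&lt;code&gt;ReorderBufferCheckMemoryLimit&lt;/code&gt; calling &lt;code&gt;ReorderBufferLargestTXN&lt;/code&gt;). (&lt;em&gt;Why it stays there is still unclear.&lt;/em&gt;)&lt;/p&gt;
&lt;p&gt;The source of &lt;code&gt;WalSndWaitStopping&lt;/code&gt;, where the checkpointer is blocked:&lt;/p&gt;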
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Wait that all the WAL senders have quit or reached the stopping state. This
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * is used by the checkpointer to control when the shutdown checkpoint can
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * safely be performed.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;WalSndWaitStopping&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (;;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			i;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		all_stopped &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (i &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;; i &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; max_wal_senders; i&lt;span style="color:#f92672"&gt;++&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			WalSnd	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;walsnd &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;WalSndCtl&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;walsnds[i];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;SpinLockAcquire&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;mutex);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;pid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;SpinLockRelease&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;mutex);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;continue&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;state &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; WALSNDSTATE_STOPPING)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				all_stopped &lt;span style="color:#f92672"&gt;=&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;SpinLockRelease&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;mutex);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;SpinLockRelease&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;mutex);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* safe to leave if confirmation is done for all WAL senders */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (all_stopped)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;pg_usleep&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;10000L&lt;/span&gt;);		&lt;span style="color:#75715e"&gt;/* wait for 10 msec */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;From the code and the stack, the condition &lt;code&gt;walsnd-&amp;gt;state != WALSNDSTATE_STOPPING&lt;/code&gt; keeps evaluating to true for at least one walsender, so the loop never returns and the checkpointer keeps polling every 10 ms indefinitely.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Test: Handling the Mid-Shutdown State
 &lt;div id="test-handling-the-mid-shutdown-state" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-handling-the-mid-shutdown-state" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The above is an awkward mid-shutdown state. Rather than resorting to &lt;code&gt;kill -9&lt;/code&gt;, there are better ways to reach a consistent shutdown:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Solution 1: Shut down the downstream process&lt;/li&gt;
&lt;li&gt;Solution 2: Send &lt;code&gt;SIGTERM&lt;/code&gt; to walsender&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Solution 1 test:&lt;/p&gt;
&lt;p&gt;When the downstream exits, walsender will also exit:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ProcessRepliesIfAny&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * &amp;#39;X&amp;#39; means that the standby is closing down the socket.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;X&amp;#39;&lt;/span&gt;&lt;span style="color:#f92672"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;proc_exit&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For pub/sub, execute the following on the subscriber side; even if the upstream is in mid-shutdown state, this will cause walsender to exit:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; lzldb
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; SUBSCRIPTION sub_lzl disable;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;However, this depends on the downstream&amp;rsquo;s own handling; we can&amp;rsquo;t always quickly shut down the downstream receiver process of DTS and other sync tools.&lt;/p&gt;
&lt;p&gt;Solution 2 test:&lt;/p&gt;
&lt;p&gt;Walsender installs a handler for &lt;code&gt;SIGTERM&lt;/code&gt;, and &lt;code&gt;select pg_terminate_backend($walsender_pid)&lt;/code&gt;, which is used while the database is running, also works by sending &lt;code&gt;SIGTERM&lt;/code&gt; to walsender. In theory, then, sending &lt;code&gt;SIGTERM&lt;/code&gt; directly to the walsender should be enough, with no need for &lt;code&gt;kill -9&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Command:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kill &lt;span style="color:#f92672"&gt;-&lt;/span&gt;SIGTERM &lt;span style="color:#ae81ff"&gt;62834&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;same &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; kill &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;62834&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;same &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; kill &lt;span style="color:#ae81ff"&gt;62834&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After this plain &lt;code&gt;SIGTERM&lt;/code&gt;, the postmaster and all other processes exit completely.&lt;/p&gt;
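&lt;p&gt;As a rough sketch of Solution 2, the walsender PIDs can be picked out of a &lt;code&gt;ps -axjf&lt;/code&gt; listing like the one shown earlier before they are signalled. The helper name &lt;code&gt;walsender_pids&lt;/code&gt; is hypothetical, and the column positions (PPID first, PID second) are an assumption taken from that listing; other &lt;code&gt;ps&lt;/code&gt; formats will differ.&lt;/p&gt;

```shell
# Hypothetical helper: read `ps -axjf`-style lines on stdin and print the PID
# (column 2) of every walsender whose parent (column 1) is the given postmaster.
walsender_pids() {
  awk -v ppid="$1" '$1 == ppid { if ($0 ~ /postgres: .*walsender/) print $2 }'
}

# Sample lines in the assumed layout, trimmed from the listing in this article.
sample='110402 117803 117803 117803 ? -1 Ss 6001 0:00 \_ postgres: lzlpg: checkpointer
110402 118563 118563 118563 ? -1 Rs 6001 3:29 \_ postgres: lzlpg: walsender lzl 127.0.0.1(62971) idle
110402 222918 222918 222918 ? -1 Rs 6001 2:59 \_ postgres: lzlpg: walsender dtssync 30.181.46.203(57218) idle'

# Prints one walsender PID per line for postmaster 110402.
printf '%s\n' "$sample" | walsender_pids 110402
```

&lt;p&gt;On a live host the same function could be fed from &lt;code&gt;ps -axjf&lt;/code&gt; and each printed PID passed to &lt;code&gt;kill -TERM&lt;/code&gt;; it is worth reviewing the PID list before signalling anything.&lt;/p&gt;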
&lt;p&gt;Check the control file and WAL log to confirm consistent shutdown:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;pg_controldata database state changed from &lt;code&gt;in production&lt;/code&gt; to &lt;code&gt;shut down&lt;/code&gt; — consistent shutdown:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ pg_controldata|grep -i state
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Database cluster state: shut down&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="2"&gt;
&lt;li&gt;The last record in the WAL log is &lt;code&gt;CHECKPOINT_SHUTDOWN&lt;/code&gt;:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_waldump 000000010000018600000012|tail -1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_waldump: fatal: error in WAL record at 186/915D7920: invalid record length at 186/915D7998: wanted 24, got &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: XLOG len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 114/ 114, tx: 0, lsn: 186/915D7920, prev 186/915D78A8, desc: CHECKPOINT_SHUTDOWN redo 186/915D7920; tli 1; prev tli 1; fpw true; xid 0:13431045; oid 3808147; multi 3; offset 6; oldest xid &lt;span style="color:#ae81ff"&gt;485&lt;/span&gt; in DB 1; oldest multi &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; in DB 1; oldest/newest commit timestamp xid: 494/13431044; oldest running xid 0; shutdown&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
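&lt;p&gt;The two checks above can be scripted. The following is only a sketch: the helper names &lt;code&gt;cluster_state&lt;/code&gt;, &lt;code&gt;last_wal_record_type&lt;/code&gt; and &lt;code&gt;is_clean_shutdown&lt;/code&gt; are hypothetical, and the sample input is shortened from the output shown in this article instead of being read from a live instance.&lt;/p&gt;

```shell
# Hypothetical helpers; both read command output on stdin.

# Print the value of the "Database cluster state" line from pg_controldata output.
cluster_state() {
  sed -n 's/^Database cluster state:[[:space:]]*//p'
}

# Print the record type (the first word after "desc:") of the last rmgr line
# in pg_waldump output, skipping any pg_waldump error lines.
last_wal_record_type() {
  grep '^rmgr:' | tail -n 1 | sed -n 's/.*desc: \([A-Z_]*\).*/\1/p'
}

# Succeed only when both values indicate a consistent shutdown.
is_clean_shutdown() {
  [ "$1" = "shut down" ] || return 1
  [ "$2" = "CHECKPOINT_SHUTDOWN" ]
}

state=$(printf 'Database cluster state:               shut down\n' | cluster_state)
rec=$(printf '%s\n' 'rmgr: XLOG len (rec/tot): 114/ 114, tx: 0, lsn: 186/915D7920, prev 186/915D78A8, desc: CHECKPOINT_SHUTDOWN redo 186/915D7920; shutdown' | last_wal_record_type)

if is_clean_shutdown "$state" "$rec"; then
  echo "clean shutdown"
else
  echo "not clean: state=$state last=$rec"
fi
```

&lt;p&gt;Against a real instance, the input would come from &lt;code&gt;pg_controldata&lt;/code&gt; and from &lt;code&gt;pg_waldump&lt;/code&gt; run on the newest WAL segment, as in the commands above.&lt;/p&gt;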

&lt;h3 class="relative group"&gt;Test: Reproducing Only Primary Having CHECKPOINT_SHUTDOWN
 &lt;div id="test-reproducing-only-primary-having-checkpoint_shutdown" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-reproducing-only-primary-having-checkpoint_shutdown" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;In production we observed that the primary&amp;rsquo;s local WAL ended with a shutdown checkpoint while the standby&amp;rsquo;s did not. There, an immediate stop had been issued while the instance was in the mid-shutdown state, and the subsequent startup failed.&lt;/p&gt;
&lt;p&gt;At the time, the last 2 WAL records on primary and standby looked something like:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Primary WAL:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CHECKPOINT_ONLINE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CHECKPOINT_SHUTDOWN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Standby WAL:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CHECKPOINT_ONLINE&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Reproduction commands:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## 1. First reproduce walsender blocking shutdown&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;(&lt;/span&gt;skipped&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## 2. Check the last WAL record&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Standby len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 50/ 50, tx: 0, lsn: 188/307ABE00, prev 188/307ABDC8, desc: RUNNING_XACTS nextXid &lt;span style="color:#ae81ff"&gt;13432445&lt;/span&gt; latestCompletedXid &lt;span style="color:#ae81ff"&gt;13432444&lt;/span&gt; oldestRunningXid &lt;span style="color:#ae81ff"&gt;13432445&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## 3. pg_ctl stop -D $PGDATA -m i&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## 4. Check last WAL record&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Unchanged, same as &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## 5. pg_ctl start -D $PGDATA&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## 6. Check last two WAL records&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Standby len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 50/ 50, tx: 0, lsn: 188/307ABE00, prev 188/307ABDC8, desc: RUNNING_XACTS nextXid &lt;span style="color:#ae81ff"&gt;13432445&lt;/span&gt; latestCompletedXid &lt;span style="color:#ae81ff"&gt;13432444&lt;/span&gt; oldestRunningXid &lt;span style="color:#ae81ff"&gt;13432445&lt;/span&gt; &lt;span style="color:#75715e"&gt;#same as 2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: XLOG len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 114/ 114, tx: 0, lsn: 188/307ABE38, prev 188/307ABE00, desc: CHECKPOINT_SHUTDOWN redo 188/307ABE38; tli 1; prev tli 1; fpw true; xid 0:13432445; oid 3832732; multi 3; offset 6; oldest xid &lt;span style="color:#ae81ff"&gt;485&lt;/span&gt; in DB 1; oldest multi &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; in DB 1; oldest/newest commit timestamp xid: 494/13432444; oldest running xid 0; shutdown &lt;span style="color:#75715e"&gt;#CHECKPOINT_SHUTDOWN appears&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The reproduction shows that the &lt;code&gt;CHECKPOINT_SHUTDOWN&lt;/code&gt; record is actually written during &lt;strong&gt;startup&lt;/strong&gt;, not during the immediate shutdown.&lt;/p&gt;
&lt;p&gt;This matches the production sequence: (1) the fast shutdown did not complete, (2) an immediate shutdown followed, (3) startup failed.&lt;/p&gt;
&lt;p&gt;Question 1: At what point during startup is CHECKPOINT_SHUTDOWN written?&lt;/p&gt;
&lt;p&gt;Question 2: What triggers CHECKPOINT_ONLINE? In the reproduction, a fast shutdown occasionally left CHECKPOINT_ONLINE as the last WAL record.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Question 1 analysis:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A shutdown checkpoint written at startup points to the startup process. Its flow was analyzed earlier, so we can go straight to the function &lt;code&gt;StartupXLOG&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * This must be called ONCE during postmaster or standalone-backend startup
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;StartupXLOG&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (InRecovery) &lt;span style="color:#75715e"&gt;//The previous stop was an immediate (not clean) shutdown, so instance recovery is needed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * Perform a checkpoint to update all our recovery activity to disk.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * Note that we write a shutdown checkpoint rather than an on-line
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * one. This is not particularly critical, but since we may be
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * assigning a new TLI, using a shutdown checkpoint allows us to have
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * the rule that TLI only changes in shutdown checkpoints, which
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * allows some extra error checking in xlog_redo.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * In fast promotion, only create a lightweight end-of-recovery record
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * instead of a full checkpoint. A checkpoint is requested later,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * after we&amp;#39;re fully out of recovery mode and already accepting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * queries.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (bgwriterLaunched) &lt;span style="color:#75715e"&gt;//This if is clearly for standby streaming replication
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#75715e"&gt;//Primary startup goes here
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;CreateCheckPoint&lt;/span&gt;(CHECKPOINT_END_OF_RECOVERY &lt;span style="color:#f92672"&gt;|&lt;/span&gt; CHECKPOINT_IMMEDIATE);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
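&lt;li&gt;Writing a shutdown checkpoint here is intentional: it preserves the rule that the TLI only changes in shutdown checkpoints, which allows extra error checking in &lt;code&gt;xlog_redo&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Whenever the previous shutdown was not clean, startup goes through recovery and ends it with a shutdown checkpoint&lt;/li&gt;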
&lt;/ul&gt;
&lt;p&gt;So a forced shutdown with &lt;code&gt;-m i&lt;/code&gt; followed by a startup also produces &lt;code&gt;CHECKPOINT_SHUTDOWN&lt;/code&gt;; testing confirmed this.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Question 2 analysis:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This showed up only occasionally across multiple tests. Speculation: the conditions for a regular checkpoint happened to be met just before the shutdown, so an online checkpoint was triggered. In other words, pure coincidence.&lt;/p&gt;
&lt;p&gt;After a failed shutdown, a script, the HA layer, or an operator may well follow up with a forced shutdown. It is therefore recommended to run at least one checkpoint before shutting down.&lt;/p&gt;
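&lt;p&gt;The recommendation can be sketched as a small Python wrapper. This is a minimal sketch under stated assumptions, not the procedure used in production: the helper names &lt;code&gt;shutdown_commands&lt;/code&gt; and &lt;code&gt;checkpoint_then_stop&lt;/code&gt; are hypothetical, &lt;code&gt;psql&lt;/code&gt; and &lt;code&gt;pg_ctl&lt;/code&gt; are assumed to be on &lt;code&gt;PATH&lt;/code&gt;, and the usual &lt;code&gt;PG*&lt;/code&gt; environment variables are assumed to select a role that is allowed to run &lt;code&gt;CHECKPOINT&lt;/code&gt;.&lt;/p&gt;

```python
import subprocess


def shutdown_commands(pgdata):
    """Return the two commands in order: manual CHECKPOINT, then fast stop."""
    return [
        # -X skips psqlrc; connection settings come from the PG* environment.
        ["psql", "-X", "-c", "CHECKPOINT;"],
        ["pg_ctl", "stop", "-D", pgdata, "-m", "fast"],
    ]


def checkpoint_then_stop(pgdata, run=subprocess.run):
    """Run each command in turn; check=True raises on the first failure."""
    for cmd in shutdown_commands(pgdata):
        run(cmd, check=True)
```

&lt;p&gt;Calling &lt;code&gt;checkpoint_then_stop(os.environ["PGDATA"])&lt;/code&gt; would issue the checkpoint first, so that a later forced shutdown leaves less WAL to replay at the next startup.&lt;/p&gt;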

&lt;h3 class="relative group"&gt;Test: Impact of Archiving on Shutdown
 &lt;div id="test-impact-of-archiving-on-shutdown" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-impact-of-archiving-on-shutdown" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;While analyzing the shutdown code, I also found that once the checkpointer process exits, the postmaster&amp;rsquo;s &lt;code&gt;reaper&lt;/code&gt; sends &lt;code&gt;SIGUSR2&lt;/code&gt; to pgarch so that it performs one last archive cycle and then exits:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;reaper&lt;/span&gt;(SIGNAL_ARGS)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; CheckpointerPID)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			CheckpointerPID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;EXIT_STATUS_0&lt;/span&gt;(exitstatus) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; pmState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PM_SHUTDOWN)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/* Waken archiver for the last time */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (PgArchPID &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#a6e22e"&gt;signal_child&lt;/span&gt;(PgArchPID, SIGUSR2);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The postmaster itself can only exit after every child process except syslogger has exited:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pmState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PM_WAIT_DEAD_END)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;dlist_is_empty&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;BackendList) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			PgArchPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; PgStatPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* These other guys should be dead already */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(StartupPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(WalReceiverPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(BgWriterPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(CheckpointerPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(WalWriterPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(AutoVacPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* syslogger is not considered here */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			pmState &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PM_NO_CHILDREN;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This explains why slow archiving was also seen to delay shutdown in production.&lt;/p&gt;
&lt;p&gt;Reproduction commands:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;Configure archiving
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;archive_mode &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;archive_command &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;/bin/false ;sleep 1000&amp;#39;&lt;/span&gt; &lt;span style="color:#f92672"&gt;#&lt;/span&gt;/bin/false fails at once, but the command then sleeps 1000 s, so every archive attempt hangs instead of failing fast through the NUM_ARCHIVE_RETRIES logic
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;Shutdown
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_ctl stop &lt;span style="color:#f92672"&gt;-&lt;/span&gt;D &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;PGDATA &lt;span style="color:#f92672"&gt;-&lt;/span&gt;m fast&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Processes still running after the shutdown command:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ ps -axjf|grep &lt;span style="color:#ae81ff"&gt;61470&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;72200&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;88406&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;88405&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;68705&lt;/span&gt; pts/48 &lt;span style="color:#ae81ff"&gt;88405&lt;/span&gt; S+ &lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; 0:00 &lt;span style="color:#ae81ff"&gt;\_&lt;/span&gt; grep --color&lt;span style="color:#f92672"&gt;=&lt;/span&gt;auto &lt;span style="color:#ae81ff"&gt;61470&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;61470&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;61470&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;61470&lt;/span&gt; ? -1 Ss &lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; 0:00 /myhost/postgres/base/rasesql1.5.6/bin/postgres -D /myhost/pg8094/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;61470&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;61772&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;61772&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;61772&lt;/span&gt; ? -1 Ss &lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; 0:00 &lt;span style="color:#ae81ff"&gt;\_&lt;/span&gt; postgres: lzlpg: logger 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;61470&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;63880&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;63880&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;63880&lt;/span&gt; ? -1 Ss &lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; 0:00 &lt;span style="color:#ae81ff"&gt;\_&lt;/span&gt; postgres: lzlpg: archiver archiving &lt;span style="color:#ae81ff"&gt;000000010000018800000007&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The checkpointer has already exited cleanly at this point, so the database is in a consistent state and killing the archiver with &lt;code&gt;kill -9&lt;/code&gt; is safe.&lt;/p&gt;

&lt;h2 class="relative group"&gt;One-Sentence Summary
 &lt;div id="one-sentence-summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#one-sentence-summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Q1: Why didn&amp;rsquo;t shutdown complete?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Walsender blocked the shutdown. The checkpointer sent &lt;code&gt;SIGUSR1&lt;/code&gt; to the walsenders and then waited indefinitely for all of them to enter the stopping state; it got stuck at this step.&lt;/p&gt;
&lt;p&gt;The shutdown only completed once a forced &lt;code&gt;-m i&lt;/code&gt; shutdown was issued.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Q2: Is there a graceful way to shut down from the mid-shutdown state caused by walsender blocking?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Yes. Send &lt;code&gt;SIGTERM&lt;/code&gt; (i.e. &lt;code&gt;kill&lt;/code&gt;, or &lt;code&gt;kill -15&lt;/code&gt;, &lt;code&gt;kill -SIGTERM&lt;/code&gt;) to all walsenders. Afterwards, checkpointer and postmaster will complete a clean shutdown.&lt;/p&gt;
&lt;p&gt;Walsender installs its &lt;code&gt;SIGTERM&lt;/code&gt; handler at startup, and testing found no scenario in which the signal could not be handled.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;SIGTERM&lt;/code&gt; is also the signal sent by &lt;code&gt;pg_terminate_backend(pid)&lt;/code&gt;, so it is the standard way to stop a walsender.&lt;/p&gt;
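&lt;p&gt;A minimal Python sketch of that step follows. The helper names &lt;code&gt;walsender_pids&lt;/code&gt; and &lt;code&gt;terminate_walsenders&lt;/code&gt; are hypothetical. The sketch assumes a Linux &lt;code&gt;ps&lt;/code&gt; that accepts &lt;code&gt;-eo pid,args&lt;/code&gt; and process titles of the form &lt;code&gt;postgres: walsender ...&lt;/code&gt; (optionally with a cluster name in between). It matches walsenders of every instance on the host, so on a shared machine the list should be checked before any signal is sent.&lt;/p&gt;

```python
import os
import signal
import subprocess


def walsender_pids(ps_output):
    """Extract PIDs of walsender processes from `ps -eo pid,args` output."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) != 2 or not fields[0].isdigit():
            continue  # header line or malformed row
        pid, args = fields
        # Assumed title format: "postgres: walsender ..." or, when
        # cluster_name is set, "postgres: NAME: walsender ...".
        if args.startswith("postgres:") and " walsender " in args + " ":
            pids.append(int(pid))
    return pids


def terminate_walsenders(kill=os.kill):
    """Send SIGTERM (never SIGKILL) to every walsender that ps reports."""
    out = subprocess.run(["ps", "-eo", "pid,args"], capture_output=True,
                         text=True, check=True).stdout
    pids = walsender_pids(out)
    for pid in pids:
        kill(pid, signal.SIGTERM)
    return pids
```

&lt;p&gt;The parsing is kept separate from the signalling so that the PID list can be printed and reviewed first.&lt;/p&gt;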
&lt;p&gt;&lt;strong&gt;Q3: Why did primary and standby differ by exactly one &lt;code&gt;shutdown checkpoint&lt;/code&gt;?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;3.1 Explanation for both primary and standby having &lt;code&gt;CHECKPOINT_ONLINE&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The primary triggering &lt;code&gt;CHECKPOINT_ONLINE&lt;/code&gt; was purely coincidental&lt;/li&gt;
&lt;li&gt;Since the physical walsender was still there, this WAL record was transmitted to the standby&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;3.2 Explanation for only primary having &lt;code&gt;CHECKPOINT_SHUTDOWN&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;This &lt;code&gt;CHECKPOINT_SHUTDOWN&lt;/code&gt; was done during primary startup&lt;/li&gt;
&lt;li&gt;Since the primary hadn&amp;rsquo;t fully started, this WAL record wasn&amp;rsquo;t transmitted to the standby&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Q4: Why does archiver block shutdown?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When the postmaster reaps the checkpointer process, it tells the archiver to run one last archive cycle, and the postmaster cannot exit until every process except syslogger has exited. A slow or hanging final archive therefore blocks shutdown. A failing archive command does not, because the archiver exits quickly on failure.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Q5: Which processes can block shutdown?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Strictly speaking, any process that fails to exit can block shutdown; the real question is which ones are most likely to. Judging from the shutdown code path, archiver and walsender are the usual suspects, because each still has work to do during the shutdown phase: a final archive cycle and a final round of WAL transmission.&lt;/p&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/server-shutdown.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/server-shutdown.html&lt;/a&gt;
&lt;a href="https://wiki.postgresql.org/wiki/Signals" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/Signals&lt;/a&gt;
postgres.c
postmaster.c
walsender.c
xlog.c
checkpointer.c
startup.c
pgarch.c&lt;/p&gt;</content:encoded></item><item><title>PG Startup Logic and Spill-Caused Slow Startup Analysis</title><link>https://lastdba.com/en/2025/01/04/pg-startup-logic-and-spill-caused-slow-startup-analysis/</link><pubDate>Sat, 04 Jan 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/01/04/pg-startup-logic-and-spill-caused-slow-startup-analysis/</guid><description>&lt;h2 class="relative group"&gt;Problem Symptom — Slow Startup
 &lt;div id="problem-symptom--slow-startup" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-symptom--slow-startup" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Version: PG 13.2&lt;/p&gt;
&lt;p&gt;Database startup was slow. The startup process was reading spill files, and the filenames kept changing. Checking the spill files was also very slow — &lt;code&gt;ls -l&lt;/code&gt; eventually showed 8 million spill files.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Why Tens of Millions of Spill Files?
 &lt;div id="why-tens-of-millions-of-spill-files" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-tens-of-millions-of-spill-files" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;WAL Segment and LSN Meaning
 &lt;div id="wal-segment-and-lsn-meaning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#wal-segment-and-lsn-meaning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;LSN
 &lt;div id="lsn" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lsn" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;LSN is a 64-bit bigint. An LSN actually looks like &lt;code&gt;42D3B/1732C540&lt;/code&gt; (hex). Before the slash &lt;code&gt;/&lt;/code&gt; is the 32-bit logical log number, and after the &lt;code&gt;/&lt;/code&gt; are 32 bits split into segment number + block number + intra-block offset. These 4 parts are:&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Problem Symptom — Slow Startup
 &lt;div id="problem-symptom--slow-startup" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-symptom--slow-startup" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Version: PG 13.2&lt;/p&gt;
&lt;p&gt;Database startup was slow. The startup process was reading spill files, and the filenames kept changing. Checking the spill files was also very slow — &lt;code&gt;ls -l&lt;/code&gt; eventually showed 8 million spill files.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Why Tens of Millions of Spill Files?
 &lt;div id="why-tens-of-millions-of-spill-files" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-tens-of-millions-of-spill-files" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;WAL Segment and LSN Meaning
 &lt;div id="wal-segment-and-lsn-meaning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#wal-segment-and-lsn-meaning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;LSN
 &lt;div id="lsn" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lsn" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;An LSN is a 64-bit unsigned integer, normally printed as two hex halves such as &lt;code&gt;42D3B/1732C540&lt;/code&gt;. The part before the slash &lt;code&gt;/&lt;/code&gt; is the 32-bit logical log number; the 32 bits after the &lt;code&gt;/&lt;/code&gt; split into segment number, block number, and intra-block offset. With the default segment and block sizes, these 4 parts are:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;32 bits&lt;/th&gt;
 &lt;th&gt;8 bits&lt;/th&gt;
 &lt;th&gt;11 bits&lt;/th&gt;
 &lt;th&gt;13 bits&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Logical log number&lt;/td&gt;
 &lt;td&gt;Log segment number&lt;/td&gt;
 &lt;td&gt;Block number&lt;/td&gt;
 &lt;td&gt;Intra-block offset&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
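&lt;p&gt;Intra-block offset: the block size is 8192 = 2^13 bytes, hence 13 bits&lt;/p&gt;
&lt;p&gt;Block number: 16M (default WAL segment size) / 8192 = 2048 = 2^11 blocks per segment, hence 11 bits&lt;/p&gt;
&lt;p&gt;Log segment number: 4G / 16M = 256 = 2^8 segments per logical log, hence 8 bits&lt;/p&gt;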
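&lt;p&gt;The bit layout above can be checked with a short Python sketch. The function name &lt;code&gt;split_lsn&lt;/code&gt; is hypothetical, and the 16 MB segment size and 8192-byte block size are assumed defaults; a cluster initialized with a different WAL segment size would need a different &lt;code&gt;segment_size&lt;/code&gt;.&lt;/p&gt;

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default WAL segment size
WAL_BLOCK_SIZE = 8192                # assumed default WAL block size


def split_lsn(lsn, segment_size=WAL_SEGMENT_SIZE, block_size=WAL_BLOCK_SIZE):
    """Split 'HI/LO' hex text into (log number, segment, block, offset)."""
    hi_text, lo_text = lsn.split("/")
    log_no = int(hi_text, 16)      # high 32 bits: logical log number
    low = int(lo_text, 16)         # low 32 bits: position inside that log
    seg_no = low // segment_size   # segment within the logical log
    in_seg = low % segment_size    # byte offset inside the segment
    block_no = in_seg // block_size
    offset = in_seg % block_size
    return log_no, seg_no, block_no, offset


print(split_lsn("42D3B/1732C540"))  # (273723, 23, 406, 1344)
```

&lt;p&gt;In hex, the result reads as log number 42D3B, segment 17, block 196, offset 540, which lines up with the digits of the LSN itself.&lt;/p&gt;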

&lt;h4 class="relative group"&gt;WAL Segment
 &lt;div id="wal-segment" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#wal-segment" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;A WAL filename consists of 3 groups of hex digits.&lt;/p&gt;
&lt;p&gt;Taking the 8k WAL file &lt;code&gt;0000000300042D3B00000002&lt;/code&gt; as example:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;32 bits&lt;/th&gt;
 &lt;th&gt;32 bits&lt;/th&gt;
 &lt;th&gt;32 bits&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;timeline&lt;/td&gt;
 &lt;td&gt;Logical log number&lt;/td&gt;
 &lt;td&gt;Log segment number&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;00000003&lt;/td&gt;
 &lt;td&gt;00042D3B&lt;/td&gt;
 &lt;td&gt;00000002&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;An LSN therefore identifies both a WAL filename and the offset within that file.&lt;/p&gt;
&lt;p&gt;Two parts are used below: the logical log number before the slash &lt;code&gt;/&lt;/code&gt;, and the 8-bit log segment number right after it.&lt;/p&gt;
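&lt;p&gt;As an illustration of that mapping, the following Python sketch builds the filename from a timeline and an LSN. The name &lt;code&gt;wal_file_name&lt;/code&gt; is hypothetical and the 16 MB default segment size is an assumption. It follows the three-group naming shown in the table and does not reproduce how the built-in &lt;code&gt;pg_walfile_name&lt;/code&gt; treats an LSN that sits exactly on a segment boundary.&lt;/p&gt;

```python
def wal_file_name(timeline, lsn, segment_size=16 * 1024 * 1024):
    """Return (WAL file name, byte offset in that file) for 'HI/LO' LSN text."""
    hi_text, lo_text = lsn.split("/")
    log_no = int(hi_text, 16)        # logical log number (before the slash)
    low = int(lo_text, 16)           # 32-bit position inside that log
    seg_no = low // segment_size     # log segment number
    # Three groups of 8 hex digits: timeline, logical log number, segment.
    name = "%08X%08X%08X" % (timeline, log_no, seg_no)
    return name, low % segment_size


name, offset = wal_file_name(3, "42D3B/1732C540")
print(name, hex(offset))  # 0000000300042D3B00000017 0x32c540
```

&lt;p&gt;Passing a larger &lt;code&gt;segment_size&lt;/code&gt; reduces the number of segments per logical log, so the last group of the filename changes accordingly.&lt;/p&gt;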

&lt;h3 class="relative group"&gt;Spill Filename Conversion
 &lt;div id="spill-filename-conversion" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#spill-filename-conversion" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Replication slot name: logical_ex2209_rep&lt;/p&gt;
&lt;p&gt;Spill filename: xid-407989064-lsn-42D1E-20000000.spill&lt;/p&gt;
&lt;p&gt;42D1E is not a complete LSN and cannot be directly used with &lt;code&gt;pg_walfile_name&lt;/code&gt; to locate a WAL filename. 42D1E is a logical log number. If we directly filter WAL files containing 42D1E in the name, we find 16 WAL files.&lt;/p&gt;
&lt;p&gt;Can we locate the WAL log segment number from the number 20000000 to pinpoint the exact file?&lt;/p&gt;
&lt;p&gt;Spill filename generation:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Given a replication slot, transaction ID and segment number, fill in the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * corresponding spill file into &amp;#39;path&amp;#39;, which is a caller-owned buffer of size
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * at least MAXPGPATH.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferSerializedPath&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;path, ReplicationSlot &lt;span style="color:#f92672"&gt;*&lt;/span&gt;slot, TransactionId xid,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; XLogSegNo segno)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; XLogRecPtr recptr;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a6e22e"&gt;XLogSegNoOffsetToRecPtr&lt;/span&gt;(segno, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, wal_segment_size, recptr);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a6e22e"&gt;snprintf&lt;/span&gt;(path, MAXPGPATH, &lt;span style="color:#e6db74"&gt;&amp;#34;pg_replslot/%s/xid-%u-lsn-%X-%X.spill&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a6e22e"&gt;NameStr&lt;/span&gt;(MyReplicationSlot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;data.name),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xid,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (uint32) (recptr &lt;span style="color:#f92672"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;), (uint32) recptr);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;pg_replslot/%s&lt;/code&gt; and &lt;code&gt;xid-%u-lsn&lt;/code&gt; parts are straightforward: they are the replication slot name and the xid. &lt;code&gt;recptr&lt;/code&gt; is an &lt;code&gt;XLogRecPtr&lt;/code&gt;, whose definition deserves a closer look:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Pointer to a location in the XLOG. These pointers are 64 bits wide,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * because we don&amp;#39;t want them ever to overflow.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; uint64 XLogRecPtr;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;XLogSegNoOffsetToRecPtr&lt;/code&gt; calculates the LSN from the WAL segment number, the segment size, and the offset within the segment:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XLogSegNoOffsetToRecPtr(segno, offset, wal_segsz_bytes, dest) \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; (dest) = (segno) * (wal_segsz_bytes) + (offset)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;XLogRecPtr&lt;/code&gt; is the LSN. So:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;(uint32) (recptr &amp;gt;&amp;gt; 32)&lt;/code&gt; takes the high 32 bits of the LSN, and &lt;code&gt;(uint32) recptr&lt;/code&gt; takes the low 32 bits.&lt;/p&gt;
&lt;p&gt;The high 32 bits are the first half of the LSN as it appears in the file name, lsn-42D1E. The low 32 bits normally carry both the segment number within that logical log and the byte offset within the segment; because the offset passed in here is 0, only the segment-number bits are set.&lt;/p&gt;
&lt;p&gt;Since the offset passed in is 0, the dest value depends only on segno and the segment size. On this instance wal_segsz_bytes is 128 MB = 128*1024*1024. Inverting the formula in &lt;code&gt;XLogSegNoOffsetToRecPtr&lt;/code&gt; for the low 32 bits:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;segno&lt;span style="color:#f92672"&gt;=&lt;/span&gt; dest&lt;span style="color:#f92672"&gt;/&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;128&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1024&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1024&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Convert hex 20000000
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;segno&lt;span style="color:#f92672"&gt;=&lt;/span&gt; x&lt;span style="color:#e6db74"&gt;&amp;#39;20000000&amp;#39;&lt;/span&gt;::int&lt;span style="color:#f92672"&gt;/&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;128&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1024&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1024&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;segno&lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This gives the segment number within the logical log (4 here), which is the last component of the WAL file name.&lt;/p&gt;
&lt;p&gt;So the spill filename xid-407989064-lsn-42D1E-20000000.spill corresponds to the WAL file with logical log number 42D1E and segment number 04:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ls 42D1E*04
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0000000200042D1E00000004&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Running pg_waldump on this file shows records for xid 407989064.&lt;/p&gt;
&lt;p&gt;In practice, the WAL segment size is fixed when the instance is created, so (128*1024*1024) is a constant. The low 32 bits, (uint32) recptr, are therefore fully determined by segno, although they are not equal to it. As a result, each switch to a new WAL file starts a new spill file.&lt;/p&gt;
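&lt;p&gt;To make that relationship concrete, here is a minimal shell sketch of the same arithmetic as &lt;code&gt;XLogSegNoOffsetToRecPtr&lt;/code&gt; with offset 0. The function name &lt;code&gt;spill_lsn_suffix&lt;/code&gt; is made up for illustration, and the 128 MB segment size is this instance&amp;rsquo;s setting rather than a general default.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: print the "lsn-HIGH-LOW" part of a spill file name
# for a given 64-bit WAL segment number (segno), assuming offset 0.
WAL_SEGSZ=$((128 * 1024 * 1024))   # assumption: 128 MB segments, as on this instance

spill_lsn_suffix() {
    segno=$1
    recptr=$((segno * WAL_SEGSZ))      # XLogSegNoOffsetToRecPtr with offset 0
    high=$((recptr >> 32))             # (uint32) (recptr >> 32)
    low=$((recptr % 4294967296))       # (uint32) recptr, i.e. recptr mod 2^32
    printf 'lsn-%X-%X\n' "$high" "$low"
}

# Logical log 0x42D1E holds 32 segments of 128 MB, so segment 4 in it is:
spill_lsn_suffix $((0x42D1E * 32 + 4))
# The next segment changes the low half by exactly one segment size (0x8000000):
spill_lsn_suffix $((0x42D1E * 32 + 5))
```

&lt;p&gt;Run as is, this prints lsn-42D1E-20000000 and then lsn-42D1E-28000000: the low half moves in steps of the segment size, which is why it tracks segno without being equal to it.&lt;/p&gt;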
&lt;p&gt;Summary of &lt;strong&gt;spill file generation rules&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Same transaction id: if it spans multiple WAL files, it produces multiple spills. E.g., a large transaction without subtransactions spanning 3 WAL files produces 3 spill files.&lt;/li&gt;
&lt;li&gt;Different transaction ids always spill to different files, and each subtransaction that writes data has its own xid. E.g., 10 million subtransactions can produce 10 million spill files.&lt;/li&gt;
&lt;/ul&gt;
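&lt;p&gt;As a rough illustration of the first rule, the sketch below counts how many WAL segments lie between two LSNs, which is an upper bound on the number of spill files one xid can have over that range. The helper names are hypothetical, the 128 MB segment size is an assumption taken from this instance, and the example LSNs are illustrative only.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helpers; 128 MB segments are an assumption taken from this instance.
WAL_SEGSZ=$((128 * 1024 * 1024))

# Convert an LSN written as HIGH/LOW (hex) into a WAL segment number.
lsn_to_segno() {
    hi=${1%%/*}
    lo=${1##*/}
    echo $(( (0x$hi * 4294967296 + 0x$lo) / WAL_SEGSZ ))
}

# Number of WAL segments touched between two LSNs (inclusive). One xid has
# at most this many spill files, one per segment in which it spilled changes.
segments_spanned() {
    first=$(lsn_to_segno "$1")
    last=$(lsn_to_segno "$2")
    echo $((last - first + 1))
}

# Illustrative LSNs, not taken from a real transaction:
segments_spanned 163/38000000 163/4996E1C8
```

&lt;p&gt;With these inputs the range covers segments 7 through 9 of logical log 163, so the function prints 3.&lt;/p&gt;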
&lt;p&gt;Spill filename structure xid-407989064-lsn-42D1E-20000000.spill:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;xid&lt;/th&gt;
 &lt;th&gt;High 32 bits of the LSN (the WAL logical log number)&lt;/th&gt;
 &lt;th&gt;Low 32 bits of the LSN (segment number × segment size, not the segment number itself)&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;xid-407989064&lt;/td&gt;
 &lt;td&gt;lsn-42D1E&lt;/td&gt;
 &lt;td&gt;20000000&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
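&lt;p&gt;The mapping in this table can be reversed mechanically. The following is a small sketch, not a PostgreSQL tool: &lt;code&gt;spill_to_walfile&lt;/code&gt; is a hypothetical helper, the timeline ID has to be supplied by the caller because the spill name does not contain it, and the 128 MB segment size is assumed from this instance.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: map a spill file name to the WAL segment file it was
# written for. Usage: spill_to_walfile SPILL_NAME TIMELINE_ID
WAL_SEGSZ=$((128 * 1024 * 1024))   # assumption: 128 MB segments

spill_to_walfile() {
    name=${1%.spill}     # xid-407989064-lsn-42D1E-20000000
    low=${name##*-}      # 20000000  (low 32 bits of the LSN, hex)
    rest=${name%-*}      # xid-407989064-lsn-42D1E
    high=${rest##*-}     # 42D1E     (high 32 bits of the LSN, hex)
    tli=$2
    seg=$((0x$low / WAL_SEGSZ))
    # WAL file name: timeline, logical log number, segment, 8 hex digits each
    printf '%08X%08X%08X\n' "$tli" "$((0x$high))" "$seg"
}

spill_to_walfile xid-407989064-lsn-42D1E-20000000.spill 2
```

&lt;p&gt;For the example file and timeline 2 this prints 0000000200042D1E00000004, the same WAL file name found with &lt;code&gt;ls&lt;/code&gt; earlier.&lt;/p&gt;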
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## Recovered environment&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ll |head -100
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total &lt;span style="color:#ae81ff"&gt;40000276&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;184&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 15:20 state
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;196&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 13:25 xid-407989064-lsn-42D1E-0.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;208&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 13:25 xid-407989064-lsn-42D1E-20000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;540&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 16:44 xid-407989064-lsn-42D2A-D0000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;152&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 13:09 xid-407989065-lsn-42D1D-C8000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;152&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 13:09 xid-407989066-lsn-42D1D-C8000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;152&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 13:09 xid-407989068-lsn-42D1D-C8000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;152&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 13:09 xid-407989070-lsn-42D1D-C8000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;152&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 13:09 xid-407989072-lsn-42D1D-C8000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;152&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 13:09 xid-407989076-lsn-42D1D-C8000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;152&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 13:09 xid-407989079-lsn-42D1D-C8000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;152&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 13:09 xid-407989080-lsn-42D1D-C8000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;152&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 13:09 xid-407989082-lsn-42D1D-C8000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@lzlhost /myhost/liuzhilong/pg_replslot/logical_ex9e15_rep&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ll |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{print $9}&amp;#39;&lt;/span&gt;|awk -F &lt;span style="color:#e6db74"&gt;&amp;#39;-&amp;#39;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;{print $2}&amp;#39;&lt;/span&gt;|sort|uniq -c|wc -l
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;10000003&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@lzlhost /myhost/liuzhilong/pg_replslot/logical_ex9e15_rep&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ll |wc -l
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;10000070&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
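&lt;p&gt;So in the actual environment the listing had 10,000,070 lines, with 10,000,003 distinct values in the xid field. Both counts include the &lt;code&gt;total&lt;/code&gt; line and the &lt;code&gt;state&lt;/code&gt; file, so they are slightly high. In other words, one parent transaction spanned about 70 WAL files and had about 10 million subtransactions.&lt;/p&gt;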
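&lt;p&gt;A slightly stricter version of that count, which ignores the &lt;code&gt;state&lt;/code&gt; file and anything else that is not a spill file, might look like the sketch below. &lt;code&gt;spill_stats&lt;/code&gt; is a hypothetical helper name, and it assumes the &lt;code&gt;xid-N-lsn-H-L.spill&lt;/code&gt; naming shown above.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: print "FILES XIDS" for the .spill files in a slot directory.
spill_stats() {
    # ls -f skips sorting, which helps when the directory holds millions of entries.
    ls -f "$1" | awk -F- '
        /^xid-.*\.spill$/ { files++; if (!($2 in seen)) { seen[$2] = 1; xids++ } }
        END { print files + 0, xids + 0 }'
}

# Defaults to the current directory when no argument is given.
spill_stats "${1:-.}"
```

&lt;p&gt;The first number is the count of spill files and the second is the count of distinct xids; the difference between them shows how many extra files come from transactions that span more than one WAL segment.&lt;/p&gt;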

&lt;h2 class="relative group"&gt;Replication Slot Spill Testing
 &lt;div id="replication-slot-spill-testing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#replication-slot-spill-testing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Pub/sub replication link setup
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;logical_decoding_work_mem &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;64&lt;/span&gt;MB &lt;span style="color:#f92672"&gt;#&lt;/span&gt;pg_ctl reload
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wal_segment_size &lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;128&lt;/span&gt; MB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--source
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; replication_table (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id BIGSERIAL &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; column1 char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; column2 char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; column3 char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; publication pub_test &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; replication_table ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--dest
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; replication_table (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id BIGSERIAL &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; column1 char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; column2 char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; column3 char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; SUBSCRIPTION sub_test
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CONNECTION&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;host=127.0.0.1 port=8094 dbname=lzl user=lzl password=qwer&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PUBLICATION pub_test;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--source
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_replication_slots;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Large Transaction, No Subtransactions, Replicated Table Spill Test
 &lt;div id="large-transaction-no-subtransactions-replicated-table-spill-test" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#large-transaction-no-subtransactions-replicated-table-spill-test" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Create a large transaction, don&amp;#39;t commit yet
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; replication_table(column1,column2,column3) 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;b&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;c&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; generate_series(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1000000&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Replication slot spill
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ll
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total &lt;span style="color:#ae81ff"&gt;331924&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 184 Dec 9 20:22 state
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 88226964 Dec 9 20:22 xid-5074343-lsn-163-38000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 119698488 Dec 9 20:22 xid-5074343-lsn-163-40000000.spill&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After the large transaction commits and the subscriber has consumed it (replication lag drops to 0), the spill files are removed:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pid,usename,sent_lsn,write_lsn,flush_lsn,replay_lsn,write_lag,flush_lag,replay_lag,reply_time &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_replication;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; sent_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; write_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; flush_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; replay_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; write_lag &lt;span style="color:#f92672"&gt;|&lt;/span&gt; flush_lag &lt;span style="color:#f92672"&gt;|&lt;/span&gt; replay_lag &lt;span style="color:#f92672"&gt;|&lt;/span&gt; reply_time 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+---------+--------------+--------------+--------------+--------------+-----------+-----------+------------+------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;163525&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;163&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4996&lt;/span&gt;E1C8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;163&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4996&lt;/span&gt;E1C8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;163&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4996&lt;/span&gt;E1C8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;163&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4996&lt;/span&gt;E1C8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;09&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;25&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;35&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14769&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pid,usename,pg_wal_lsn_diff(pg_current_wal_lsn(),sent_lsn) diff_sent_mb,pg_wal_lsn_diff(pg_current_wal_lsn(),write_lsn) diff_write_mb,pg_wal_lsn_diff(pg_current_wal_lsn(),flush_lsn) diff_flush_mb,pg_wal_lsn_diff(pg_current_wal_lsn(),replay_lsn) diff_replay_mb,pg_walfile_name_offset(sent_lsn) sentoffset,pg_walfile_name_offset(write_lsn) writeoffset,pg_walfile_name_offset(flush_lsn) flush_lsn &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_replication;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; diff_sent_mb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; diff_write_mb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; diff_flush_mb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; diff_replay_mb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; sentoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; writeoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; flush_lsn 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+---------+--------------+---------------+---------------+----------------+-------------------------------------+-------------------------------------+-------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;163525&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;000000010000016300000009&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;26665416&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;000000010000016300000009&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;26665416&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;000000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#f92672"&gt;/&lt;/span&gt;mypg&lt;span style="color:#f92672"&gt;/&lt;/span&gt;pg8094&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;data&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;pg_replslot&lt;span style="color:#f92672"&gt;/&lt;/span&gt;sub_test]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; ll
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total &lt;span style="color:#ae81ff"&gt;357392&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 184 Dec 9 20:23 state
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 88226964 Dec 9 20:22 xid-5074343-lsn-163-38000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 137696328 Dec 9 20:23 xid-5074343-lsn-163-40000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 26076708 Dec 9 20:23 xid-5074343-lsn-163-48000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#f92672"&gt;/&lt;/span&gt;mypg&lt;span style="color:#f92672"&gt;/&lt;/span&gt;pg8094&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;data&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;pg_replslot&lt;span style="color:#f92672"&gt;/&lt;/span&gt;sub_test]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; ll
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 184 Dec 9 20:25 state2666
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Large Transaction, No Subtransactions, Non-Replicated Table Spill Test
 &lt;div id="large-transaction-no-subtransactions-non-replicated-table-spill-test" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#large-transaction-no-subtransactions-non-replicated-table-spill-test" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--source: create an unrelated table for writing data
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; no_replication_table (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id BIGSERIAL &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; column1 char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; column2 char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; column3 char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Create a large transaction, don&amp;#39;t commit yet
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; no_replication_table(column1,column2,column3) 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;b&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;c&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; generate_series(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1000000&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Spill
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;lzldb:MYINST:&lt;span style="color:#ae81ff"&gt;8094&lt;/span&gt; &lt;span style="color:#f92672"&gt;/&lt;/span&gt;mypg&lt;span style="color:#f92672"&gt;/&lt;/span&gt;pg8094&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;data&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;pg_replslot&lt;span style="color:#f92672"&gt;/&lt;/span&gt;sub_test]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; ll
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total &lt;span style="color:#ae81ff"&gt;357492&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 184 Dec 9 20:09 state
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 107511456 Dec 9 20:08 xid-5074106-lsn-163-28000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 137698804 Dec 9 20:09 xid-5074106-lsn-163-30000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 4308444 Dec 9 20:09 xid-5074106-lsn-163-38000000.spill&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Large Transaction, Subtransactions, Non-Replicated Table Spill Test
 &lt;div id="large-transaction-subtransactions-non-replicated-table-spill-test" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#large-transaction-subtransactions-non-replicated-table-spill-test" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## One insert per row, each insert as one subtransaction&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo &lt;span style="color:#e6db74"&gt;&amp;#34;begin;&amp;#34;&lt;/span&gt;&amp;gt;subtx.sql
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; i in &lt;span style="color:#f92672"&gt;{&lt;/span&gt;1..1000000&lt;span style="color:#f92672"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;do&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; echo &lt;span style="color:#e6db74"&gt;&amp;#34;savepoint p&lt;/span&gt;$i&lt;span style="color:#e6db74"&gt;;&amp;#34;&lt;/span&gt;&amp;gt;&amp;gt;subtx.sql
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; echo &lt;span style="color:#e6db74"&gt;&amp;#34;insert into no_replication_table(column1,column2,column3) select &amp;#39;a&amp;#39;,&amp;#39;b&amp;#39;,&amp;#39;c&amp;#39;;&amp;#34;&lt;/span&gt;&amp;gt;&amp;gt;subtx.sql
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;done&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;nohup psql -d lzl -f subtx.sql &amp;amp;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#During execution, observed 800k+ spill files&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;/myhost/pg8094/data/pg_replslot/sub_test&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ll |wc -l
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;823749&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;/myhost/pg8094/data/pg_replslot/sub_test&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ll |head -10
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total &lt;span style="color:#ae81ff"&gt;1099532&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;184&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; 21:10 state
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;1236&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; 21:10 xid-5519686-lsn-163-70000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;252&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; 21:09 xid-5519687-lsn-163-70000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;252&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; 21:09 xid-5519688-lsn-163-70000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;252&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; 21:09 xid-5519689-lsn-163-70000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;252&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; 21:09 xid-5519690-lsn-163-70000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;252&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; 21:09 xid-5519691-lsn-163-70000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;252&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; 21:09 xid-5519692-lsn-163-70000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;252&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; 21:09 xid-5519693-lsn-163-70000000.spill&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Analysis of Slow Database Startup
 &lt;div id="analysis-of-slow-database-startup" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#analysis-of-slow-database-startup" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Startup Process Startup Flow Analysis
 &lt;div id="startup-process-startup-flow-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#startup-process-startup-flow-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Here we parse the startup flow frame by frame using the call stack:&lt;/p&gt;
&lt;p&gt;11: &lt;code&gt;main&lt;/code&gt;: The program entry point; nothing notable here.&lt;/p&gt;
&lt;p&gt;10: &lt;code&gt;PostmasterMain&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;Before entering its main loop, it launches the startup process with &lt;code&gt;StartupPID = StartupDataBase();&lt;/code&gt;, a macro that expands to &lt;code&gt;StartChildProcess(StartupProcess)&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define StartupDataBase()		StartChildProcess(StartupProcess)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;9: &lt;code&gt;StartChildProcess&lt;/code&gt;: Forks a child process. The postmaster launches its auxiliary child processes through this same function, and the fork happens at this step. Here the input is &lt;code&gt;AuxProcType&lt;/code&gt;=StartupProcess.&lt;/p&gt;
&lt;p&gt;8: &lt;code&gt;AuxiliaryProcessMain&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;Because &lt;code&gt;MyAuxProcType&lt;/code&gt;=StartupProcess, the switch dispatches to &lt;code&gt;StartupProcessMain&lt;/code&gt;, a different path from the one taken by child processes such as &lt;strong&gt;walsender&lt;/strong&gt;, walwriter, and bgwriter. The startup process exists primarily to read and replay WAL during crash recovery, but it does a good deal more:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;switch&lt;/span&gt; (MyAuxProcType)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; CheckerProcess:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* don&amp;#39;t set signals, they&amp;#39;re useless here */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;CheckerModeMain&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;proc_exit&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);		&lt;span style="color:#75715e"&gt;/* should never return */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; BootstrapProcess:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * There was a brief instant during which mode was Normal; this is
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * okay. We need to be in bootstrap mode during BootStrapXLOG for
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * the sake of multixact initialization.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;SetProcessingMode&lt;/span&gt;(BootstrapProcessing);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;bootstrap_signals&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;BootStrapXLOG&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;BootstrapModeMain&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;proc_exit&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);		&lt;span style="color:#75715e"&gt;/* should never return */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; StartupProcess: &lt;span style="color:#75715e"&gt;//Here here here here
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* don&amp;#39;t set signals, startup process has its own agenda */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;StartupProcessMain&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;proc_exit&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);		&lt;span style="color:#75715e"&gt;/* should never return */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; BgWriterProcess:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* don&amp;#39;t set signals, bgwriter has its own agenda */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;BackgroundWriterMain&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;proc_exit&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);		&lt;span style="color:#75715e"&gt;/* should never return */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; CheckpointerProcess:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* don&amp;#39;t set signals, checkpointer has its own agenda */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;CheckpointerMain&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;proc_exit&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);		&lt;span style="color:#75715e"&gt;/* should never return */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; WalWriterProcess:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* don&amp;#39;t set signals, walwriter has its own agenda */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;InitXLOGAccess&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;WalWriterMain&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;proc_exit&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);		&lt;span style="color:#75715e"&gt;/* should never return */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; WalReceiverProcess:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* don&amp;#39;t set signals, walreceiver has its own agenda */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;WalReceiverMain&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;proc_exit&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);		&lt;span style="color:#75715e"&gt;/* should never return */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;default&lt;/span&gt;&lt;span style="color:#f92672"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;elog&lt;/span&gt;(PANIC, &lt;span style="color:#e6db74"&gt;&amp;#34;unrecognized process type: %d&amp;#34;&lt;/span&gt;, (&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;) MyAuxProcType);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;proc_exit&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;7: &lt;code&gt;StartupProcessMain&lt;/code&gt;: Its main job is to call &lt;code&gt;StartupXLOG()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;6: &lt;code&gt;StartupXLOG&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;Function comment:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;This must be called ONCE during postmaster or standalone&lt;span style="color:#f92672"&gt;-&lt;/span&gt;backend startup&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;StartupXLOG&lt;/code&gt; always runs at startup, whether the previous shutdown was a crash or a consistent shutdown. It logs a message based on the state recorded in the control file:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;switch&lt;/span&gt; (ControlFile&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;state)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; DB_IN_PRODUCTION:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(LOG,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;database system was interrupted; last known up at %s&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							&lt;span style="color:#a6e22e"&gt;str_time&lt;/span&gt;(ControlFile&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;time))));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This matches the log output. Here&amp;rsquo;s the shutdown and startup log:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:57.534 CST,,,447560,,65693cde.6d448,1325,,2023-12-01 09:54:38 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;database system is shut down&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:03:49.536 CST,,,211844,,6752bdf3.33b84,1,,2024-12-06 17:03:47 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;ending log output to stderr&amp;#34;&lt;/span&gt;,,&lt;span style="color:#e6db74"&gt;&amp;#34;Future log output will go to log destination &amp;#34;&amp;#34;csvlog&amp;#34;&amp;#34;.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:03:49.536 CST,,,211844,,6752bdf3.33b84,2,,2024-12-06 17:03:47 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;starting PostgreSQL 13.2 (RaseSQL 1.3) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.1), 64-bit&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:03:49.537 CST,,,211844,,6752bdf3.33b84,3,,2024-12-06 17:03:47 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;listening on IPv4 address &amp;#34;&amp;#34;0.0.0.0&amp;#34;&amp;#34;, port 7284&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:03:49.539 CST,,,211844,,6752bdf3.33b84,4,,2024-12-06 17:03:47 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;listening on Unix socket &amp;#34;&amp;#34;/tmp/.s.PGSQL.7284&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:03:49.557 CST,,,211995,,6752bdf5.33c1b,1,,2024-12-06 17:03:49 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;database system was interrupted; last known up at 2024-12-06 17:00:10 CST&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;startup&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;So, after shutdown, the control file recorded the database state as &lt;code&gt;in production&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Database cluster state: in production&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;in production&lt;/code&gt; state means the control file still marks &lt;strong&gt;the database as running&lt;/strong&gt; rather than cleanly shut down, which indicates that the last shutdown was &lt;strong&gt;not a consistent shutdown&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Continuing with the key code about fsync:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * If we previously crashed, perform a couple of actions:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * - The pg_wal directory may still include some temporary WAL segments
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * used when creating a new segment, so perform some clean up to not
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * bloat this path. This is done first as there is no point to sync
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * this temporary data.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * - There might be data which we had written, intending to fsync it, but
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * which we had not actually fsync&amp;#39;d yet. Therefore, a power failure in
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * the near future might cause earlier unflushed writes to be lost, even
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * though more recent data written to disk from here on would be
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * persisted. To avoid that, fsync the entire data directory.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (ControlFile&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;state &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; DB_SHUTDOWNED &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		ControlFile&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;state &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; DB_SHUTDOWNED_IN_RECOVERY)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;RemoveTempXlogFiles&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;SyncDataDirectory&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
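&lt;p&gt;Here, because the control file state is neither &lt;code&gt;DB_SHUTDOWNED&lt;/code&gt; nor &lt;code&gt;DB_SHUTDOWNED_IN_RECOVERY&lt;/code&gt;, execution enters the if-block and calls &lt;code&gt;SyncDataDirectory()&lt;/code&gt;, which fsyncs the entire data directory.&lt;/p&gt;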
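&lt;p&gt;The same condition can be checked before a restart by looking at the control file. The sketch below reads &lt;code&gt;pg_controldata&lt;/code&gt; output on stdin and reports whether the next startup would take the crash path. The function name &lt;code&gt;needs_crash_sync&lt;/code&gt; is hypothetical, and it assumes the two clean states are printed as &lt;code&gt;shut down&lt;/code&gt; and &lt;code&gt;shut down in recovery&lt;/code&gt;:&lt;/p&gt;

```shell
# Hypothetical helper (not part of PostgreSQL): read pg_controldata output on
# stdin and classify the cluster state the way the if-condition above does.
needs_crash_sync() {
    # Keep only the value of the "Database cluster state:" line
    state=$(sed -n 's/^Database cluster state:[[:space:]]*//p')
    case "$state" in
        "shut down"|"shut down in recovery")
            # Assumed to correspond to DB_SHUTDOWNED / DB_SHUTDOWNED_IN_RECOVERY
            echo "clean: $state" ;;
        *)
            # Any other state means RemoveTempXlogFiles() and
            # SyncDataDirectory() would run at the next startup
            echo "unclean: $state" ;;
    esac
}

# Usage: pg_controldata "$PGDATA" | needs_crash_sync
```

&lt;p&gt;For the instance in this case, the output would be &lt;code&gt;unclean: in production&lt;/code&gt;, consistent with the control file line shown earlier.&lt;/p&gt;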
&lt;p&gt;&lt;code&gt;StartupXLOG&lt;/code&gt; does a great deal of work. Of the steps relevant to spill files, besides &lt;code&gt;SyncDataDirectory()&lt;/code&gt; there is also &lt;code&gt;StartupReorderBuffer()&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Initialize replication slots, before there&amp;#39;s a chance to remove
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * required resources.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;StartupReplicationSlots&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Startup logical state, needs to be setup now so we have proper data
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * during crash recovery.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;StartupReorderBuffer&lt;/span&gt;();&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;StartupReorderBuffer&lt;/code&gt; walks &lt;code&gt;pg_replslot&lt;/code&gt; and calls &lt;code&gt;ReorderBufferCleanupSerializedTXNs&lt;/code&gt; for every slot directory. This deletes the spill files (everything starting with &lt;code&gt;xid&lt;/code&gt;) but leaves the directories and state files in place:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Delete all data spilled to disk after we&amp;#39;ve restarted/crashed. It will be
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * recreated when the respective slots are reused.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;StartupReorderBuffer&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	DIR		 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;logical_dir;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; dirent &lt;span style="color:#f92672"&gt;*&lt;/span&gt;logical_de;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	logical_dir &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;AllocateDir&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_replslot&amp;#34;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;while&lt;/span&gt; ((logical_de &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;ReadDir&lt;/span&gt;(logical_dir, &lt;span style="color:#e6db74"&gt;&amp;#34;pg_replslot&amp;#34;&lt;/span&gt;)) &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;strcmp&lt;/span&gt;(logical_de&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;d_name, &lt;span style="color:#e6db74"&gt;&amp;#34;.&amp;#34;&lt;/span&gt;) &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;strcmp&lt;/span&gt;(logical_de&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;d_name, &lt;span style="color:#e6db74"&gt;&amp;#34;..&amp;#34;&lt;/span&gt;) &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;continue&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* if it cannot be a slot, skip the directory */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;ReplicationSlotValidateName&lt;/span&gt;(logical_de&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;d_name, DEBUG2))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;continue&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * ok, has to be a surviving logical slot, iterate and delete
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * everything starting with xid-*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ReorderBufferCleanupSerializedTXNs&lt;/span&gt;(logical_de&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;d_name);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;FreeDir&lt;/span&gt;(logical_dir);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;5: &lt;code&gt;SyncDataDirectory&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;The function comment is very important:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Issue fsync recursively on PGDATA and all its contents.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * We fsync regular files and directories wherever they are, but we
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * follow symlinks only for pg_wal and immediately under pg_tblspc.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Other symlinks are presumed to point at files we&amp;#39;re not responsible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * for fsyncing, and might not have privileges to write at all.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Errors are logged but not considered fatal; that&amp;#39;s because this is used
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * only during database startup, to deal with the possibility that there are
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * issued-but-unsynced writes pending against the data directory. We want to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * ensure that such writes reach disk before anything that&amp;#39;s done in the new
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * run. However, aborting on error would result in failure to start for
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * harmless cases such as read-only files in the data directory, and that&amp;#39;s
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * not good either.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Note that if we previously crashed due to a PANIC on fsync(), we&amp;#39;ll be
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * rewriting all changes again during recovery.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Note we assume we&amp;#39;re chdir&amp;#39;d into PGDATA to begin with.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;fsync all data directory files to persist them&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;This action only happens during the startup phase&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;This action ensures the data directory is fully persisted to disk before the database starts running&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The body of &lt;code&gt;SyncDataDirectory&lt;/code&gt; recursively walks directories and fsyncs (with some special handling for symlinks):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;walkdir&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;.&amp;#34;&lt;/span&gt;, datadir_fsync_fname, false, LOG);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (xlog_is_symlink)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;walkdir&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_wal&amp;#34;&lt;/span&gt;, datadir_fsync_fname, false, LOG);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;walkdir&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_tblspc&amp;#34;&lt;/span&gt;, datadir_fsync_fname, true, LOG);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;4: &lt;code&gt;walkdir&lt;/code&gt;: Recurse to &lt;code&gt;.&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;3: &lt;code&gt;walkdir&lt;/code&gt;: Recurse to &lt;code&gt;./pg_replslot&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;2: &lt;code&gt;walkdir&lt;/code&gt;: Recurse to &lt;code&gt;./pg_replslot/slotname&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;1: &lt;code&gt;lstat&lt;/code&gt;: C library call. Besides fsyncing each entry through the callback &lt;code&gt;datadir_fsync_fname&lt;/code&gt;, the body of &lt;code&gt;walkdir&lt;/code&gt; also calls &lt;code&gt;lstat&lt;/code&gt; to fetch file metadata such as the inode, file size, and last modification time, much like the Linux &lt;code&gt;stat&lt;/code&gt; command.&lt;/p&gt;
&lt;p&gt;0: &lt;code&gt;_lxstat&lt;/code&gt;: C library call.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Startup logic summary&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PostgreSQL launches an auxiliary process, &lt;code&gt;startup&lt;/code&gt;, to carry out startup work. Unlike the long-running child processes (walwriter, bgwriter, checkpointer, etc.), it is launched on every startup and does a large amount of work before the instance is ready.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;StartupXLOG&lt;/code&gt; is always called during startup, whether or not the database was consistently shut down.&lt;/li&gt;
&lt;li&gt;Only in a non-normal shutdown state does &lt;code&gt;SyncDataDirectory&lt;/code&gt; get triggered.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;SyncDataDirectory&lt;/code&gt; fsyncs all data files for persistence and checks stat info for all data files.&lt;/li&gt;
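&lt;li&gt;fsync ensures the data files are durable before startup proceeds. &lt;code&gt;walkdir&lt;/code&gt; uses the &lt;code&gt;lstat&lt;/code&gt; result to tell regular files (which it fsyncs) from directories (which it recurses into). Before the startup process starts, only the readability of the datadir directory was verified.&lt;/li&gt;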
&lt;li&gt;Regardless of shutdown state, &lt;code&gt;StartupReorderBuffer&lt;/code&gt; is always called and cleans up spill files for all replication slots.&lt;/li&gt;
&lt;/ul&gt;
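&lt;p&gt;The walk-and-fsync behavior summarized above can be sketched in standalone C. This is a simplified illustration, not PostgreSQL&amp;rsquo;s &lt;code&gt;walkdir&lt;/code&gt;: the names &lt;code&gt;fsync_path&lt;/code&gt; and &lt;code&gt;walk_and_fsync&lt;/code&gt; are hypothetical, symlinks are skipped rather than handled specially, and errors are skipped silently instead of being logged.&lt;/p&gt;

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open path read-only and fsync it. Returns 1 on success, 0 on any failure. */
int
fsync_path(const char *path)
{
	int			fd = open(path, O_RDONLY);
	int			ok;

	if (fd < 0)
		return 0;
	ok = (fsync(fd) == 0);
	close(fd);
	return ok;
}

/*
 * Recursively visit everything under path. lstat() classifies each entry:
 * regular files are fsynced, directories are recursed into, and anything
 * else (symlinks, sockets, ...) is skipped. The directory itself is fsynced
 * last. Returns the number of successful fsyncs; failures are not fatal.
 */
int
walk_and_fsync(const char *path)
{
	DIR		   *dir;
	struct dirent *de;
	int			synced = 0;

	assert(path != NULL);

	dir = opendir(path);
	if (dir == NULL)
		return 0;

	while ((de = readdir(dir)) != NULL)
	{
		char		sub[4096];
		struct stat st;
		int			n;

		if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
			continue;

		n = snprintf(sub, sizeof(sub), "%s/%s", path, de->d_name);
		if (n < 0 || (size_t) n >= sizeof(sub))
			continue;			/* path too long for this sketch */

		if (lstat(sub, &st) != 0)
			continue;			/* entry vanished or is unreadable */

		if (S_ISREG(st.st_mode))
			synced += fsync_path(sub);
		else if (S_ISDIR(st.st_mode))
			synced += walk_and_fsync(sub);
	}
	closedir(dir);

	/* Sync the directory itself so its entries are durable too. */
	synced += fsync_path(path);
	return synced;
}
```

&lt;p&gt;Each entry costs one &lt;code&gt;lstat&lt;/code&gt; call and, for regular files and directories, an additional open, fsync, and close. In this sketch the work therefore grows with the number of entries under the directory, which is why a directory holding a very large number of files would take a long time to walk.&lt;/p&gt;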

&lt;h3 class="relative group"&gt;When Is the Ready State?
 &lt;div id="when-is-the-ready-state" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#when-is-the-ready-state" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;After the startup process finishes its work, the database is not yet in the ready state. The postmaster controls startup and shutdown with the &lt;code&gt;pmState&lt;/code&gt; state machine, which is set to PM_STARTUP when the startup process is launched. When a child process exits, the postmaster calls its &lt;code&gt;reaper&lt;/code&gt; function to reap it. Besides reaping, &lt;code&gt;reaper&lt;/code&gt; does follow-up recovery or startup work and advances &lt;code&gt;pmState&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Last steps of &lt;code&gt;PostmasterMain&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	StartupPID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;StartupDataBase&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(StartupPID &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	StartupStatus &lt;span style="color:#f92672"&gt;=&lt;/span&gt; STARTUP_RUNNING;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	pmState &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PM_STARTUP; &lt;span style="color:#75715e"&gt;//State machine changes state
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Some workers may be scheduled to start now */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;maybe_start_bgworkers&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	status &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;ServerLoop&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * ServerLoop probably shouldn&amp;#39;t ever return, but if it does, close down.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;ExitPostmaster&lt;/span&gt;(status &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; STATUS_OK);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;abort&lt;/span&gt;();					&lt;span style="color:#75715e"&gt;/* not reached */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;From here, the core startup flow of &lt;code&gt;PostmasterMain&lt;/code&gt; continues in &lt;code&gt;reaper&lt;/code&gt;, which handles the normal exit of the startup process.&lt;/p&gt;
&lt;p&gt;PMState comment:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * We use a simple state machine to control startup, shutdown, and
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * crash recovery (which is rather like shutdown followed by startup).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * After doing all the postmaster initialization work, we enter PM_STARTUP
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * state and the startup process is launched. The startup process begins by
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * reading the control file and other preliminary initialization steps.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * In a normal startup, or after crash recovery, the startup process exits
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * with exit code 0 and we switch to PM_RUN state. 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;PMState transitions are driven by signals: when the startup process exits, the postmaster receives SIGCHLD and &lt;code&gt;reaper&lt;/code&gt; runs to reap the process.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;reaper&lt;/code&gt; function handling the startup child process&amp;rsquo;s normal exit:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; StartupPID)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			StartupPID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * Startup succeeded, commence normal operations
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			StartupStatus &lt;span style="color:#f92672"&gt;=&lt;/span&gt; STARTUP_NOT_RUNNING; &lt;span style="color:#75715e"&gt;//Transition from STARTUP_RUNNING to STARTUP_NOT_RUNNING
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			FatalError &lt;span style="color:#f92672"&gt;=&lt;/span&gt; false; &lt;span style="color:#75715e"&gt;//After none of the above ifs are hit, it&amp;#39;s not fatal
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			AbortStartTime &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			ReachedNormalRunning &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			pmState &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PM_RUN; &lt;span style="color:#75715e"&gt;//State machine transitions from PM_STARTUP to PM_RUN
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			connsAllowed &lt;span style="color:#f92672"&gt;=&lt;/span&gt; ALLOW_ALL_CONNS;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * Crank up the background tasks, if we didn&amp;#39;t do that already
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * when we entered consistent recovery state. It doesn&amp;#39;t matter
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * if this fails, we&amp;#39;ll just try again later.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Below: starting core child processes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (CheckpointerPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				CheckpointerPID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;StartCheckpointer&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (BgWriterPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				BgWriterPID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;StartBackgroundWriter&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (WalWriterPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				WalWriterPID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;StartWalWriter&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * Likewise, start other special children as needed. In a restart
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * situation, some of them may be alive already.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Below: starting non-core child processes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;IsBinaryUpgrade &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;AutoVacuumingActive&lt;/span&gt;() &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; AutoVacPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				AutoVacPID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;StartAutoVacLauncher&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;PgArchStartupAllowed&lt;/span&gt;() &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; PgArchPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				PgArchPID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;pgarch_start&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (PgStatPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				PgStatPID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;pgstat_start&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* workers may be scheduled to start now */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;maybe_start_bgworkers&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		 &lt;span style="color:#75715e"&gt;//At this point it&amp;#39;s officially ready to accept connections
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* at this point we are really open for business */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(LOG,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;database system is ready to accept connections&amp;#34;&lt;/span&gt;)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Report status */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;AddToDataDirLockFile&lt;/span&gt;(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_READY);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#ifdef USE_SYSTEMD
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;sd_notify&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;READY=1&amp;#34;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#endif
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;continue&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This is where the &amp;ldquo;database system is ready to accept connections&amp;rdquo; message is emitted.&lt;/p&gt;
&lt;p&gt;The checkpointer, bgwriter, walwriter, autovacuum launcher, archiver (if enabled), and stats collector all need to be started. At this stage a launch doesn&amp;rsquo;t have to succeed; a failed launch can be retried later in &lt;code&gt;ServerLoop&lt;/code&gt; or on the next &lt;code&gt;reaper&lt;/code&gt; execution. Only the startup process must launch and complete all of its tasks in one shot:&lt;/p&gt;
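&lt;p&gt;The transition handled by this branch of &lt;code&gt;reaper&lt;/code&gt; can be reduced to a small state function. The sketch below is a deliberate simplification, not the postmaster&amp;rsquo;s code: the type &lt;code&gt;SketchPMState&lt;/code&gt;, the function &lt;code&gt;on_child_exit&lt;/code&gt;, and the &lt;code&gt;SKETCH_PM_FAILED&lt;/code&gt; state are hypothetical names introduced for illustration.&lt;/p&gt;

```c
#include <assert.h>

/* Simplified stand-ins for the postmaster states discussed above. */
typedef enum
{
	SKETCH_PM_STARTUP,			/* startup process launched, not yet finished */
	SKETCH_PM_RUN,				/* normal operation, connections accepted */
	SKETCH_PM_FAILED			/* hypothetical: startup process exited abnormally */
} SketchPMState;

/*
 * Hypothetical transition function: compute the next state when the child
 * identified by pid exits with exit_code.
 */
SketchPMState
on_child_exit(SketchPMState state, int pid, int startup_pid, int exit_code)
{
	assert(startup_pid != 0);

	/* Only the startup process drives the STARTUP -> RUN transition. */
	if (state != SKETCH_PM_STARTUP || pid != startup_pid)
		return state;

	/* Exit code 0 means startup succeeded and normal operation can begin. */
	return (exit_code == 0) ? SKETCH_PM_RUN : SKETCH_PM_FAILED;
}
```

&lt;p&gt;The sketch shows that the ready state is reached only after the startup process has exited with code 0 and been reaped, not while it is still running.&lt;/p&gt;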
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pid &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* in parent, fork failed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			save_errno &lt;span style="color:#f92672"&gt;=&lt;/span&gt; errno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		errno &lt;span style="color:#f92672"&gt;=&lt;/span&gt; save_errno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;switch&lt;/span&gt; (type)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; StartupProcess:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(LOG,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;could not fork startup process: %m&amp;#34;&lt;/span&gt;)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; BgWriterProcess:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(LOG,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;could not fork background writer process: %m&amp;#34;&lt;/span&gt;)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; CheckpointerProcess:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(LOG,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;could not fork checkpointer process: %m&amp;#34;&lt;/span&gt;)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; WalWriterProcess:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(LOG,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;could not fork WAL writer process: %m&amp;#34;&lt;/span&gt;)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; WalReceiverProcess:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(LOG,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;could not fork WAL receiver process: %m&amp;#34;&lt;/span&gt;)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;default&lt;/span&gt;&lt;span style="color:#f92672"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(LOG,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;could not fork process: %m&amp;#34;&lt;/span&gt;)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * fork failure is fatal during startup, but there&amp;#39;s no need to choke
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * immediately if starting other child types fails.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (type &lt;span style="color:#f92672"&gt;==&lt;/span&gt; StartupProcess)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ExitPostmaster&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Spill File Generation Logic Across Versions
 &lt;div id="spill-file-generation-logic-across-versions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#spill-file-generation-logic-across-versions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
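&lt;p&gt;Every version spills whole transactions: PG12 spills the transaction that crossed its own threshold, while PG13 and later evict the largest transaction in the buffer. Here we focus on when spilling happens.&lt;/p&gt;
&lt;p&gt;PG12: the threshold is hard-coded at 4096 in-memory changes per transaction:&lt;/p&gt;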
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; Size max_changes_in_memory &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Check whether the transaction tx should spill its data to disk.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferCheckSerializeTXN&lt;/span&gt;(ReorderBuffer &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rb, ReorderBufferTXN &lt;span style="color:#f92672"&gt;*&lt;/span&gt;txn)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * TODO: improve accounting so we cheaply can take subtransactions into
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * account here.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (txn&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;nentries_mem &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; max_changes_in_memory)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ReorderBufferSerializeTXN&lt;/span&gt;(rb, txn);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(txn&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;nentries_mem &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;PG13: spills once the total size of the reorder buffer reaches &lt;code&gt;logical_decoding_work_mem&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferCheckMemoryLimit&lt;/span&gt;(ReorderBuffer &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rb)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;while&lt;/span&gt; (rb&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;size &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; logical_decoding_work_mem &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1024L&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * Pick the largest transaction (or subtransaction) and evict it from
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * memory by serializing it to disk.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		txn &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;ReorderBufferLargestTXN&lt;/span&gt;(rb);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ReorderBufferSerializeTXN&lt;/span&gt;(rb, txn);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;PG14: adds streaming of in-progress transactions via &lt;code&gt;ReorderBufferStreamTXN&lt;/code&gt;, falling back to spilling when streaming is not possible:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferCheckMemoryLimit&lt;/span&gt;(ReorderBuffer &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rb)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;while&lt;/span&gt; (rb&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;size &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; logical_decoding_work_mem &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1024L&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * Pick the largest transaction (or subtransaction) and evict it from
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * memory by streaming, if possible. Otherwise, spill to disk.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;ReorderBufferCanStartStreaming&lt;/span&gt;(rb) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			(txn &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;ReorderBufferLargestTopTXN&lt;/span&gt;(rb)) &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ReorderBufferStreamTXN&lt;/span&gt;(rb, txn);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ReorderBufferSerializeTXN&lt;/span&gt;(rb, txn);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Although PG14 can stream in-progress transactions, streaming is only triggered when certain conditions are met:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* Returns true, if the streaming can be started now, false, otherwise. */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;inline&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferCanStartStreaming&lt;/span&gt;(ReorderBuffer &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rb)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	LogicalDecodingContext &lt;span style="color:#f92672"&gt;*&lt;/span&gt;ctx &lt;span style="color:#f92672"&gt;=&lt;/span&gt; rb&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;private_data;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SnapBuild &lt;span style="color:#f92672"&gt;*&lt;/span&gt;builder &lt;span style="color:#f92672"&gt;=&lt;/span&gt; ctx&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;snapshot_builder;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* We can&amp;#39;t start streaming unless a consistent state is reached. */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;SnapBuildCurrentState&lt;/span&gt;(builder) &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; SNAPBUILD_CONSISTENT)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * We can&amp;#39;t start streaming immediately even if the streaming is enabled
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * because we previously decoded this transaction and now just are
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * restarting.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;ReorderBufferCanStream&lt;/span&gt;(rb) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;SnapBuildXactNeedsSkip&lt;/span&gt;(builder, ctx&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;reader&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;EndRecPtr))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Found a point after SNAPBUILD_FULL_SNAPSHOT where all transactions that
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * were running at that point finished. Till we reach that we hold off
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * calling any commit callbacks.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SNAPBUILD_CONSISTENT &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
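&lt;p&gt;Streaming trigger conditions, as read from the checks above:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Condition 1: The snapshot builder has reached &lt;code&gt;SNAPBUILD_CONSISTENT&lt;/code&gt;, i.e. every transaction that was running at the &lt;code&gt;SNAPBUILD_FULL_SNAPSHOT&lt;/code&gt; point has finished (committed or rolled back).&lt;/li&gt;
&lt;li&gt;Condition 2: &lt;code&gt;ReorderBufferCanStream(rb)&lt;/code&gt; returns true, meaning streaming is enabled for this decoding context. &lt;code&gt;rb-&amp;gt;private_data&lt;/code&gt; is only how the function reaches the decoding context; it is not a condition in itself.&lt;/li&gt;
&lt;li&gt;Condition 3: &lt;code&gt;SnapBuildXactNeedsSkip&lt;/code&gt; returns false for the current WAL position. Per the code comment, this holds streaming off while decoding is restarting and re-reading a transaction that was already decoded.&lt;/li&gt;
&lt;/ul&gt;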
&lt;p&gt;PG15: Similar to 14, just cleaner functions with less nesting.&lt;/p&gt;
&lt;p&gt;PG16: About the same.&lt;/p&gt;
&lt;p&gt;PG17: About the same, adds &lt;code&gt;DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE&lt;/code&gt; to force streaming.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key points to remember:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PG12 and earlier: hard-coded 4096 changes&lt;/li&gt;
&lt;li&gt;PG13: adds &lt;code&gt;logical_decoding_work_mem&lt;/code&gt; parameter, allowing memory size adjustment to reduce spill probability&lt;/li&gt;
&lt;li&gt;PG14 and later: supports streaming of in-progress transactions&lt;/li&gt;
&lt;li&gt;Triggering streaming also requires certain conditions, so even with streaming, spills can still happen&lt;/li&gt;
&lt;li&gt;PG17: adds &lt;code&gt;debug_logical_replication_streaming&lt;/code&gt; parameter to force streaming&lt;/li&gt;
&lt;/ul&gt;
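&lt;p&gt;The difference between the PG12 and PG13 triggers can be sketched with a toy model. This is not PostgreSQL code: &lt;code&gt;ToyTxn&lt;/code&gt;, &lt;code&gt;pg12_should_spill&lt;/code&gt; and &lt;code&gt;pg13_pick_victim&lt;/code&gt; are hypothetical names used only for illustration. Only the constant 4096 and the kilobyte unit of &lt;code&gt;logical_decoding_work_mem&lt;/code&gt; (the &lt;code&gt;* 1024L&lt;/code&gt; in the check) are taken from the excerpts above.&lt;/p&gt;

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ReorderBufferTXN: only the fields the checks read. */
typedef struct ToyTxn
{
	size_t		nentries_mem;	/* changes currently held in memory */
	size_t		size;			/* bytes currently held in memory */
} ToyTxn;

/* PG12-style rule: per-transaction, count-based, threshold fixed at 4096. */
static const size_t toy_max_changes_in_memory = 4096;

int
pg12_should_spill(const ToyTxn *txn)
{
	assert(txn != NULL);
	return txn->nentries_mem >= toy_max_changes_in_memory;
}

/*
 * PG13-style rule: buffer-wide and byte-based.  Returns the index of the
 * largest transaction when the total reaches work_mem_kb kilobytes, or -1
 * when nothing needs to be evicted.
 */
int
pg13_pick_victim(const ToyTxn *txns, size_t ntxns, size_t work_mem_kb)
{
	size_t		total = 0;
	size_t		largest = 0;
	size_t		i;

	for (i = 0; i < ntxns; i++)
	{
		total += txns[i].size;
		if (txns[i].size > txns[largest].size)
			largest = i;
	}

	if (ntxns == 0 || total < work_mem_kb * 1024)
		return -1;
	return (int) largest;
}
```

&lt;p&gt;In this model, a transaction with 4096 tiny changes spills under the PG12 rule no matter how little memory it uses, while under the PG13 rule it stays in memory until the whole buffer reaches the configured limit (64MB by default).&lt;/p&gt;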

&lt;h2 class="relative group"&gt;Spill File Cleanup Logic
 &lt;div id="spill-file-cleanup-logic" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#spill-file-cleanup-logic" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Spill cleanup at database startup is just one scenario. Spill files are also cleaned up when a walsender starts and when a slot is dropped.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Walsender Startup Cleanup
 &lt;div id="walsender-startup-cleanup" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#walsender-startup-cleanup" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;ReorderBufferCleanupSerializedTXNs&lt;/code&gt; is called during database startup (before walsender has started) and during walsender startup (while the database is running). Note these are different scenarios, though they call the same function. From the function comment, it&amp;rsquo;s meant to &amp;ldquo;remove leftover serialized reorder buffers&amp;rdquo; — i.e., clean up spill files.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Remove any leftover serialized reorder buffers from a slot directory after a
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * prior crash or decoding session exit.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferCleanupSerializedTXNs&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;slotname)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	DIR		 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;spill_dir;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; dirent &lt;span style="color:#f92672"&gt;*&lt;/span&gt;spill_de;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; stat statbuf;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt;		path[MAXPGPATH &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;+&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;sprintf&lt;/span&gt;(path, &lt;span style="color:#e6db74"&gt;&amp;#34;pg_replslot/%s&amp;#34;&lt;/span&gt;, slotname);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* we&amp;#39;re only handling directories here, skip if it&amp;#39;s not ours */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;lstat&lt;/span&gt;(path, &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;statbuf) &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;S_ISDIR&lt;/span&gt;(statbuf.st_mode))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	spill_dir &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;AllocateDir&lt;/span&gt;(path);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;while&lt;/span&gt; ((spill_de &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;ReadDirExtended&lt;/span&gt;(spill_dir, path, INFO)) &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* only look at names that can be ours */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Only compare first 3 characters
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;strncmp&lt;/span&gt;(spill_de&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;d_name, &lt;span style="color:#e6db74"&gt;&amp;#34;xid&amp;#34;&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;snprintf&lt;/span&gt;(path, &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(path),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					 &lt;span style="color:#e6db74"&gt;&amp;#34;pg_replslot/%s/%s&amp;#34;&lt;/span&gt;, slotname,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					 spill_de&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;d_name);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;unlink&lt;/span&gt;(path) &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(ERROR,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						(&lt;span style="color:#a6e22e"&gt;errcode_for_file_access&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 &lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;could not remove file &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt; during removal of pg_replslot/%s/xid*: %m&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								path, slotname)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;FreeDir&lt;/span&gt;(spill_dir);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Two things to note about the above cleanup logic:&lt;/p&gt;
&lt;ul&gt;
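&lt;li&gt;Only files whose names start with &amp;ldquo;xid&amp;rdquo; are removed, so the slot&amp;rsquo;s &lt;code&gt;state&lt;/code&gt; file is left untouched.&lt;/li&gt;
&lt;li&gt;Files are removed with &lt;code&gt;unlink&lt;/code&gt;, one at a time, so cleanup time grows with the number of spill files. This can help us devise a faster startup scheme.&lt;/li&gt;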
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Database Startup Cleanup
 &lt;div id="database-startup-cleanup" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#database-startup-cleanup" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;During database startup, a startup process is forked to clean slots. The cleanup function is the same one walsender calls: &lt;code&gt;ReorderBufferCleanupSerializedTXNs&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The difference is scope: a walsender only cleans the spill files of the slot it is starting on, whereas database startup cleans the spill files of every slot, one slot after another.&lt;/p&gt;
&lt;p&gt;The startup process walks &lt;code&gt;pg_replslot&lt;/code&gt; in a while loop and cleans each slot in turn:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;StartupReorderBuffer&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	DIR		 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;logical_dir;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; dirent &lt;span style="color:#f92672"&gt;*&lt;/span&gt;logical_de;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	logical_dir &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;AllocateDir&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_replslot&amp;#34;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;while&lt;/span&gt; ((logical_de &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;ReadDir&lt;/span&gt;(logical_dir, &lt;span style="color:#e6db74"&gt;&amp;#34;pg_replslot&amp;#34;&lt;/span&gt;)) &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{	&lt;span style="color:#75715e"&gt;//Exclude . and ..
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;strcmp&lt;/span&gt;(logical_de&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;d_name, &lt;span style="color:#e6db74"&gt;&amp;#34;.&amp;#34;&lt;/span&gt;) &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;strcmp&lt;/span&gt;(logical_de&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;d_name, &lt;span style="color:#e6db74"&gt;&amp;#34;..&amp;#34;&lt;/span&gt;) &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;continue&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;//Validate slot name
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* if it cannot be a slot, skip the directory */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;ReplicationSlotValidateName&lt;/span&gt;(logical_de&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;d_name, DEBUG2))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;continue&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * ok, has to be a surviving logical slot, iterate and delete
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * everything starting with xid-*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ReorderBufferCleanupSerializedTXNs&lt;/span&gt;(logical_de&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;d_name);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;FreeDir&lt;/span&gt;(logical_dir);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The while loop calls &lt;code&gt;ReorderBufferCleanupSerializedTXNs&lt;/code&gt;, and after that, the logic is the same as walsender startup cleanup.&lt;/p&gt;
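&lt;p&gt;To get a feel for how much work this loop implies, the following sketch counts the files it would hand to unlink, per directory. It is a dry run under stated assumptions: the helper name &lt;code&gt;count_startup_unlinks&lt;/code&gt; is made up, the name check is approximated by a character test (lowercase letters, digits, underscore) rather than the real &lt;code&gt;ReplicationSlotValidateName&lt;/code&gt;, and the demo files are fake.&lt;/p&gt;

```shell
LC_ALL=C; export LC_ALL   # make [a-z] mean ASCII lowercase only

# Hypothetical helper: for each directory under a pg_replslot-style path,
# print "name count", where count is the number of xid* entries. Directories
# whose names contain other characters are skipped, as in the loop above.
count_startup_unlinks() {
    base=$1
    for d in "$base"/*/; do
        [ -d "$d" ] || continue
        name=$(basename "$d")
        case $name in
            *[!a-z0-9_]*) continue ;;   # cannot be a slot, skip the directory
        esac
        n=0
        for f in "$d"xid*; do
            [ -e "$f" ] || continue
            n=$((n + 1))
        done
        echo "$name $n"
    done
}

# Demo on a throwaway directory with made-up file names.
demo=$(mktemp -d)
mkdir "$demo/slot_a" "$demo/slot_b"
touch "$demo/slot_a/state" "$demo/slot_a/xid-1.spill" "$demo/slot_a/xid-2.spill"
touch "$demo/slot_b/state"
count_startup_unlinks "$demo"
```

&lt;p&gt;Pointing the helper at a copy of a real &lt;code&gt;pg_replslot&lt;/code&gt; listing gives a rough idea of how many unlink calls startup has to make before the instance can come up.&lt;/p&gt;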

&lt;h3 class="relative group"&gt;Manual Cleanup via pg_drop_replication_slot
 &lt;div id="manual-cleanup-via-pg_drop_replication_slot" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#manual-cleanup-via-pg_drop_replication_slot" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The drop slot cleanup logic is &lt;strong&gt;different&lt;/strong&gt; from the automatic spill file cleanup — it does not call &lt;code&gt;ReorderBufferCleanupSerializedTXNs&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Drop slot flow:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pg_drop_replication_slot(PG_FUNCTION_ARGS)&lt;/code&gt; -&amp;gt; &lt;code&gt;ReplicationSlotDrop(const char *name, bool nowait)&lt;/code&gt; -&amp;gt; &lt;code&gt;ReplicationSlotDropAcquired(void)&lt;/code&gt; -&amp;gt; &lt;code&gt;ReplicationSlotDropPtr&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;ReplicationSlotDropPtr&lt;/code&gt;&amp;rsquo;s slot cleanup logic is also interesting:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Permanently drop the replication slot which will be released by the point
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * this function returns.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReplicationSlotDropPtr&lt;/span&gt;(ReplicationSlot &lt;span style="color:#f92672"&gt;*&lt;/span&gt;slot)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt;		path[MAXPGPATH];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt;		tmppath[MAXPGPATH];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * If some other backend ran this code concurrently with us, we might try
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * to delete a slot with a certain name while someone else was trying to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * create a slot with the same name.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;LWLockAcquire&lt;/span&gt;(ReplicationSlotAllocationLock, LW_EXCLUSIVE);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Generate pathnames. */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;sprintf&lt;/span&gt;(path, &lt;span style="color:#e6db74"&gt;&amp;#34;pg_replslot/%s&amp;#34;&lt;/span&gt;, &lt;span style="color:#a6e22e"&gt;NameStr&lt;/span&gt;(slot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;data.name));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;sprintf&lt;/span&gt;(tmppath, &lt;span style="color:#e6db74"&gt;&amp;#34;pg_replslot/%s.tmp&amp;#34;&lt;/span&gt;, &lt;span style="color:#a6e22e"&gt;NameStr&lt;/span&gt;(slot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;data.name));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Rename the slot directory on disk, so that we&amp;#39;ll no longer recognize
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * this as a valid slot. Note that if this fails, we&amp;#39;ve got to mark the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * slot inactive before bailing out. If we&amp;#39;re dropping an ephemeral or a
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * temporary slot, we better never fail hard as the caller won&amp;#39;t expect
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * the slot to survive and this might get called during error handling.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;rename&lt;/span&gt;(path, tmppath) &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#75715e"&gt;//rename the slot directory
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * We need to fsync() the directory we just renamed and its parent to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * make sure that our changes are on disk in a crash-safe fashion. If
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * fsync() fails, we can&amp;#39;t be sure whether the changes are on disk or
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * not. For now, we handle that by panicking;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * StartupReplicationSlots() will try to straighten it out after
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * restart.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//fsync persistence
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;START_CRIT_SECTION&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;fsync_fname&lt;/span&gt;(tmppath, true);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;fsync_fname&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_replslot&amp;#34;&lt;/span&gt;, true);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;END_CRIT_SECTION&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * If removing the directory fails, the worst thing that will happen is
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * that the user won&amp;#39;t be able to create a new slot with the same name
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * until the next server restart. We warn about it, but that&amp;#39;s all.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;rmtree&lt;/span&gt;(tmppath, true))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(WARNING,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;could not remove directory &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt;, tmppath)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * We release this at the very end, so that nobody starts trying to create
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * a slot while we&amp;#39;re still cleaning up the detritus of the old one.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;LWLockRelease&lt;/span&gt;(ReplicationSlotAllocationLock);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Drop slot doesn&amp;rsquo;t directly unlink files in the slot directory. Instead, it first renames the &lt;code&gt;slotname/&lt;/code&gt; directory to &lt;code&gt;slotname.tmp/&lt;/code&gt;, then unlinks the files inside, and finally removes the &lt;code&gt;slotname.tmp/&lt;/code&gt; directory itself.&lt;/p&gt;
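&lt;p&gt;Note that &lt;code&gt;rmtree&lt;/code&gt; still unlinks the files in a loop, one at a time, so the per-file cost is paid here as well. The rename only ensures that the directory is no longer recognized as a valid slot.&lt;/p&gt;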
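&lt;p&gt;The ordering used by drop slot (rename first, remove afterwards) can be sketched in shell. This only illustrates the ordering on a throwaway directory: the helper name &lt;code&gt;drop_dir_like_slot&lt;/code&gt; is made up, and the locking and fsync steps of the real code are left out. It is not a procedure to run against a live data directory.&lt;/p&gt;

```shell
# Hypothetical helper: rename a directory to NAME.tmp first, then remove the
# renamed tree. After the rename nothing exists under the original name any
# more, even though deleting the contents may still take a long time.
drop_dir_like_slot() {
    path=$1
    tmppath=$path.tmp
    mv -- "$path" "$tmppath" || return 1   # counterpart of rename(path, tmppath)
    rm -rf -- "$tmppath"                   # counterpart of rmtree(tmppath, true)
}

# Demo on a throwaway directory with made-up file names.
demo=$(mktemp -d)
mkdir "$demo/slot_a"
touch "$demo/slot_a/state" "$demo/slot_a/xid-1.spill"
drop_dir_like_slot "$demo/slot_a"
ls -A "$demo"
```

&lt;p&gt;The final &lt;code&gt;ls -A&lt;/code&gt; shows an empty directory: neither &lt;code&gt;slot_a&lt;/code&gt; nor &lt;code&gt;slot_a.tmp&lt;/code&gt; is left.&lt;/p&gt;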

&lt;h2 class="relative group"&gt;Accelerated Startup Scheme After Replication Slot Spill
 &lt;div id="accelerated-startup-scheme-after-replication-slot-spill" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#accelerated-startup-scheme-after-replication-slot-spill" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Deleting 10 million spill files one by one is very slow, whereas moving the whole directory with &lt;code&gt;mv&lt;/code&gt; (within the same filesystem) is a single rename and is extremely fast. A manual &lt;code&gt;mv&lt;/code&gt; needs care in two places, though: the name of the moved directory and the state file. It also helps to know which step in the source code the &lt;code&gt;mv&lt;/code&gt; bypasses.&lt;/p&gt;

&lt;h3 class="relative group"&gt;mv Naming Notes
 &lt;div id="mv-naming-notes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#mv-naming-notes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Since it was an abnormal shutdown, the startup process will execute &lt;code&gt;SyncDataDirectory&lt;/code&gt; to fsync and stat all data files — this is hard to bypass. After &lt;code&gt;SyncDataDirectory&lt;/code&gt; completes, it starts handling replication slots. When handling slots, it calls &lt;code&gt;StartupReorderBuffer()&lt;/code&gt; -&amp;gt; &lt;code&gt;ReorderBufferCleanupSerializedTXNs&lt;/code&gt; to fully clean up spill files.&lt;/p&gt;
&lt;p&gt;Before entering cleanup, &lt;code&gt;ReplicationSlotValidateName&lt;/code&gt; validates the directory name as a slot name. If the name fails validation, the loop moves on to the next directory, so we can use this check to make the startup process skip the &lt;code&gt;ReorderBufferCleanupSerializedTXNs&lt;/code&gt; call for a directory.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;ReplicationSlotValidateName&lt;/code&gt; rules:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReplicationSlotValidateName&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;name, &lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; elevel)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (cp &lt;span style="color:#f92672"&gt;=&lt;/span&gt; name; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;cp; cp&lt;span style="color:#f92672"&gt;++&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{ &lt;span style="color:#75715e"&gt;//Key rule here
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;((&lt;span style="color:#f92672"&gt;*&lt;/span&gt;cp &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;cp &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;z&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			 &lt;span style="color:#f92672"&gt;||&lt;/span&gt; (&lt;span style="color:#f92672"&gt;*&lt;/span&gt;cp &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;cp &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;9&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			 &lt;span style="color:#f92672"&gt;||&lt;/span&gt; (&lt;span style="color:#f92672"&gt;*&lt;/span&gt;cp &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;_&amp;#39;&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(elevel,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					(&lt;span style="color:#a6e22e"&gt;errcode&lt;/span&gt;(ERRCODE_INVALID_NAME),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					 &lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;replication slot name &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt; contains invalid character&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							name),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					 &lt;span style="color:#a6e22e"&gt;errhint&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Replication slot names may only contain lower case letters, numbers, and the underscore character.&amp;#34;&lt;/span&gt;)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Valid slot names only contain &lt;code&gt;a-z&lt;/code&gt;, &lt;code&gt;0-9&lt;/code&gt;, &lt;code&gt;_&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;So the renamed directory should contain a character that is not valid in a slot name, which makes startup skip it. A dot &lt;code&gt;.&lt;/code&gt; is the recommended choice:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Recommended&lt;/em&gt;: &lt;code&gt;slotname.bak&lt;/code&gt;, &lt;code&gt;slotname.20241215&lt;/code&gt;, etc.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Not recommended&lt;/em&gt;: &lt;code&gt;slotnamebackup&lt;/code&gt;, &lt;code&gt;slotname20241215&lt;/code&gt;, &lt;code&gt;slotname_bak&lt;/code&gt;, etc. These are still valid slot names, so startup would clean them file by file.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Not recommended&lt;/em&gt;: the &lt;code&gt;.tmp&lt;/code&gt; suffix. It has special meaning: drop slot renames a slot directory to &lt;code&gt;slotname.tmp&lt;/code&gt; before removing it.&lt;/li&gt;
&lt;/ul&gt;
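&lt;p&gt;The character rule can be checked in shell before choosing a name. The sketch below is an approximation under stated assumptions: the helper name &lt;code&gt;looks_like_slot_name&lt;/code&gt; is made up, and it models only the character rule shown above, not the parts of &lt;code&gt;ReplicationSlotValidateName&lt;/code&gt; that are elided in the listing.&lt;/p&gt;

```shell
LC_ALL=C; export LC_ALL   # make [a-z] mean ASCII lowercase only

# Hypothetical helper: succeed if NAME consists only of a-z, 0-9 and _.
looks_like_slot_name() {
    case $1 in
        *[!a-z0-9_]*) return 1 ;;   # contains a character outside the rule
        *) return 0 ;;
    esac
}

# Candidate directory names taken from the list above.
for name in slotname.bak slotname.20241215 slotnamebackup slotname20241215 slotname_bak; do
    if looks_like_slot_name "$name"; then
        echo "$name: valid slot name, startup would clean it"
    else
        echo "$name: not a valid slot name, startup would skip it"
    fi
done
```

&lt;p&gt;The two names containing a dot are reported as skipped, and the other three are reported as valid slot names.&lt;/p&gt;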
&lt;p&gt;After renaming, you need to recreate the original slot directory and copy the state file back into it. Otherwise the slot will behave strangely on startup (e.g., duplicate slot names, auto-generated slot names, inability to delete slots, downstream unable to start the replication link, etc.).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Recommended mv operations summarized:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cd pg_replslot
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mv slotname slotname.bak 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mkdir slotname
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cp slotname.bak/state slotname/&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
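&lt;p&gt;The same steps can be wrapped in a function with a few guard checks. This is a sketch, not a tested recovery tool: the function name &lt;code&gt;park_slot_spills&lt;/code&gt; and its arguments are made up, and it assumes the server is stopped and that it runs as the owner of the data directory.&lt;/p&gt;

```shell
# Hypothetical wrapper around the commands above.
# Arguments: path to pg_replslot, slot name, suffix (default "bak").
park_slot_spills() {
    replslot_dir=$1
    slot=$2
    suffix=${3:-bak}
    src=$replslot_dir/$slot
    dst=$replslot_dir/$slot.$suffix   # the dot makes it an invalid slot name

    if [ "$suffix" = "tmp" ]; then
        echo "refusing suffix tmp: it has special meaning for slots"
        return 1
    fi
    if [ ! -f "$src/state" ]; then
        echo "no state file in $src"
        return 1
    fi
    if [ -e "$dst" ]; then
        echo "$dst already exists"
        return 1
    fi

    mv -- "$src" "$dst" || return 1          # single rename, no per-file work
    mkdir -- "$src" || return 1              # recreate the slot directory
    cp -- "$dst/state" "$src/" || return 1   # keep the slot definition
}

# Demo on a throwaway directory with made-up file names.
demo=$(mktemp -d)
mkdir "$demo/slotname"
touch "$demo/slotname/state" "$demo/slotname/xid-1.spill"
park_slot_spills "$demo" slotname bak
ls "$demo/slotname" "$demo/slotname.bak"
```

&lt;p&gt;The parked &lt;code&gt;slotname.bak&lt;/code&gt; directory still holds the spill files and can be removed later, after the instance is up.&lt;/p&gt;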

&lt;h3 class="relative group"&gt;Startup Time Comparison
 &lt;div id="startup-time-comparison" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#startup-time-comparison" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;To see whether the manual mv/rm speedup is actually worthwhile, I compared startup time across the different code paths.&lt;/p&gt;
&lt;p&gt;What each scenario does, according to the source logic:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Normal shutdown: skips fsync and stat&lt;/li&gt;
&lt;li&gt;Abnormal shutdown: goes through fsync and stat&lt;/li&gt;
&lt;li&gt;Valid mv: the slot directory is renamed with a &lt;code&gt;.bak&lt;/code&gt; suffix, so startup skips it and no unlink happens&lt;/li&gt;
&lt;li&gt;Invalid mv: the slot directory is renamed with a &lt;code&gt;_bak&lt;/code&gt; suffix, which is still a valid slot name; its spill files start with xid, so startup unlinks them&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Since generating real spill files would be too slow, I manually created fake slot directories and spill files to test startup time: 50 slots with 400k spill files each, 20 million spill files in total. (Copying a whole directory with &lt;code&gt;cp&lt;/code&gt; is much faster than creating the files one by one with &lt;code&gt;cp&lt;/code&gt; or &lt;code&gt;dd&lt;/code&gt;.)&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;#&lt;/th&gt;
 &lt;th&gt;Test Plan&lt;/th&gt;
 &lt;th&gt;Startup Time&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;1&lt;/td&gt;
 &lt;td&gt;Normal shutdown; no fsync/stat, no unlink&lt;/td&gt;
 &lt;td&gt;0.1 seconds&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;td&gt;Normal shutdown, invalid mv; no fsync/stat, unlink&lt;/td&gt;
 &lt;td&gt;11 min 41 sec&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;3&lt;/td&gt;
 &lt;td&gt;Abnormal shutdown, valid mv; fsync/stat, no unlink&lt;/td&gt;
 &lt;td&gt;4 min 35 sec&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;4&lt;/td&gt;
 &lt;td&gt;Abnormal shutdown, invalid mv; fsync/stat, unlink&lt;/td&gt;
 &lt;td&gt;32 min 2 sec&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;5&lt;/td&gt;
 &lt;td&gt;Abnormal shutdown, rm (create slot dir, keep state)&lt;/td&gt;
 &lt;td&gt;13 min 4 sec&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
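&lt;p&gt;The durations in the table can be converted to seconds to make the gaps easier to read. The sketch below only redoes arithmetic on the numbers reported above. The helper name &lt;code&gt;to_seconds&lt;/code&gt; is made up.&lt;/p&gt;

```shell
# Convert "minutes seconds" to seconds.
to_seconds() { echo $(( $1 * 60 + $2 )); }

valid_mv=$(to_seconds 4 35)     # plan 3: abnormal shutdown, valid mv
invalid_mv=$(to_seconds 32 2)   # plan 4: abnormal shutdown, invalid mv
rm_plan=$(to_seconds 13 4)      # plan 5: abnormal shutdown, rm

echo "plan 3: ${valid_mv}s, plan 4: ${invalid_mv}s, plan 5: ${rm_plan}s"
echo "plan 4 minus plan 3: $(( invalid_mv - valid_mv ))s"
echo "plan 5 minus plan 3: $(( rm_plan - valid_mv ))s"
```

&lt;p&gt;Plans 3 and 4 both go through fsync/stat and differ only in whether startup unlinks the spill files, so their difference is a rough estimate of the unlink cost in this test.&lt;/p&gt;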
&lt;p&gt;Comparing plans 3 and 5, theoretically in the scenario at hand, a valid mv could achieve startup in about 4 minutes, while rm would take about 13 minutes. (This is a rough comparison; the recovery environment already showed some differences.)&lt;/p&gt;</content:encoded></item><item><title>PostgreSQL Case Study: Analysis of Abnormally Long Planning Time</title><link>https://lastdba.com/en/2024/08/21/postgresql-case-study-analysis-of-abnormally-long-planning-time/</link><pubDate>Wed, 21 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/21/postgresql-case-study-analysis-of-abnormally-long-planning-time/</guid><description>&lt;h2 class="relative group"&gt;Problem Analysis Overview
 &lt;div id="problem-analysis-overview" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-analysis-overview" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The database kept OOMing. Analysis revealed the issue was in query plan generation: planning time ~1 second, planning shared hits ~1 million. After thorough investigation, the root cause was identified as bloat in the statistics base table &lt;code&gt;pg_statistic&lt;/code&gt;. On the first SQL execution of a session — due to a CatCacheMiss — the backend accessed and cached an excessive amount of dead tuple data from &lt;code&gt;pg_statistic&lt;/code&gt;. Application connections always spawned new sessions, and the combined memory usage across multiple backends was too large, leading to OOM.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Problem Analysis Overview
 &lt;div id="problem-analysis-overview" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-analysis-overview" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The database kept OOMing. Analysis revealed the issue was in query plan generation: planning time ~1 second, planning shared hits ~1 million. After thorough investigation, the root cause was identified as bloat in the statistics base table &lt;code&gt;pg_statistic&lt;/code&gt;. On the first SQL execution of a session — due to a CatCacheMiss — the backend accessed and cached an excessive amount of dead tuple data from &lt;code&gt;pg_statistic&lt;/code&gt;. Application connections always spawned new sessions, and the combined memory usage across multiple backends was too large, leading to OOM.&lt;/p&gt;
&lt;p&gt;Below is the detailed analysis process.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Problem Symptoms
 &lt;div id="problem-symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;A certain database kept OOMing and restarting. After investigation, we found that while the number of concurrent sessions wasn&amp;rsquo;t high, each session&amp;rsquo;s memory footprint was quite large. The total memory exceeded the cgroup memory limit, causing OOM.&lt;/p&gt;
&lt;p&gt;We could preliminarily rule out the following causes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Not caused by excessive metadata. Too many objects (typically too many partitions) would cause sessions to cache excessive metadata. This database didn&amp;rsquo;t have that many objects.&lt;/li&gt;
&lt;li&gt;Not caused by SQL execution plan issues. Sorting/hash operations might use too much memory. This database didn&amp;rsquo;t fit that scenario — the SQL in question was a simple sequential scan.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;During the investigation, we discovered that any simple SQL query in this database took a very long time to execute, and Planning Buffers showed about 1 million hits:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers,timing) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlinfo &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;71&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;011&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;012&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlinfo (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;480&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;73&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;473&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;71&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;010&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;010&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1127312&lt;/span&gt; &lt;span style="color:#75715e"&gt;-- Abnormal planning shared hit
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;947&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;038&lt;/span&gt; ms &lt;span style="color:#75715e"&gt;-- Abnormal planning time
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;035&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When we ran the same SQL a second time, the planning time was normal.&lt;/p&gt;
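&lt;p&gt;A minimal sketch for reproducing this comparison, assuming the statements are run back to back in a newly opened session (whether the slow first run is tied to a new session is an assumption to verify, not something the output above proves):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- first run: note Planning Buffers and Planning Time
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlinfo &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- second run of the identical statement: compare the Planning section with the first run
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlinfo &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If the second run only looks normal within the same session, reconnecting (for example with psql&amp;rsquo;s &lt;code&gt;\c&lt;/code&gt;) and repeating the first statement should bring the high planning numbers back.&lt;/p&gt;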

&lt;h2 class="relative group"&gt;Problem Investigation Process
 &lt;div id="problem-investigation-process" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-investigation-process" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Printing Execution Plan Statistics
 &lt;div id="printing-execution-plan-statistics" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#printing-execution-plan-statistics" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;We enabled per-phase statistics logging for the parser, planner, and executor in the session:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; log_parser_stats &lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;on&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; log_planner_stats &lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;on&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; log_executor_stats &lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;on&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;We then ran the SQL. The log output was as follows:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-08-13 10:02:33.936 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;postgres&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,85532,&lt;span style="color:#e6db74"&gt;&amp;#34;[local]&amp;#34;&lt;/span&gt;,66babe8c.14e1c,13,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-08-13 10:01:48 CST,4/713,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;PARSER STATISTICS&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;! system usage stats:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0.000046 s user, 0.000046 s system, 0.000091 s elapsed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! [0.001661 s user, 0.001661 s system total]
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 4660 kB max resident size
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0/0 [0/8] filesystem blocks in/out
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0/36 [0/996] page faults/reclaims, 0 [0] swaps
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0/0 [5/0] voluntary/involuntary context switches&amp;#34;&lt;/span&gt;,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;explain (analyze,buffers) select *,1 from lzlinfo
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;2024-08-13 10:02:33.938 CST,&amp;#34;&lt;/span&gt;postgres&lt;span style="color:#e6db74"&gt;&amp;#34;,&amp;#34;&lt;/span&gt;lzldb&lt;span style="color:#e6db74"&gt;&amp;#34;,85532,&amp;#34;&lt;/span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;local&lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;,66babe8c.14e1c,14,&amp;#34;&lt;/span&gt;EXPLAIN&lt;span style="color:#e6db74"&gt;&amp;#34;,2024-08-13 10:01:48 CST,4/713,0,LOG,00000,&amp;#34;&lt;/span&gt;PARSE ANALYSIS STATISTICS&lt;span style="color:#e6db74"&gt;&amp;#34;,&amp;#34;&lt;/span&gt;! system usage stats:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! 0.001459 s user, 0.000000 s system, 0.001464 s elapsed
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0.003146 s user, 0.001687 s system total&lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! &lt;span style="color:#ae81ff"&gt;5972&lt;/span&gt; kB max resident size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! 0/0 &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0/8&lt;span style="color:#f92672"&gt;]&lt;/span&gt; filesystem blocks in/out
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! 0/325 &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0/1324&lt;span style="color:#f92672"&gt;]&lt;/span&gt; page faults/reclaims, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0&lt;span style="color:#f92672"&gt;]&lt;/span&gt; swaps
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0&lt;span style="color:#f92672"&gt;]&lt;/span&gt; signals rcvd, 0/0 &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0/0&lt;span style="color:#f92672"&gt;]&lt;/span&gt; messages rcvd/sent
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! 0/0 &lt;span style="color:#f92672"&gt;[&lt;/span&gt;5/0&lt;span style="color:#f92672"&gt;]&lt;/span&gt; voluntary/involuntary context switches&lt;span style="color:#e6db74"&gt;&amp;#34;,,,,,&amp;#34;&lt;/span&gt;explain &lt;span style="color:#f92672"&gt;(&lt;/span&gt;analyze,buffers&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; *,1 from lzlinfo
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-08-13 10:02:33.938 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;postgres&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,85532,&lt;span style="color:#e6db74"&gt;&amp;#34;[local]&amp;#34;&lt;/span&gt;,66babe8c.14e1c,15,&lt;span style="color:#e6db74"&gt;&amp;#34;EXPLAIN&amp;#34;&lt;/span&gt;,2024-08-13 10:01:48 CST,4/713,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;REWRITER STATISTICS&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;! system usage stats:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0.000001 s user, 0.000000 s system, 0.000001 s elapsed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! [0.003177 s user, 0.001687 s system total]
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 5972 kB max resident size
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0/0 [0/8] filesystem blocks in/out
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0/0 [0/1324] page faults/reclaims, 0 [0] swaps
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0/0 [5/0] voluntary/involuntary context switches&amp;#34;&lt;/span&gt;,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;explain (analyze,buffers) select *,1 from lzlinfo
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;2024-08-13 10:02:34.644 CST,&amp;#34;&lt;/span&gt;postgres&lt;span style="color:#e6db74"&gt;&amp;#34;,&amp;#34;&lt;/span&gt;lzldb&lt;span style="color:#e6db74"&gt;&amp;#34;,85532,&amp;#34;&lt;/span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;local&lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;,66babe8c.14e1c,16,&amp;#34;&lt;/span&gt;EXPLAIN&lt;span style="color:#e6db74"&gt;&amp;#34;,2024-08-13 10:01:48 CST,4/713,0,LOG,00000,&amp;#34;&lt;/span&gt;PLANNER STATISTICS&lt;span style="color:#e6db74"&gt;&amp;#34;,&amp;#34;&lt;/span&gt;! system usage stats:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! 0.539964 s user, 0.164083 s system, 0.705718 s elapsed
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0.543248 s user, 0.165770 s system total&lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! &lt;span style="color:#ae81ff"&gt;745072&lt;/span&gt; kB max resident size -- Abnormal point
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! 0/0 &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0/8&lt;span style="color:#f92672"&gt;]&lt;/span&gt; filesystem blocks in/out
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! 0/184803 &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0/186157&lt;span style="color:#f92672"&gt;]&lt;/span&gt; page faults/reclaims, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0&lt;span style="color:#f92672"&gt;]&lt;/span&gt; swaps
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0&lt;span style="color:#f92672"&gt;]&lt;/span&gt; signals rcvd, 0/0 &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0/0&lt;span style="color:#f92672"&gt;]&lt;/span&gt; messages rcvd/sent
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! 0/1 &lt;span style="color:#f92672"&gt;[&lt;/span&gt;5/1&lt;span style="color:#f92672"&gt;]&lt;/span&gt; voluntary/involuntary context switches&lt;span style="color:#e6db74"&gt;&amp;#34;,,,,,&amp;#34;&lt;/span&gt;explain &lt;span style="color:#f92672"&gt;(&lt;/span&gt;analyze,buffers&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; *,1 from lzlinfo
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-08-13 10:02:34.644 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;postgres&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,85532,&lt;span style="color:#e6db74"&gt;&amp;#34;[local]&amp;#34;&lt;/span&gt;,66babe8c.14e1c,17,&lt;span style="color:#e6db74"&gt;&amp;#34;EXPLAIN&amp;#34;&lt;/span&gt;,2024-08-13 10:01:48 CST,4/713,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;EXECUTOR STATISTICS&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;! system usage stats:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0.540248 s user, 0.164170 s system, 0.706088 s elapsed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! [0.543532 s user, 0.165857 s system total]
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 745596 kB max resident size
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0/0 [0/8] filesystem blocks in/out
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0/184898 [0/186252] page faults/reclaims, 0 [0] swaps
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0/1 [5/1] voluntary/involuntary context switches&amp;#34;&lt;/span&gt;,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;explain (analyze,buffers) select *,1 from lzlinfo
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
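&lt;p&gt;The parser, parse analysis, and rewriter phases were all cheap: at most about 1.5 ms elapsed and a max resident size of 5972 kB. In the planner phase, max resident size jumped to 745072 kB and elapsed time rose to about 0.7 s. This narrowed the problem down to the planner itself rather than the earlier phases. Beyond that, the stats offered little else to act on.&lt;/p&gt;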
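&lt;p&gt;One way to look further from inside the session is to list the largest memory contexts of the current backend right after the slow statement. This is only a sketch: it assumes PostgreSQL 14 or later, where the &lt;code&gt;pg_backend_memory_contexts&lt;/code&gt; view exists, and the server version in this case is not stated. The view shows what is allocated at the time of the query, which may be less than the peak resident size reported in the log:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- assumes PostgreSQL 14+; run in the same session, right after the slow statement
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name, ident, total_bytes, used_bytes
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_backend_memory_contexts
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; total_bytes &lt;span style="color:#66d9ef"&gt;desc&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;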

&lt;h3 class="relative group"&gt;strace Tracing
 &lt;div id="strace-tracing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#strace-tracing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
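&lt;p&gt;strace attaches to a single backend process, so the PID of the session under test is needed first. A minimal sketch, assuming the slow statement is run from a session we control: the value returned here is what gets passed to &lt;code&gt;strace -p&lt;/code&gt; (76419 in this trace).&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- run in the session that will execute the slow statement
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_backend_pid();&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;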
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;strace -p &lt;span style="color:#ae81ff"&gt;76419&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;strace: Process &lt;span style="color:#ae81ff"&gt;76419&lt;/span&gt; attached
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;epoll_wait(&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;, [&lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;EPOLLIN, &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;u32&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;15422552&lt;/span&gt;, u64&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;15422552&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}}&lt;/span&gt;], &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;recvfrom(&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;Q\0\0\0\262explain (analyze,buffers) s&amp;#34;&lt;/span&gt;..., &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;179&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lseek(&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, SEEK_END) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk(&lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;xfed000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x100e000) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x100e000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk(&lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x100e000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk(&lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x100e000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x1007000) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x1007000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk(&lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x1007000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mmap(&lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;270336&lt;/span&gt;, PROT_READ&lt;span style="color:#f92672"&gt;|&lt;/span&gt;PROT_WRITE, MAP_PRIVATE&lt;span style="color:#f92672"&gt;|&lt;/span&gt;MAP_ANONYMOUS, &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x2b7806b0c000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;open&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;base/17076/16678&amp;#34;&lt;/span&gt;, O_RDWR) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lseek(&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, SEEK_END) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;open&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;base/17076/46160&amp;#34;&lt;/span&gt;, O_RDWR) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lseek(&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, SEEK_END) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7667712&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;open&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;base/17076/46168&amp;#34;&lt;/span&gt;, O_RDWR) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lseek(&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, SEEK_END) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;188416&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;open&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;base/17076/46170&amp;#34;&lt;/span&gt;, O_RDWR) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lseek(&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, SEEK_END) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;188416&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mmap(&lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;528384&lt;/span&gt;, PROT_READ&lt;span style="color:#f92672"&gt;|&lt;/span&gt;PROT_WRITE, MAP_PRIVATE&lt;span style="color:#f92672"&gt;|&lt;/span&gt;MAP_ANONYMOUS, &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x2b78c1b36000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk(&lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x1007000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x102c000) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x102c000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk(&lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x102c000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk(&lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x102c000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x1025000) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x1025000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk(&lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x1025000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lseek(&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, SEEK_END) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7667712&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;open&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_stat_tmp/pgss_query_texts.stat&amp;#34;&lt;/span&gt;, O_RDWR&lt;span style="color:#f92672"&gt;|&lt;/span&gt;O_CREAT, &lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pwrite64(&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;explain (analyze,buffers) select&amp;#34;&lt;/span&gt;..., &lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;93934&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pwrite64(&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;\0&amp;#34;&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;94106&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;close&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sendto(&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;\2\0\0\0\250\3\0\0\264B\0\0\10\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0&amp;#34;&lt;/span&gt;..., &lt;span style="color:#ae81ff"&gt;936&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;936&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sendto(&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;\2\0\0\0\250\3\0\0\264B\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0&amp;#34;&lt;/span&gt;..., &lt;span style="color:#ae81ff"&gt;936&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;936&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sendto(&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;\2\0\0\0\250\3\0\0\264B\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0&amp;#34;&lt;/span&gt;..., &lt;span style="color:#ae81ff"&gt;936&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;936&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sendto(&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;\2\0\0\0\250\3\0\0\264B\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0&amp;#34;&lt;/span&gt;..., &lt;span style="color:#ae81ff"&gt;936&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;936&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sendto(&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;\2\0\0\0\250\3\0\0\264B\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0&amp;#34;&lt;/span&gt;..., &lt;span style="color:#ae81ff"&gt;936&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;936&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sendto(&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;\2\0\0\0\10\1\0\0\264B\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0&amp;#34;&lt;/span&gt;..., &lt;span style="color:#ae81ff"&gt;264&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;264&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sendto(&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;\2\0\0\0\10\1\0\0\0\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0&amp;#34;&lt;/span&gt;..., &lt;span style="color:#ae81ff"&gt;264&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;264&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sendto(&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;\16\0\0\0H\0\0\0\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0&amp;#34;&lt;/span&gt;..., &lt;span style="color:#ae81ff"&gt;72&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;72&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sendto(&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;T\0\0\0#\0\1QUERY PLAN\0\0\0\0\0\0\0\0\0\0\31\377\377\377\377&amp;#34;&lt;/span&gt;..., &lt;span style="color:#ae81ff"&gt;826&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;826&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;recvfrom(&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;xd2b4e0, &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; EAGAIN (Resource temporarily unavailable)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;epoll_wait(&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;, &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Despite the many shared hits, strace didn&amp;rsquo;t reveal much: it showed that the session had opened only 4 data files. Mapping the open file descriptors to relation names with oid2name showed them to be the table, its two indexes, and &lt;code&gt;pathman_config&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;From database &lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filenode Table Name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;46170&lt;/span&gt; ix_name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;46168&lt;/span&gt; pk_lzlinfo
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;46160&lt;/span&gt; lzlinfo
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;16678&lt;/span&gt; pathman_config&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;These objects are not large, so it didn&amp;rsquo;t look like oversized tables (or indexes) were the cause.&lt;/p&gt;

&lt;h3 class="relative group"&gt;perf
 &lt;div id="perf" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#perf" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;(No screenshot — use your imagination.)&lt;/p&gt;
&lt;p&gt;The perf flame graph showed ~40% of the time spent on the &lt;code&gt;heap_hot_search_buffer&lt;/code&gt; stack.&lt;/p&gt;

&lt;h3 class="relative group"&gt;gdb
 &lt;div id="gdb" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#gdb" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;With &lt;code&gt;heap_hot_search_buffer&lt;/code&gt; as the lead, and after several gdb sessions, we settled on the following breakpoints:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;b relation_open
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;b get_relation_info
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;b RelationCacheInvalidateEntry 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;b get_relname_relid
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;b AcceptInvalidationMessages
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;b RelationClearRelation
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;b pg_hint_plan_planner
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;b heap_hot_search_buffer&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The first breakpoint hits were mostly noise from normal code paths. Past a certain point in execution, however, only &lt;code&gt;heap_hot_search_buffer&lt;/code&gt; kept hitting:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Breakpoint 15, heap_hot_search_buffer &lt;span style="color:#f92672"&gt;(&lt;/span&gt;tid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;tid@entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;0x2313c60, relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;0x2b2141663910, buffer&lt;span style="color:#f92672"&gt;=&lt;/span&gt;17045, snapshot&lt;span style="color:#f92672"&gt;=&lt;/span&gt;snapshot@entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;0x228a058, heapTuple&lt;span style="color:#f92672"&gt;=&lt;/span&gt;heapTuple@entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;0x23273d0, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; all_dead&lt;span style="color:#f92672"&gt;=&lt;/span&gt;all_dead@entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;0x7ffce272e28f, first_call&lt;span style="color:#f92672"&gt;=&lt;/span&gt;true&lt;span style="color:#f92672"&gt;)&lt;/span&gt; at heapam.c:1503
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1503&lt;/span&gt; in heapam.c
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;(&lt;/span&gt;gdb&lt;span style="color:#f92672"&gt;)&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Continuing.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Breakpoint 15, heap_hot_search_buffer &lt;span style="color:#f92672"&gt;(&lt;/span&gt;tid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;tid@entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;0x2313c60, relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;0x2b2141663910, buffer&lt;span style="color:#f92672"&gt;=&lt;/span&gt;96708, snapshot&lt;span style="color:#f92672"&gt;=&lt;/span&gt;snapshot@entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;0x228a058, heapTuple&lt;span style="color:#f92672"&gt;=&lt;/span&gt;heapTuple@entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;0x23273d0, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; all_dead&lt;span style="color:#f92672"&gt;=&lt;/span&gt;all_dead@entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;0x7ffce272e28f, first_call&lt;span style="color:#f92672"&gt;=&lt;/span&gt;true&lt;span style="color:#f92672"&gt;)&lt;/span&gt; at heapam.c:1503
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1503&lt;/span&gt; in heapam.c&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Across hits, most arguments to &lt;code&gt;heap_hot_search_buffer&lt;/code&gt; stayed the same, including the addresses of &lt;code&gt;relation&lt;/code&gt; and &lt;code&gt;heapTuple&lt;/code&gt;. Only the &lt;code&gt;buffer&lt;/code&gt; parameter changed, which indicates the backend was working through many buffers of the same relation.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;heapTuple&lt;/code&gt; contained table OID information. Let&amp;rsquo;s print it:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;(&lt;/span&gt;gdb&lt;span style="color:#f92672"&gt;)&lt;/span&gt; p *heapTuple
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$46 &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_len &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 968, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_self &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ip_blkid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bi_hi &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bi_lo &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7211&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;}&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ip_posid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;}&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_tableOid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 2619, -- This is useful
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_data &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x2b2155fced00&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The tuple&amp;rsquo;s &lt;code&gt;t_tableOid&lt;/code&gt; is 2619, so &lt;code&gt;heap_hot_search_buffer&lt;/code&gt; was working on the relation with OID 2619. Looking that up in &lt;code&gt;pg_class&lt;/code&gt; gives &lt;code&gt;pg_statistic&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; oid,relname &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; oid &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;2619&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; oid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+----------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2619&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_statistic&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Accessing the statistics base table is expected: PG needs statistics to estimate costs when generating candidate execution plans.&lt;/p&gt;

&lt;h3 class="relative group"&gt;pg_statistic Bloat
 &lt;div id="pg_statistic-bloat" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg_statistic-bloat" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Now that we&amp;rsquo;ve pinpointed &lt;code&gt;pg_statistic&lt;/code&gt;, let&amp;rsquo;s check its condition:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;dt&lt;span style="color:#f92672"&gt;+&lt;/span&gt; pg_statistic
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; List &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; relations
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Schema&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Owner&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Persistence &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Size&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+--------------+-------+----------+-------------+---------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_catalog &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_statistic &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; postgres &lt;span style="color:#f92672"&gt;|&lt;/span&gt; permanent &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1036&lt;/span&gt; MB &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;pg_statistic&amp;#39;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;gx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;-------+------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;oid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2619&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_statistic
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relnamespace &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;reltype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12016&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;reloftype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relowner &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relam &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relfilenode &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2619&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;reltablespace &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relpages &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;132481&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;reltuples &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4655&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
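&lt;p&gt;The &lt;code&gt;relpages&lt;/code&gt; and &lt;code&gt;reltuples&lt;/code&gt; values above can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope estimate; it assumes the default 8 kB block size, which the output above does not state, and the variable names are purely illustrative.&lt;/p&gt;

```python
# Figures copied from the pg_class row for pg_statistic shown above.
relpages = 132481
reltuples = 4655

# Assumption: default PostgreSQL block size of 8 kB (8192 bytes).
BLOCK_SIZE = 8192

# Heap size implied by the page count, in MB.
size_mb = relpages * BLOCK_SIZE / (1024 * 1024)

# How many heap pages exist per live row; a healthy table packs
# several rows into one page, so a value far above 1 suggests bloat.
pages_per_row = relpages / reltuples

print(f"heap size: {size_mb:.0f} MB")
print(f"pages per row: {pages_per_row:.1f}")
```

&lt;p&gt;Under that assumption the heap comes to roughly 1035 MB, in line with the 1036 MB that &lt;code&gt;\dt+&lt;/code&gt; reports, and there are about 28 pages for every row.&lt;/p&gt;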
&lt;p&gt;At roughly 1 GB, &lt;code&gt;pg_statistic&lt;/code&gt; is certainly oversized: 132,481 blocks holding only 4,655 rows is clear table bloat. But even with bloat, does looking up statistics really require reading through the entire &lt;code&gt;pg_statistic&lt;/code&gt; table? Logically, no: you only need the statistics for the specific table. And indeed, PG accesses &lt;code&gt;pg_statistic&lt;/code&gt; through its primary key index &lt;code&gt;pg_statistic_relid_att_inh_index&lt;/code&gt;. The call stack below shows the composite primary key fields being passed:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x000000000086edbc &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; SearchCatCacheMiss (&lt;span style="color:#66d9ef"&gt;cache&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;cache&lt;/span&gt;&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x226ba80, nkeys&lt;span style="color:#f92672"&gt;=&lt;/span&gt;nkeys&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;, hashValue&lt;span style="color:#f92672"&gt;=&lt;/span&gt;hashValue&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;853716409&lt;/span&gt;, hashIndex&lt;span style="color:#f92672"&gt;=&lt;/span&gt;hashIndex&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;, v1&lt;span style="color:#f92672"&gt;=&lt;/span&gt;v1&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt;, v2&lt;span style="color:#f92672"&gt;=&lt;/span&gt;v2&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; v3&lt;span style="color:#f92672"&gt;=&lt;/span&gt;v3&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, v4&lt;span style="color:#f92672"&gt;=&lt;/span&gt;v4&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; catcache.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1368&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x000000000086fa82 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; SearchCatCacheInternal (v4&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, v3&lt;span style="color:#f92672"&gt;=&amp;lt;&lt;/span&gt;optimized &lt;span style="color:#66d9ef"&gt;out&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;, v2&lt;span style="color:#f92672"&gt;=&amp;lt;&lt;/span&gt;optimized &lt;span style="color:#66d9ef"&gt;out&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;, v1&lt;span style="color:#f92672"&gt;=&amp;lt;&lt;/span&gt;optimized &lt;span style="color:#66d9ef"&gt;out&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;, nkeys&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;cache&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x226ba80) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; catcache.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1299&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; SearchCatCache3 (&lt;span style="color:#66d9ef"&gt;cache&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x226ba80, v1&lt;span style="color:#f92672"&gt;=&lt;/span&gt;v1&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt;, v2&lt;span style="color:#f92672"&gt;=&lt;/span&gt;v2&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, v3&lt;span style="color:#f92672"&gt;=&lt;/span&gt;v3&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; catcache.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1183&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0000000000880d70 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; SearchSysCache3 (cacheId&lt;span style="color:#f92672"&gt;=&lt;/span&gt;cacheId&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;58&lt;/span&gt;, key1&lt;span style="color:#f92672"&gt;=&lt;/span&gt;key1&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt;, key2&lt;span style="color:#f92672"&gt;=&lt;/span&gt;key2&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, key3&lt;span style="color:#f92672"&gt;=&lt;/span&gt;key3&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; syscache.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1145&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0000000000874092 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; get_attavgwidth (relid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;relid&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt;, attnum&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; lsyscache.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2991&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x00000000006a2d46 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; set_rel_width (root&lt;span style="color:#f92672"&gt;=&lt;/span&gt;root&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x2326600, rel&lt;span style="color:#f92672"&gt;=&lt;/span&gt;rel&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x21e8418) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; costsize.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;5516&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
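&lt;p&gt;&lt;code&gt;get_attavgwidth&lt;/code&gt; is called with &lt;code&gt;relid=18767, attnum=1&lt;/code&gt;. Here are the &lt;code&gt;pg_statistic&lt;/code&gt; rows for that relation:&lt;/p&gt;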
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; ctid,starelid,staattnum &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_statistic &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; starelid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; starelid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; staattnum 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+----------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;132657&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;132657&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;132657&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;132657&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;132658&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;132658&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;132658&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;132658&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;132658&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;132658&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;-- lzlinfo has 10 columns total, each with a staattnum entry&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The ctid values show that these 10 rows live in just 2 heap blocks (132657 and 132658).&lt;/p&gt;
&lt;p&gt;Now let&amp;rsquo;s access &lt;code&gt;pg_statistic&lt;/code&gt; through its composite unique index. Even though the rows sit in only 2 blocks, the scan took about 1 second and needed 1,141,568 (~1.1 million) shared hits:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers,timing,&lt;span style="color:#66d9ef"&gt;verbose&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; ctid,starelid &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_statistic &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; starelid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; pg_statistic_relid_att_inh_index &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; pg_catalog.pg_statistic (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;41&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;103&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;31&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;105&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;416&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1035&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;723&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: ctid, starelid
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (pg_statistic.starelid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;18767&amp;#39;&lt;/span&gt;::oid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1141568&lt;/span&gt; &lt;span style="color:#75715e"&gt;-- Abnormal
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;102&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;1035&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;802&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Fetching 10 rows from &lt;code&gt;pg_statistic&lt;/code&gt; through the index cost ~1M shared hits, roughly matching the ~1M planning shared hits of the original SQL. (Note: Planning Time is minimal here, which suggests that the cost lies not in plan generation itself but in the catalog data access performed during planning.)&lt;/p&gt;
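&lt;p&gt;One way to confirm that bloat, rather than row count, explains those hits is to compare the on-disk size of the catalog and its index with the number of live rows. The query below is a minimal sketch that uses only built-in functions and views; it was not part of the original investigation:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Heap and index size of pg_statistic next to its live/dead tuple estimates
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select pg_size_pretty(pg_relation_size(&amp;#39;pg_catalog.pg_statistic&amp;#39;)) as heap_size,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       pg_size_pretty(pg_relation_size(&amp;#39;pg_catalog.pg_statistic_relid_att_inh_index&amp;#39;)) as index_size,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       n_live_tup,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       n_dead_tup
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;from pg_stat_sys_tables
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;where schemaname = &amp;#39;pg_catalog&amp;#39; and relname = &amp;#39;pg_statistic&amp;#39;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For reference, the autovacuum log in this case reports 132685 pages for a table with 4,655 live rows, which is roughly 1 GB at the default 8 kB block size.&lt;/p&gt;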

&lt;h3 class="relative group"&gt;Index Dead Tuples
 &lt;div id="index-dead-tuples" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#index-dead-tuples" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
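&lt;p&gt;If vacuum runs but cannot actually remove dead heap tuples, the index entries pointing at them stay in place too, so an index scan has to visit every one of those dead heap tuples and check its visibility before discarding it.&lt;/p&gt;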
&lt;p&gt;Refer to: &lt;a href="https://blog.csdn.net/qq_40687433/article/details/137368881?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522172420012616800225589534%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fblog.%2522%257D&amp;amp;request_id=172420012616800225589534&amp;amp;biz_id=0&amp;amp;utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~rank_v31_ecpm-2-137368881-null-null.nonecase&amp;amp;utm_term=%E8%86%A8%E8%83%80&amp;amp;spm=1018.2226.3001.4450" target="_blank" rel="noreferrer"&gt;From Very Slow Unique Index Scans to Index Bloat&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/16f28ad1a331.png" alt="image.png" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;autovacuum Not Reclaiming Dead Tuples
 &lt;div id="autovacuum-not-reclaiming-dead-tuples" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#autovacuum-not-reclaiming-dead-tuples" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;With such severe table bloat, shouldn&amp;rsquo;t autovacuum have reclaimed it?&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select * from pg_stat_all_tables where relname=&amp;#39;pg_statistic&amp;#39;\gx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-[ RECORD 1 ]-------+------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relid | 2619
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;schemaname | pg_catalog
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relname | pg_statistic
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;seq_scan | 1 	 -- Very few sequential scans on pg_statistic
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;seq_tup_read | 4655
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;idx_scan | 28715508 -- Many index scans on pg_statistic
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;idx_tup_fetch | 25150245
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_tup_ins | 46
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_tup_upd | 1292143 -- Lots of updates
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_tup_del | 14
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_tup_hot_upd | 138448
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_live_tup | 4655
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_dead_tup | 1496776
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_mod_since_analyze | 1292203
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_ins_since_vacuum | 0
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;last_vacuum | [null]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;last_autovacuum | 2024-08-16 20:34:15.045022+08 -- Note: autovacuum timestamp is recent
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;last_analyze | [null]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;last_autoanalyze | [null]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;vacuum_count | 0
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;autovacuum_count | 144170
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;analyze_count | 0
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;autoanalyze_count | 0&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In fact, autovacuum was running on &lt;code&gt;pg_statistic&lt;/code&gt; constantly (&lt;code&gt;autovacuum_count&lt;/code&gt; is 144170). The worker was just hard to catch, because each run finished within seconds (it could not reclaim anything) and autovacuum then went back to sleep for &lt;code&gt;autovacuum_naptime&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;show autovacuum_naptime ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; autovacuum_naptime 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 1min&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Autovacuum sleeps for 1 minute between rounds, and the logs show an autovacuum entry for &lt;code&gt;pg_statistic&lt;/code&gt; every minute as well:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-08-16 21:05:15.267 CST,,,41080,,66bf4e87.a078,1,,2024-08-16 21:05:11 CST,27/166839,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;automatic vacuum of table &amp;#34;&amp;#34;lzldb.pg_catalog.pg_statistic&amp;#34;&amp;#34;: index scans: 0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;pages: 0 removed, 132685 remain, 1 skipped due to pins, 0 skipped frozen
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;tuples: 0 removed, 1501745 remain, 1497090 are dead but not yet removable, oldest xmin: 119329380
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;buffer usage: 265443 hits, 0 misses, 0 dirtied
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;system usage: CPU: user: 0.53 s, system: 0.17 s, elapsed: 3.38 s
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;WAL usage: 1 records, 0 full page images, 233 bytes&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;autovacuum worker&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-08-16 21:05:17.474 CST,,,41080,,66bf4e87.a078,2,,2024-08-16 21:05:11 CST,27/166844,136438968,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;automatic analyze of table &amp;#34;&amp;#34;lzldb.public.lzlinfo&amp;#34;&amp;#34; system usage: CPU: user: 2.02 s, system: 0.00 s, elapsed: 2.08 s&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;autovacuum worker&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;1497090 are dead but not yet removable&lt;/code&gt;: autovacuum was triggered, but it reclaimed nothing. Vacuum can only remove tuples deleted by transactions older than &lt;code&gt;oldest xmin&lt;/code&gt;, so all 1,497,090 dead tuples stayed in place.&lt;/p&gt;
&lt;p&gt;Investigating who held &lt;code&gt;oldest xmin: 119329380&lt;/code&gt;, we quickly identified a replication slot:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_replication_slots;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; slot_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plugin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; slot_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;temporary&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active_pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; catalog_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; restart_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; confirmed_flush_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wal_status &lt;span style="color:#f92672"&gt;|&lt;/span&gt; safe_wal_size 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------+----------+-----------+--------+----------+-----------+--------+------------+--------+--------------+--------------+---------------------+------------+---------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; slotslotlostname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pgoutput &lt;span style="color:#f92672"&gt;|&lt;/span&gt; logical &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;17076&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;119329380&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;F9&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;105&lt;/span&gt;A4970 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;F9&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;105&lt;/span&gt;F8778 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The slot&amp;rsquo;s &lt;code&gt;catalog_xmin=119329380&lt;/code&gt; matched the vacuum&amp;rsquo;s &lt;code&gt;oldest xmin: 119329380&lt;/code&gt;.&lt;/p&gt;
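&lt;p&gt;&lt;code&gt;active=f&lt;/code&gt; indicated that no consumer was connected to the slot: the replication link was already broken, yet the slot kept pinning &lt;code&gt;catalog_xmin&lt;/code&gt; and, with it, the dead tuples in system catalogs such as &lt;code&gt;pg_statistic&lt;/code&gt;.&lt;/p&gt;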
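&lt;p&gt;Replication slots are not the only thing that can hold back &lt;code&gt;oldest xmin&lt;/code&gt;; long-running transactions and forgotten prepared transactions can have the same effect. The following sketch lists all three kinds of holders, oldest first. It relies only on standard catalog views, and the column aliases (&lt;code&gt;holder&lt;/code&gt;, &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;xid_age&lt;/code&gt;) are illustrative:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Candidates that can hold back vacuum&amp;#39;s oldest xmin, oldest first
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select &amp;#39;replication slot&amp;#39; as holder,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       slot_name::text as name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       greatest(age(xmin), age(catalog_xmin)) as xid_age  -- greatest() ignores NULLs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;from pg_replication_slots
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;union all
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select &amp;#39;session&amp;#39;, pid::text, age(backend_xmin)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;from pg_stat_activity
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;where backend_xmin is not null
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;union all
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select &amp;#39;prepared transaction&amp;#39;, gid, age(transaction)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;from pg_prepared_xacts
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;order by xid_age desc nulls last;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In this case the top row would be the inactive logical slot. Note that a logical slot&amp;rsquo;s &lt;code&gt;catalog_xmin&lt;/code&gt; only protects system catalog tuples, which fits the fact that the bloat showed up in &lt;code&gt;pg_statistic&lt;/code&gt;.&lt;/p&gt;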

&lt;h3 class="relative group"&gt;Fixing the Problem
 &lt;div id="fixing-the-problem" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#fixing-the-problem" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Drop the replication slot:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_drop_replication_slot(&lt;span style="color:#e6db74"&gt;&amp;#39;slotslotlostname&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_drop_replication_slot 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
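&lt;p&gt;Then run a manual &lt;code&gt;VACUUM&lt;/code&gt; on &lt;code&gt;pg_statistic&lt;/code&gt;, or wait up to 1 minute for the next autovacuum round.&lt;/p&gt;
&lt;p&gt;Finally, open a brand-new session to verify the fix (an existing session may already have the statistics in its cache and would not repeat the slow catalog access):&lt;/p&gt;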
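&lt;p&gt;The manual route could look like the sketch below. Plain &lt;code&gt;VACUUM&lt;/code&gt; makes dead space reusable but only returns pages to the operating system when they are empty and at the end of the file, so the relation may not shrink much on disk; that caveat is general PostgreSQL behavior, not something measured in this case:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Run as a superuser (or the database owner) in the affected database
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;vacuum (verbose) pg_catalog.pg_statistic;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Confirm that the dead tuples are gone
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;from pg_stat_sys_tables
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;where schemaname = &amp;#39;pg_catalog&amp;#39; and relname = &amp;#39;pg_statistic&amp;#39;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;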
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;##&lt;/span&gt; psql
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;psql (&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;help&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; help.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; lzldb
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;You &lt;span style="color:#66d9ef"&gt;are&lt;/span&gt; now connected &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;postgres&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers,timing) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlinfo &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;71&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;023&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;025&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlinfo (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;3802&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;73&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;473&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;71&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;018&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;018&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2578&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;605&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;098&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Planning time dropped from ~1 second to ~10 ms, and planning shared hits dropped from ~1M to 2,578. The problem was essentially resolved.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Case Summary
 &lt;div id="case-summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#case-summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The replication link broke and the replication slot wasn&amp;rsquo;t cleaned up in time, leading to bloat in the &lt;code&gt;pg_statistic&lt;/code&gt; statistics base table. This caused each backend to be very slow when loading statistics for the first time and to read excessive pages into its local cache. Each backend&amp;rsquo;s cache exceeded normal levels (~2GB), and with multiple backends this led to OOM.&lt;/p&gt;
&lt;p&gt;The problem itself is simple — it was just the investigation that was convoluted. In short: bloat in the base table &lt;code&gt;pg_statistic&lt;/code&gt; caused excessive data access during the plan generation phase. Metadata base table bloat can cause other tricky problems too — until next time.&lt;/p&gt;</content:encoded></item><item><title>A Classic Case of Long Transaction, Table Bloat, and LIMIT Issues</title><link>https://lastdba.com/en/2024/08/12/a-classic-case-of-long-transaction-table-bloat-and-limit-issues/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/a-classic-case-of-long-transaction-table-bloat-and-limit-issues/</guid><description>&lt;h1 class="relative group"&gt;Slow Primary Key Update — Problem Analysis
 &lt;div id="slow-primary-key-update--problem-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#slow-primary-key-update--problem-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h1&gt;
&lt;p&gt;A simple primary key update took over 1 second to execute. Due to high concurrency, the CPU was completely maxed out:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;36&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;084&lt;/span&gt; CST,&lt;span style="color:#e6db74"&gt;&amp;#34;lzlopr&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzl&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;158751&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;10.33.78.149:51502&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;66055&lt;/span&gt;a6b.&lt;span style="color:#ae81ff"&gt;26&lt;/span&gt;c1f,&lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;UPDATE&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;03&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;54&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; CST,&lt;span style="color:#ae81ff"&gt;528&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;19816630&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;970251337&lt;/span&gt;,LOG,&lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;duration: 1218.688 ms 
plan:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;Query Text: update table_a set (omitted...)=$6 where column_id =$7
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;Update on table_a (cost=0.40..5.49 rows=1 width=2774)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;-&amp;gt; Index Scan using pk_id on table_a (cost=0.40..5.49 rows=1 width=2774)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; Index Cond: ((column_id)::text = $7)&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;PostgreSQL JDBC Driver&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;client backend&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The SQL itself is very simple — an update with a condition on the primary key. Looking at the execution plan, it used the &lt;code&gt;pk_id&lt;/code&gt; primary key index, so there was no problem with the plan itself; the issue wasn&amp;rsquo;t a plan change.&lt;/p&gt;</description><content:encoded>
&lt;h1 class="relative group"&gt;Slow Primary Key Update — Problem Analysis
 &lt;div id="slow-primary-key-update--problem-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#slow-primary-key-update--problem-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h1&gt;
&lt;p&gt;A simple primary key update took over 1 second to execute. Due to high concurrency, the CPU was completely maxed out:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;36&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;084&lt;/span&gt; CST,&lt;span style="color:#e6db74"&gt;&amp;#34;lzlopr&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzl&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;158751&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;10.33.78.149:51502&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;66055&lt;/span&gt;a6b.&lt;span style="color:#ae81ff"&gt;26&lt;/span&gt;c1f,&lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;UPDATE&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;03&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;54&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; CST,&lt;span style="color:#ae81ff"&gt;528&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;19816630&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;970251337&lt;/span&gt;,LOG,&lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;duration: 1218.688 ms 
plan:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;Query Text: update table_a set (omitted...）=$6 where column_id =$7
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;Update on table_a (cost=0.40..5.49 rows=1 width=2774)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;-&amp;gt; Index Scan using pk_id on table_a (cost=0.40..5.49 rows=1 width=2774)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; Index Cond: ((column_id)::text = $7)&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;PostgreSQL JDBC Driver&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;client backend&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The SQL itself is very simple — an update with a condition on the primary key. Looking at the execution plan, it used the &lt;code&gt;pk_id&lt;/code&gt; primary key index, so there was no problem with the plan itself; the issue wasn&amp;rsquo;t a plan change.&lt;/p&gt;
&lt;p&gt;Because the statement is an UPDATE, let&amp;rsquo;s rewrite it as a SELECT with the same predicate and run it under &lt;code&gt;explain (analyze,buffers)&lt;/code&gt; to see the actual execution cost without modifying any data:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; table_a &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; column_id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;d4f713370e584820a9b15e2218ea436a&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; table_a (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;91&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1156&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;55&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;052&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;123&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;354&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: ((column_id)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;d4f713370e584820a9b15e2218ea436a&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Heap Blocks: exact&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;13870&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; pk_id (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;91&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;464&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;465&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;13866&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((column_id)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;d4f713370e584820a9b15e2218ea436a&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;24&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Planning:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4261&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Planning Time: &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;028&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Execution Time: &lt;span style="color:#ae81ff"&gt;123&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;567&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The plan still goes through the primary key index, but &lt;code&gt;shared hit=13870&lt;/code&gt; is far too high, and the index scan reports &lt;code&gt;rows=13866&lt;/code&gt; for a lookup that returns a single row. A primary key lookup should normally touch only a handful of pages. This strongly suggests table bloat.&lt;/p&gt;
&lt;p&gt;Checking table bloat:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Table size \dt
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Size&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;525&lt;/span&gt; MB&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Actual row count
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;827&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Dead tuples from pg_stat_all_tables
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_live_tup &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;786&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_dead_tup &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;657604&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
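&lt;p&gt;The three figures above can be gathered with queries along the following lines. This is a sketch rather than the exact commands used during the investigation: it assumes the table lives in the &lt;code&gt;public&lt;/code&gt; schema, and &lt;code&gt;pg_table_size&lt;/code&gt; is used here as a stand-in for the size column of &lt;code&gt;\dt+&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Table size (heap plus TOAST, excluding indexes)
select pg_size_pretty(pg_table_size(&amp;#39;public.table_a&amp;#39;)) as size;

-- Actual row count visible to the current snapshot
select count(*) from public.table_a;

-- Live and dead tuple estimates from the statistics views
select n_live_tup, n_dead_tup, last_autovacuum
from pg_stat_all_tables
where schemaname = &amp;#39;public&amp;#39; and relname = &amp;#39;table_a&amp;#39;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;n_live_tup&lt;/code&gt; and &lt;code&gt;n_dead_tup&lt;/code&gt; are estimates, which is why the live count (786) differs slightly from the exact &lt;code&gt;count(*)&lt;/code&gt; (827).&lt;/p&gt;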
&lt;p&gt;Only ~800 live tuples but 650K dead tuples! This explains why the primary key scan visited so many pages. But why weren&amp;rsquo;t the dead tuples reclaimed?&lt;/p&gt;
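&lt;p&gt;Autovacuum vacuums a table once its dead tuples exceed &lt;code&gt;autovacuum_vacuum_threshold&lt;/code&gt; plus &lt;code&gt;autovacuum_vacuum_scale_factor&lt;/code&gt; times the table&amp;rsquo;s row count, which by default means 50 rows plus 20% of the table. This table is far past that point, and the logs show that autovacuum was indeed being triggered:&lt;/p&gt;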
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-01 14:13:46.649 CST,,,14081,,660a5099.3701,1,,2024-04-01 14:13:45 CST,259/17828993,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;automatic vacuum of table &amp;#34;&amp;#34;lzl.public.table_a&amp;#34;&amp;#34;: index scans: 0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;2024-04-01 14:13:47.801 CST,,,14081,,660a5099.3701,2,,2024-04-01 14:13:45 CST,259/17828994,971045014,LOG,00000,&amp;#34;&lt;/span&gt;automatic analyze of table &lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;lzl.public.table_a&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt; system usage: CPU: user: 0.08 s, system: 0.01 s, elapsed: 1.15 s&lt;span style="color:#e6db74"&gt;&amp;#34;,,,,,,,,,&amp;#34;&amp;#34;,&amp;#34;&lt;/span&gt;autovacuum worker&lt;span style="color:#e6db74"&gt;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;2024-04-01 14:14:46.673 CST,,,26136,,660a50d5.6618,1,,2024-04-01 14:14:45 CST,259/17829090,0,LOG,00000,&amp;#34;&lt;/span&gt;automatic vacuum of table &lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;lzl.public.table_a&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;: index scans: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-01 14:14:47.833 CST,,,26136,,660a50d5.6618,2,,2024-04-01 14:14:45 CST,259/17829091,971049759,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;automatic analyze of table &amp;#34;&amp;#34;lzl.public.table_a&amp;#34;&amp;#34; system usage: CPU: user: 0.08 s, system: 0.03 s, elapsed: 1.15 s&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;autovacuum worker&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-01 14:15:46.680 CST,,,40743,,660a5111.9f27,1,,2024-04-01 14:15:45 CST,259/17829164,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;automatic vacuum of table &amp;#34;&amp;#34;lzl.public.table_a&amp;#34;&amp;#34;: index scans: 0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;2024-04-01 14:15:47.849 CST,,,40743,,660a5111.9f27,2,,2024-04-01 14:15:45 CST,259/17829165,971055464,LOG,00000,&amp;#34;&lt;/span&gt;automatic analyze of table &lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;lzl.public.table_a&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt; system usage: CPU: user: 0.08 s, system: 0.03 s, elapsed: 1.16 s&lt;span style="color:#e6db74"&gt;&amp;#34;,,,,,,,,,&amp;#34;&amp;#34;,&amp;#34;&lt;/span&gt;autovacuum worker&lt;span style="color:#e6db74"&gt;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;2024-04-01 14:16:46.677 CST,,,52599,,660a514d.cd77,1,,2024-04-01 14:16:45 CST,259/17829263,0,LOG,00000,&amp;#34;&lt;/span&gt;automatic vacuum of table &lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;lzl.public.table_a&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;: index scans: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-01 14:16:47.844 CST,,,52599,,660a514d.cd77,2,,2024-04-01 14:16:45 CST,259/17829264,971061382,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;automatic analyze of table &amp;#34;&amp;#34;lzl.public.table_a&amp;#34;&amp;#34; system usage: CPU: user: 0.08 s, system: 0.03 s, elapsed: 1.16 s&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;autovacuum worker&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-01 14:17:46.699 CST,,,64858,,660a5189.fd5a,1,,2024-04-01 14:17:45 CST,234/16589539,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;automatic vacuum of table &amp;#34;&amp;#34;lzl.public.table_a&amp;#34;&amp;#34;: index scans: 0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;2024-04-01 14:17:47.851 CST,,,64858,,660a5189.fd5a,2,,2024-04-01 14:17:45 CST,234/16589540,971066091,LOG,00000,&amp;#34;&lt;/span&gt;automatic analyze of table &lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;lzl.public.table_a&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt; system usage: CPU: user: 0.09 s, system: 0.02 s, elapsed: 1.15 s&lt;span style="color:#e6db74"&gt;&amp;#34;,,,,,,,,,&amp;#34;&amp;#34;,&amp;#34;&lt;/span&gt;autovacuum worker&lt;span style="color:#e6db74"&gt;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;2024-04-01 14:18:46.703 CST,,,78112,,660a51c5.13120,1,,2024-04-01 14:18:45 CST,259/17829409,0,LOG,00000,&amp;#34;&lt;/span&gt;automatic vacuum of table &lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;lzl.public.table_a&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;: index scans: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-01 14:18:47.854 CST,,,78112,,660a51c5.13120,2,,2024-04-01 14:18:45 CST,259/17829410,971070390,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;automatic analyze of table &amp;#34;&amp;#34;lzl.public.table_a&amp;#34;&amp;#34; system usage: CPU: user: 0.09 s, system: 0.02 s, elapsed: 1.15 s&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;autovacuum worker&amp;#34;&lt;/span&gt;		&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Not only was it triggered, it ran against this table once a minute. That matches &lt;code&gt;autovacuum_naptime&lt;/code&gt;, which is at its default of 1 minute:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;show&lt;/span&gt; autovacuum_naptime;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;autovacuum_naptime 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;min&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;We can conclude:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;autovacuum was successfully triggered&lt;/li&gt;
&lt;li&gt;Either dead tuples were being produced faster than they could be reclaimed (more than 20% of the table turning over within each 1-minute cycle, in which case 1 minute may be too long), or they weren&amp;rsquo;t being reclaimed at all, so every cycle found the table over the threshold again&lt;/li&gt;
&lt;/ul&gt;
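&lt;p&gt;The 20% rule can be made concrete. Autovacuum&amp;rsquo;s trigger condition is &lt;code&gt;n_dead_tup &amp;gt; autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples&lt;/code&gt;. With the defaults of 50 and 0.2 and roughly 800 rows, that works out to about 50 + 0.2 * 800 = 210 dead tuples, so a table carrying hundreds of thousands of them qualifies on every cycle. The query below is a minimal sketch of that comparison; it assumes &lt;code&gt;table_a&lt;/code&gt; is in the &lt;code&gt;public&lt;/code&gt; schema and has no per-table storage parameters overriding the global settings.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Compare the current dead tuple estimate with the autovacuum trigger point.
-- Uses the global settings only; per-table reloptions are ignored (assumption).
select s.relname,
       s.n_dead_tup,
       current_setting(&amp;#39;autovacuum_vacuum_threshold&amp;#39;)::int
         + current_setting(&amp;#39;autovacuum_vacuum_scale_factor&amp;#39;)::float8 * c.reltuples
         as dead_tuples_needed_to_trigger
from pg_stat_all_tables s
join pg_class c on c.oid = s.relid
where s.schemaname = &amp;#39;public&amp;#39; and s.relname = &amp;#39;table_a&amp;#39;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;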
&lt;p&gt;Let&amp;rsquo;s look at the detailed autovacuum output:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-01 10:22:44.648 CST,,,16827,,660a1a73.41bb,1,,2024-04-01 10:22:43 CST,170/16910186,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;automatic vacuum of table &amp;#34;&amp;#34;lzl.public.table_a&amp;#34;&amp;#34;: index scans: 0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;pages: 0 removed, 48745 remain, 6 skipped due to pins, 0 skipped frozen
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;tuples: 0 removed, 744488 remain, 743666 are dead but not yet removable, oldest xmin: 969118077
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;buffer usage: 97603 hits, 0 misses, 5 dirtied
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;avg read rate: 0.000 MB/s, avg write rate: 0.028 MB/s
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;system usage: CPU: user: 0.21 s, system: 0.22 s, elapsed: 1.41 s
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;WAL usage: 4 records, 3 full page images, 5129 bytes&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;autovacuum worker&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;autovacuum ran but reclaimed nothing: &lt;code&gt;tuples: 0 removed, 744488 remain, 743666 are dead but not yet removable, oldest xmin: 969118077&lt;/code&gt;. &lt;code&gt;oldest xmin&lt;/code&gt; is the oldest transaction ID that some snapshot in the database may still need to see; tuples deleted or updated by later transactions cannot be removed yet. A value that lags this far behind usually means there&amp;rsquo;s a long-running transaction. This is easy to find:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pid,usename,xact_start,state_change,wait_event,&lt;span style="color:#66d9ef"&gt;state&lt;/span&gt;,query &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_activity &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;state&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;lt;&amp;gt;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;idle&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; xact_start ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xact_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; state_change &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; query
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+------------+-------------------------------+-------------------------------+---------------------+---------------------+------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;164658&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; phbdspsqp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;36&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;275408&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;36&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;299609&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; DataFileRead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;minval&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;maxval&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;min&lt;/span&gt;(ID) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; minval,&lt;span 
style="color:#66d9ef"&gt;max&lt;/span&gt;(TRACK&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The long transaction was a SQL statement that had been running since 08:36 that morning, several hours by the time of the investigation. Even though it never touched table A, its snapshot pinned &lt;code&gt;oldest xmin&lt;/code&gt;, so it still blocked dead tuple cleanup there.&lt;/p&gt;
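&lt;p&gt;To see which session is holding the horizon back, it can help to sort by the age of each backend&amp;rsquo;s snapshot rather than by &lt;code&gt;xact_start&lt;/code&gt;. The sketch below uses the &lt;code&gt;backend_xmin&lt;/code&gt; column of &lt;code&gt;pg_stat_activity&lt;/code&gt;. Long transactions were the culprit in this case, but replication slots and prepared transactions can pin &lt;code&gt;oldest xmin&lt;/code&gt; too; the last two queries are included as additional places worth checking, not as findings from this incident.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Sessions ordered by how old their snapshot is
select pid, usename, state, xact_start,
       backend_xmin, age(backend_xmin) as xmin_age
from pg_stat_activity
where backend_xmin is not null
order by age(backend_xmin) desc;

-- Replication slots that may hold back the xmin horizon
select slot_name, active, xmin, catalog_xmin
from pg_replication_slots;

-- Prepared (two-phase) transactions that were never committed or rolled back
select gid, prepared, owner, database
from pg_prepared_xacts
order by prepared;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;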
&lt;p&gt;At this point the root cause is identified:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Table A is updated frequently, so it accumulates dead tuples quickly and is prone to bloat&lt;/li&gt;
&lt;li&gt;A long transaction on table B prevented dead tuple reclamation on table A&lt;/li&gt;
&lt;li&gt;Table A&amp;rsquo;s update statements had to step over those dead tuples and scanned far too many pages&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Solution:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Kill the long transaction: &lt;code&gt;select pg_terminate_backend(164658)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;vacuum table_a&lt;/code&gt; manually, or wait up to a minute for the next autovacuum run&lt;/li&gt;
&lt;/ul&gt;
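&lt;p&gt;Put together, the two steps look roughly like this. It is a sketch under the assumption that pid 164658 is still the offending backend when the commands are run; &lt;code&gt;pg_cancel_backend&lt;/code&gt; is shown as a gentler option that cancels only the running query and keeps the session, and &lt;code&gt;verbose&lt;/code&gt; is added so the vacuum output reports how many tuples were actually removed.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Optional first attempt: cancel the running query but keep the connection
select pg_cancel_backend(164658);

-- Step 1: terminate the backend that holds the old snapshot
select pg_terminate_backend(164658);

-- Step 2: vacuum the bloated table and print per-table details
vacuum (verbose) table_a;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;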
&lt;p&gt;After both steps were completed, checking dead tuples:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_live_tup &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;707&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_dead_tup &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;298&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;650K dead tuples have been cleaned up.&lt;/p&gt;
&lt;p&gt;Checking the execution plan again:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; table_a &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; column_id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;d4f713370e584820a9b15e2218ea436a&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; pk_id &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; table_a (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;621&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;026&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;029&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((column_id)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;d4f713370e584820a9b15e2218ea436a&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;057&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;043&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Shared hits down to just 6 — issue resolved.&lt;/p&gt;
&lt;p&gt;Additionally, vacuum only reclaims dead tuples; it does not shrink the table, which remains the same size. The freed space is reused by new rows in the same table, and plain vacuum returns space to the OS only when it can truncate empty pages at the end of the file. Shrinking the table otherwise requires a repack/table rebuild:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Size&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;525&lt;/span&gt; MB&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
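&lt;p&gt;If the 525 MB does need to go back to the OS, a rebuild could look like the sketch below. &lt;code&gt;vacuum full&lt;/code&gt; is built in but takes an &lt;code&gt;ACCESS EXCLUSIVE&lt;/code&gt; lock for the duration of the rewrite; the &lt;code&gt;pg_repack&lt;/code&gt; extension is a common online alternative, assuming it is installed. Whether a rebuild is worth it for a table that will be heavily updated again is a judgment call.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Rewrites the table into a new file and returns the freed space to the OS.
-- Blocks reads and writes on table_a while it runs.
vacuum (full, verbose) table_a;

-- Compare sizes before and after the rebuild
select pg_size_pretty(pg_table_size(&amp;#39;table_a&amp;#39;)) as table_size,
       pg_size_pretty(pg_indexes_size(&amp;#39;table_a&amp;#39;)) as index_size;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;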

&lt;h1 class="relative group"&gt;Bonus SQL Optimization — ORDER BY LIMIT
 &lt;div id="bonus-sql-optimization--order-by-limit" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#bonus-sql-optimization--order-by-limit" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h1&gt;
&lt;p&gt;That long-running SQL also had a problem of its own.
The business team reported that it ran fast a few days ago but took several hours today:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;min&lt;/span&gt;(ID) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; minval,&lt;span style="color:#66d9ef"&gt;max&lt;/span&gt;(ID) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; maxval &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; table_b &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; time_at &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; to_timestamp(&lt;span style="color:#e6db74"&gt;&amp;#39;2024-03-30 00:00:00&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;yyyy-MM-dd HH24:mi:ss&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Result&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4298&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;54&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;4298&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;64&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; InitPlan &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;returns&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;70&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2149&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; pk_b &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; table_b (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;70&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1181490202&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;549896&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((ID)::text &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (time_at &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; to_timestamp(&lt;span style="color:#e6db74"&gt;&amp;#39;2024-03-30 00:00:00&amp;#39;&lt;/span&gt;::text, &lt;span style="color:#e6db74"&gt;&amp;#39;yyyy-MM-dd HH24:mi:ss&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; InitPlan &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;returns&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;70&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2149&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;Backward&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; pk_b &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; table_b table_b_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;70&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1181490202&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;549896&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((ID)::text &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (time_at &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; to_timestamp(&lt;span style="color:#e6db74"&gt;&amp;#39;2024-03-30 00:00:00&amp;#39;&lt;/span&gt;::text, &lt;span style="color:#e6db74"&gt;&amp;#39;yyyy-MM-dd HH24:mi:ss&amp;#39;&lt;/span&gt;::text))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
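&lt;p&gt;The SQL is simple, with a single condition on a time column that has decent selectivity.
However, the plan did not use the &lt;code&gt;time_at&lt;/code&gt; index; it used the &lt;code&gt;ID&lt;/code&gt; primary key index instead. The planner rewrote &lt;code&gt;min(ID)&lt;/code&gt; and &lt;code&gt;max(ID)&lt;/code&gt; into ordered scans of the primary key that stop at the first row passing the &lt;code&gt;time_at&lt;/code&gt; filter (the two &lt;code&gt;Limit&lt;/code&gt; nodes above), and it costs those scans as if matching rows were spread evenly through the index. This is the same &lt;a href="https://blog.csdn.net/qq_40687433/article/details/134387782?spm=1001.2014.3001.5501" target="_blank" rel="noreferrer"&gt;LIMIT problem&lt;/a&gt;. Running ANALYZE does not help here; it is better to rewrite the SQL.&lt;/p&gt;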
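&lt;p&gt;Wrapping the column in an expression (&lt;code&gt;ID||&amp;#39;&amp;#39;&lt;/code&gt;) stops the planner from using the primary key index to answer min/max, so it falls back to the &lt;code&gt;time_at&lt;/code&gt; index. After the rewrite, the result came back instantly:&lt;/p&gt;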
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;min&lt;/span&gt;(ID&lt;span style="color:#f92672"&gt;||&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; minval,&lt;span style="color:#66d9ef"&gt;max&lt;/span&gt;(ID&lt;span style="color:#f92672"&gt;||&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; maxval &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; table_b &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; time_at &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; to_timestamp(&lt;span style="color:#e6db74"&gt;&amp;#39;2024-03-30 00:00:00&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;yyyy-MM-dd HH24:mi:ss&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1201418&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;86&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1201418&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;87&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;64&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_time_at &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; table_b (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1195919&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;90&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;549896&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;33&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (time_at &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; to_timestamp(&lt;span style="color:#e6db74"&gt;&amp;#39;2024-03-30 00:00:00&amp;#39;&lt;/span&gt;::text, &lt;span style="color:#e6db74"&gt;&amp;#39;yyyy-MM-dd HH24:mi:ss&amp;#39;&lt;/span&gt;::text))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This isn&amp;rsquo;t really an execution plan regression, because the plan didn&amp;rsquo;t change. A few days ago the same plan ran fast. The difference comes from data distribution and how LIMIT works: when a matching row turns up early in the scan, the query returns immediately (which is why the optimizer favors the primary key index); when it&amp;rsquo;s &amp;ldquo;unlucky&amp;rdquo; and the first matching row sits far into the index, the scan takes a very long time.&lt;/p&gt;
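&lt;p&gt;The effect of data distribution can be sketched with a small shell simulation. This is only an illustration under assumed data: the 1000-row table, the 10 matching rows, and all function names are hypothetical and not taken from the incident. Each generator prints one 0/1 flag per row in the order a primary key scan would visit them, and &lt;code&gt;rows_visited&lt;/code&gt; reports where a scan that stops at the first match could stop.&lt;/p&gt;

```shell
# One 0/1 flag per row of a hypothetical 1000-row table, in primary key order.
# 1 means the row passes the filter; each layout has exactly 10 matching rows.
flags_spread_evenly()    { seq 1 1000 | awk '{ print ($1 % 100 == 0) }'; }   # every 100th row matches
flags_clustered_at_end() { seq 1 1000 | awk '{ print ($1 > 990) }'; }        # only the last 10 rows match
flags_backward()         { seq 1000 -1 1 | awk '{ print ($1 > 990) }'; }     # same data, scanned backward

# Report how many rows a LIMIT 1 style scan visits before it can stop,
# i.e. the position of the first matching row (or the row count if none match).
rows_visited() {
  awk '$1 == 1 { if (!first) first = NR } END { print (first ? first : NR) }'
}

flags_spread_evenly    | rows_visited
flags_clustered_at_end | rows_visited
flags_backward         | rows_visited
```

&lt;p&gt;Both layouts contain the same number of matching rows, so an estimate based on row counts alone cannot tell them apart, yet the forward scan stops after 100 rows in one case and only after 991 in the other. The backward scan over the clustered layout stops at its first row. If IDs happen to grow with &lt;code&gt;time_at&lt;/code&gt; (an assumption, not something the incident data confirms), that would correspond to the max() side finishing quickly while the min() side crawls.&lt;/p&gt;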

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;A classic case:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A small table with frequent updates&lt;/li&gt;
&lt;li&gt;A long transaction preventing dead tuple reclamation&lt;/li&gt;
&lt;li&gt;The long transaction itself was caused by an index selection problem due to sorting and LIMIT operations (ORDER BY, MAX/MIN, and GROUP BY can all trigger this)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;One incident, three classic PostgreSQL knowledge points — quite representative.&lt;/p&gt;</content:encoded></item><item><title>Analyzing a 5MB SQL That Consumed 70GB of Memory</title><link>https://lastdba.com/en/2024/08/12/analyzing-a-5mb-sql-that-consumed-70gb-of-memory/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/analyzing-a-5mb-sql-that-consumed-70gb-of-memory/</guid><description>&lt;h3 class="relative group"&gt;Process Memory Analysis
 &lt;div id="process-memory-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#process-memory-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;WAL writer process (PID 66902) was terminated by signal 6: Aborted&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The log shows that the WAL writer process (PID 66902) was terminated by signal 6 (SIGABRT); the entry was logged by the postmaster.&lt;/p&gt;
&lt;p&gt;Checking process memory at the OS level: &lt;code&gt;top&lt;/code&gt; doesn&amp;rsquo;t show PPID and &lt;code&gt;ps&lt;/code&gt; doesn&amp;rsquo;t show the shared portion (SHR) needed to estimate USS, so we need both:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;66478&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 8.7 10.6 &lt;span style="color:#ae81ff"&gt;57488380&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;56389972&lt;/span&gt; - R 17:13:03 00:02:47 postgres: BIND
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;211277&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;66478&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 7.8 9.6 &lt;span style="color:#ae81ff"&gt;52294700&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;51127480&lt;/span&gt; - R 17:13:03 00:02:31 postgres: BIND
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;222749&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;66478&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 22.7 9.3 &lt;span style="color:#ae81ff"&gt;51320000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;49073368&lt;/span&gt; - R 17:35:33 00:02:09 postgres: BIND
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;39513&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;66478&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 2.9 6.8 &lt;span style="color:#ae81ff"&gt;38651084&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;36354736&lt;/span&gt; ep_poll S 16:13:03 00:02:43 postgres: idle&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The PPID column (66478, the postmaster) identifies these high-memory processes as backends. Let&amp;rsquo;s examine process 211276:&lt;/p&gt;</description><content:encoded>
&lt;h3 class="relative group"&gt;Process Memory Analysis
 &lt;div id="process-memory-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#process-memory-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;WAL writer process (PID 66902) was terminated by signal 6: Aborted&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The log shows that the WAL writer process (PID 66902) was terminated by signal 6 (SIGABRT); the entry was logged by the postmaster.&lt;/p&gt;
&lt;p&gt;Checking process memory at the OS level: &lt;code&gt;top&lt;/code&gt; doesn&amp;rsquo;t show PPID and &lt;code&gt;ps&lt;/code&gt; doesn&amp;rsquo;t show the shared portion (SHR) needed to estimate USS, so we need both:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;66478&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 8.7 10.6 &lt;span style="color:#ae81ff"&gt;57488380&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;56389972&lt;/span&gt; - R 17:13:03 00:02:47 postgres: BIND
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;211277&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;66478&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 7.8 9.6 &lt;span style="color:#ae81ff"&gt;52294700&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;51127480&lt;/span&gt; - R 17:13:03 00:02:31 postgres: BIND
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;222749&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;66478&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 22.7 9.3 &lt;span style="color:#ae81ff"&gt;51320000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;49073368&lt;/span&gt; - R 17:35:33 00:02:09 postgres: BIND
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;39513&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;66478&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 2.9 6.8 &lt;span style="color:#ae81ff"&gt;38651084&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;36354736&lt;/span&gt; ep_poll S 16:13:03 00:02:43 postgres: idle&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The PPID column (66478, the postmaster) identifies these high-memory processes as backends. Let&amp;rsquo;s examine process 211276:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@lzl&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ zcat /osw/oswtop/toposw.dat.gz |grep &lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt; postgres &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3271756&lt;/span&gt; 1.1g 1.1g S 7.3 0.2 0:03.93 postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt; postgres &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3291784&lt;/span&gt; 1.3g 1.2g R 96.4 0.2 0:11.87 postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt; postgres &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7369628&lt;/span&gt; 6.0g 2.1g R 100.0 1.2 0:46.58 postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt; postgres &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 17.0g 15.9g 2.1g R 100.0 3.2 1:16.70 postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt; postgres &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 28.8g 27.7g 2.1g R 100.0 5.5 1:46.82 postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt; postgres &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 41.4g 40.4g 2.1g R 100.0 8.0 2:16.99 postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt; postgres &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 54.7g 53.7g 2.1g R 88.8 10.7 2:47.60 postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt; postgres &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 66.5g 64.9g 2.1g R 34.7 12.9 3:22.76 postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt; postgres &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 71.0g 68.2g 2.1g R 99.1 13.6 3:52.94 postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt; postgres &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 74.9g 71.2g 2.1g R 100.0 14.2 4:23.05 postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt; postgres &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; R 100.0 0.0 4:45.65 postgres&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;We can estimate private memory (USS) as &lt;code&gt;RES - SHR&lt;/code&gt;. RES for process 211276 ballooned from about 1GB to about 70GB within minutes while SHR stayed flat at 2.1g, and then the process crashed. Essentially all of the growth was private process memory.&lt;/p&gt;
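&lt;p&gt;The &lt;code&gt;RES - SHR&lt;/code&gt; estimate can be computed directly from the &lt;code&gt;top&lt;/code&gt; samples. The helper names &lt;code&gt;to_kib&lt;/code&gt; and &lt;code&gt;uss_mib&lt;/code&gt; below are hypothetical, and the sketch assumes the field format seen above: plain numbers are KiB, and an &lt;code&gt;m&lt;/code&gt; or &lt;code&gt;g&lt;/code&gt; suffix scales by 1024.&lt;/p&gt;

```shell
# Convert a top(1) memory field (plain KiB, or a value with an m/g suffix) to KiB.
to_kib() {
  awk -v v="$1" 'BEGIN {
    n = v + 0                          # awk keeps the leading number, e.g. "1.3g" becomes 1.3
    if (v ~ /g$/) n *= 1024 * 1024     # GiB to KiB
    else if (v ~ /m$/) n *= 1024       # MiB to KiB
    printf "%d\n", n
  }'
}

# Approximate private memory as RES - SHR and report it in whole MiB.
uss_mib() {
  res=$(to_kib "$1")
  shr=$(to_kib "$2")
  echo $(( (res - shr) / 1024 ))
}

uss_mib 1.3g 1.2g      # second sample in the listing above
uss_mib 71.2g 2.1g     # last sample before the process disappears
```

&lt;p&gt;For the two samples shown, this gives roughly 100MiB of private memory early on and roughly 69GiB at the end, which matches the conclusion that the growth is private rather than shared. Because &lt;code&gt;top&lt;/code&gt; rounds to one decimal place, the early figure in particular is only approximate.&lt;/p&gt;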

&lt;h3 class="relative group"&gt;SQL Analysis
 &lt;div id="sql-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sql-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The PostgreSQL log shows a &lt;strong&gt;5MB SQL&lt;/strong&gt; containing &lt;strong&gt;5,000+ UNION ALLs&lt;/strong&gt; and &lt;strong&gt;30,000+ bind variables&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The execution plan is over 70,000 lines long:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;218196&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;51&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;218216&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1318&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1628&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; InitPlan &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;returns&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; table1 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; table1nfo (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;31&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((col1)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;xxx&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((colcolcol)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;xxx&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; InitPlan &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;returns&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; table1 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; table1nfo table1nfo_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;31&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((col1)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;xxx&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((colcolcol)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;xxx&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; InitPlan &lt;span style="color:#ae81ff"&gt;10544&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;returns&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10543&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;58&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; table2 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; table2col t_1317 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;58&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((ididid)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;xxx&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((idididid)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;::text)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The plan structure is simple: ~10,000 sub-plans fetching data, then an Append to combine results.&lt;/p&gt;
&lt;p&gt;This SQL monstrosity pushed a single backend process to 70GB. The immediate fix is clear: reduce the number of UNION ALL branches and the problem goes away (which is indeed what happened). But digging deeper raises several interesting questions:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Why did a 5MB SQL consume 70GB of memory?&lt;/li&gt;
&lt;li&gt;Is the data itself related to memory usage? Was it caused by returning too many rows?&lt;/li&gt;
&lt;li&gt;Is the memory from parsing cache or plan cache?&lt;/li&gt;
&lt;li&gt;Why didn&amp;rsquo;t &lt;code&gt;work_mem&lt;/code&gt; limit the operation memory, even though it&amp;rsquo;s set to a reasonable value?&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 class="relative group"&gt;Initial Analysis
 &lt;div id="initial-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#initial-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;To run a 5MB SQL, a backend must hold at minimum: cached catalog metadata, the parsed SQL, and plan cache information.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ve seen cases before where metadata cache (relcache) for hundreds of thousands of tables/partitions caused huge backend memory. But this database has few tables, so relcache can be preliminarily ruled out (later confirmed by memory dump).&lt;/p&gt;
&lt;p&gt;Parsed SQL data shouldn&amp;rsquo;t be that large — a 5MB SQL parsed shouldn&amp;rsquo;t produce 70GB.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;work_mem limitations and more:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;work_mem&lt;/code&gt; is a per-operation limit that applies mainly to sort and hash operations; it is not a per-statement or per-backend cap. This creates the &amp;ldquo;multiple sort/hash&amp;rdquo; problem: a single SQL with N such operations can use up to &lt;code&gt;work_mem&lt;/code&gt; × N. PG 13 introduced &lt;code&gt;hash_mem_multiplier&lt;/code&gt;, which lets each hash-based operation use up to &lt;code&gt;work_mem&lt;/code&gt; × &lt;code&gt;hash_mem_multiplier&lt;/code&gt;; it is still a per-operation limit, not a cap for the whole statement. Sorts have no such multiplier, though in practice this is less of a problem: statements with dozens of sort nodes are rare because sorts are expensive, and the optimizer tends to place sorts late in the plan.&lt;/p&gt;
&lt;p&gt;Here, &lt;code&gt;work_mem&lt;/code&gt; is 128MB and the instance is PG 13+ with &lt;code&gt;hash_mem_multiplier=1&lt;/code&gt;, so each hash operation is limited to 128MB. More importantly, the execution plan above has &lt;strong&gt;zero sort or hash operations&lt;/strong&gt;, which confirms this is not a sort/hash issue.&lt;/p&gt;
&lt;p&gt;So the earlier question: &lt;em&gt;&amp;ldquo;Why didn&amp;rsquo;t work_mem limit operation memory?&amp;rdquo;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Because the SQL only has UNION ALL — no sort or hash operations at all. &lt;code&gt;work_mem&lt;/code&gt; does not constrain memory here.&lt;/strong&gt;&lt;/p&gt;
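&lt;p&gt;The &lt;code&gt;work_mem&lt;/code&gt; × N arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name and the example node counts are made up, and it ignores parallel workers and every allocation that &lt;code&gt;work_mem&lt;/code&gt; does not govern.&lt;/p&gt;

```python
def worst_case_work_mem_mb(work_mem_mb, sort_nodes, hash_nodes, hash_mem_multiplier=1.0):
    """Upper bound (in MB) that work_mem allows one plan's sort/hash nodes to use.

    Illustrative only: ignores parallel workers and all memory that
    work_mem does not govern (parse trees, plan trees, caches).
    """
    sort_budget = work_mem_mb * sort_nodes
    hash_budget = work_mem_mb * hash_mem_multiplier * hash_nodes
    return sort_budget + hash_budget


# Hypothetical plan with 3 sorts and 2 hash nodes at work_mem = 128MB.
print(worst_case_work_mem_mb(128, sort_nodes=3, hash_nodes=2))
# The UNION ALL plan in this case: no sort or hash nodes at all.
print(worst_case_work_mem_mb(128, sort_nodes=0, hash_nodes=0))
```

&lt;p&gt;With no sort or hash nodes the bound is zero, which is another way of saying that &lt;code&gt;work_mem&lt;/code&gt; has nothing to limit in this plan.&lt;/p&gt;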
&lt;p&gt;&lt;strong&gt;Other plan nodes:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;No matter what, &lt;code&gt;work_mem&lt;/code&gt; only (!) limits sort/hash. There are dozens of plan node types — are the rest all unconstrained?&lt;/p&gt;

&lt;h3 class="relative group"&gt;Reproduction and Deep Analysis
 &lt;div id="reproduction-and-deep-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#reproduction-and-deep-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;Empty Table Reproduction
 &lt;div id="empty-table-reproduction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#empty-table-reproduction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Create empty table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzl1(col1 varchar(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Query with many UNION ALLs
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;union&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;all&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;union&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;all&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...(&lt;span style="color:#ae81ff"&gt;5000&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;UNION&lt;/span&gt; ALLs, &lt;span style="color:#66d9ef"&gt;SQL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;size&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;150&lt;/span&gt;KB)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;(With too many UNION ALLs, the statement may instead fail because it exceeds &lt;code&gt;max_stack_depth&lt;/code&gt;.)&lt;/p&gt;
&lt;p&gt;An empty table + many UNION ALLs immediately reproduces the memory spike. Moreover, after the SQL completes, the backend memory is reclaimed.&lt;/p&gt;
&lt;p&gt;Since this is an empty table (0KB data file), we can rule out data as the cause. So: &lt;em&gt;&amp;ldquo;Is the data itself related to memory? Was it caused by returning too many rows?&amp;rdquo;&lt;/em&gt; — &lt;strong&gt;No, data is not the main factor.&lt;/strong&gt;&lt;/p&gt;
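&lt;p&gt;The reproduction statement is tedious to write by hand. A minimal Python sketch for generating it follows; the helper name is hypothetical, and the exact whitespace (and therefore the byte count) may differ slightly from the statement used in the original test.&lt;/p&gt;

```python
def build_union_all_sql(table="lzl1", column="col1", unions=5000):
    """Return one SELECT per line, joined by the given number of UNION ALLs."""
    branch = f"select {column} from {table}"
    # N UNION ALL keywords join N + 1 SELECT branches.
    return " union all\n".join([branch] * (unions + 1))


sql = build_union_all_sql()
print(sql.count("union all"), "UNION ALLs,", len(sql), "bytes")
```

&lt;p&gt;The output can be redirected to a file and run with &lt;code&gt;psql -f&lt;/code&gt;. Lowering &lt;code&gt;unions&lt;/code&gt; is an easy way to watch how backend memory scales with the number of branches.&lt;/p&gt;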

&lt;h4 class="relative group"&gt;Strace System Call Analysis
 &lt;div id="strace-system-call-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#strace-system-call-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;While executing the SQL, capture system calls with &lt;code&gt;strace -p&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; strace -p &lt;span style="color:#ae81ff"&gt;198337&lt;/span&gt; &amp;gt; strace.198337 2&amp;gt;&amp;amp;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Quick primer on relevant Linux syscalls:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://man7.org/linux/man-pages/man2/epoll_wait.2.html" target="_blank" rel="noreferrer"&gt;epoll_wait&lt;/a&gt;: Wait for an event. Idle processes sit in this state.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://man7.org/linux/man-pages/man3/recvfrom.3p.html" target="_blank" rel="noreferrer"&gt;recvfrom&lt;/a&gt;: Receive a message from a socket.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://man7.org/linux/man-pages/man2/getrusage.2.html" target="_blank" rel="noreferrer"&gt;getrusage&lt;/a&gt;: Get resource usage.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://man7.org/linux/man-pages/man2/brk.2.html" target="_blank" rel="noreferrer"&gt;brk&lt;/a&gt;: Changes the program break (the end of the heap). Raising it allocates memory to the process; lowering it releases memory. &lt;code&gt;malloc&lt;/code&gt; typically grows the heap through &lt;code&gt;brk&lt;/code&gt; for small requests and uses &lt;code&gt;mmap&lt;/code&gt; for large ones.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://man7.org/linux/man-pages/man2/lseek.2.html" target="_blank" rel="noreferrer"&gt;lseek&lt;/a&gt;: Reposition file offset.&lt;/li&gt;
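&lt;li&gt;&lt;a href="https://man7.org/linux/man-pages/man2/write.2.html" target="_blank" rel="noreferrer"&gt;write&lt;/a&gt;: Write to a file descriptor. A successful return does not guarantee the data has reached disk.&lt;/li&gt;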
&lt;li&gt;&lt;a href="https://man7.org/linux/man-pages/man3/sendto.3p.html" target="_blank" rel="noreferrer"&gt;sendto&lt;/a&gt;: Send a message on a socket.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Syscalls like &lt;code&gt;lseek&lt;/code&gt;, &lt;code&gt;write&lt;/code&gt;, &lt;code&gt;sendto&lt;/code&gt; include fd (file descriptor) information:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lseek&lt;span style="color:#f92672"&gt;(&lt;/span&gt;37, 0, SEEK_END&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;/proc/[pid]/fd&lt;/code&gt; holds one symbolic link per open file descriptor of the process, so we can map an fd back to a relation. Here fd 37 is table &lt;code&gt;lzl1&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@lzl&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ cd /proc/198337/fd
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@lzl&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ll &lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lrwx------ &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;64&lt;/span&gt; Jan &lt;span style="color:#ae81ff"&gt;26&lt;/span&gt; 22:59 &lt;span style="color:#ae81ff"&gt;37&lt;/span&gt; -&amp;gt; /pgdata/lzl/data13/base/16385/16386
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@lzl&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ oid2name -d lzldb -f &lt;span style="color:#ae81ff"&gt;16386&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;From database &lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filenode Table Name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;----------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;16386&lt;/span&gt; lzl1&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
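&lt;p&gt;The same fd-to-relation lookup can be scripted. The sketch below is an assumption-laden illustration, not part of the original investigation: the function names are hypothetical, it only recognises files under a &lt;code&gt;base/&lt;/code&gt; directory (default tablespace), and reading another process&amp;rsquo;s &lt;code&gt;/proc/[pid]/fd&lt;/code&gt; needs suitable permissions on Linux.&lt;/p&gt;

```python
import os
import re

# .../base/DB_OID/FILENODE, optionally followed by a fork suffix
# (_fsm, _vm, _init) and a segment number (.1, .2, ...).
RELATION_PATH = re.compile(r"/base/(\d+)/(\d+)(?:_[a-z]+)?(?:\.\d+)?$")


def parse_relation_path(path):
    """Return (database_oid, filenode) for a data-file path, else None."""
    match = RELATION_PATH.search(path)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def open_relation_files(pid):
    """Map each fd of a process to (database_oid, filenode) via /proc (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # fd was closed between listdir and readlink
        parsed = parse_relation_path(target)
        if parsed is not None:
            result[int(fd)] = parsed
    return result


# The symlink target seen for fd 37 above.
print(parse_relation_path("/pgdata/lzl/data13/base/16385/16386"))
```

&lt;p&gt;The resulting filenode can then be passed to &lt;code&gt;oid2name -f&lt;/code&gt; as shown above to get the table name.&lt;/p&gt;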
&lt;p&gt;The strace output is dense but structurally simple:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;strace: Process &lt;span style="color:#ae81ff"&gt;198337&lt;/span&gt; attached
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;epoll_wait&lt;span style="color:#f92672"&gt;(&lt;/span&gt;4, &lt;span style="color:#f92672"&gt;[{&lt;/span&gt;EPOLLIN, &lt;span style="color:#f92672"&gt;{&lt;/span&gt;u32&lt;span style="color:#f92672"&gt;=&lt;/span&gt;44314568, u64&lt;span style="color:#f92672"&gt;=&lt;/span&gt;44314568&lt;span style="color:#f92672"&gt;}}]&lt;/span&gt;, 1, -1&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## step1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;recvfrom&lt;span style="color:#f92672"&gt;(&lt;/span&gt;9, &lt;span style="color:#e6db74"&gt;&amp;#34;Q\0\2p\372select col1 from lzl1 union&amp;#34;&lt;/span&gt;..., 8192, 0, NULL, NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;recvfrom&lt;span style="color:#f92672"&gt;(&lt;/span&gt;9, &lt;span style="color:#e6db74"&gt;&amp;#34; all\nselect col1 from lzl1 union&amp;#34;&lt;/span&gt;..., 8192, 0, NULL, NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;recvfrom&lt;span style="color:#f92672"&gt;(&lt;/span&gt;9, &lt;span style="color:#e6db74"&gt;&amp;#34; all\nselect col1 from lzl1 union&amp;#34;&lt;/span&gt;..., 8192, 0, NULL, NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;recvfrom&lt;span style="color:#f92672"&gt;(&lt;/span&gt;9, &lt;span style="color:#e6db74"&gt;&amp;#34; all\nselect col1 from lzl1 union&amp;#34;&lt;/span&gt;..., 8192, 0, NULL, NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;recvfrom&lt;span style="color:#f92672"&gt;(&lt;/span&gt;9, &lt;span style="color:#e6db74"&gt;&amp;#34; all\nselect col1 from lzl1 union&amp;#34;&lt;/span&gt;..., 8192, 0, NULL, NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4347&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## step2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x34d5000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x3cd5000&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x3cd5000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x3cd5000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x88cd6000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x894d6000&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x894d6000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## step3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lseek&lt;span style="color:#f92672"&gt;(&lt;/span&gt;37, 0, SEEK_END&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lseek&lt;span style="color:#f92672"&gt;(&lt;/span&gt;37, 0, SEEK_END&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lseek&lt;span style="color:#f92672"&gt;(&lt;/span&gt;37, 0, SEEK_END&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## step4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x89cd6000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x8a4d6000&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8a4d6000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8a4d6000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8a516000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x8a556000&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8a556000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## step5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;write&lt;span style="color:#f92672"&gt;(&lt;/span&gt;2, &lt;span style="color:#e6db74"&gt;&amp;#34;2024-01-26 23:08:01.800 CST [198&amp;#34;&lt;/span&gt;..., 165521&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;165521&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8a556000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x8a57d000&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8a57d000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8a57d000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x8a59f000&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8a59f000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8d449000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x8d46b000&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8d46b000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8d46b000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x8d48d000&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8d48d000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## step6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lseek&lt;span style="color:#f92672"&gt;(&lt;/span&gt;37, 0, SEEK_END&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lseek&lt;span style="color:#f92672"&gt;(&lt;/span&gt;37, 0, SEEK_END&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lseek&lt;span style="color:#f92672"&gt;(&lt;/span&gt;37, 0, SEEK_END&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## step7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8dcb1000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8dcb1000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x8c179000&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8c179000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8c179000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8c179000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8c179000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x8a526000&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8a526000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x34d5000&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x34d5000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x34d5000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## step8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sendto&lt;span style="color:#f92672"&gt;(&lt;/span&gt;8, &lt;span style="color:#e6db74"&gt;&amp;#34;\2\0\0\0\230\0\0\0\1@\0\0\1\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0&amp;#34;&lt;/span&gt;..., 152, 0, NULL, 0&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;152&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sendto&lt;span style="color:#f92672"&gt;(&lt;/span&gt;9, &lt;span style="color:#e6db74"&gt;&amp;#34;T\0\0\0\35\0\1col1\0\0\0\0\0\0\0\0\0\4\23\377\377\0\0\0\5\0\0C\0&amp;#34;&lt;/span&gt;..., 50, 0, NULL, 0&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## step9&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;recvfrom&lt;span style="color:#f92672"&gt;(&lt;/span&gt;9, 0xddcf60, 8192, 0, NULL, NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; -1 EAGAIN &lt;span style="color:#f92672"&gt;(&lt;/span&gt;Resource temporarily unavailable&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;epoll_wait&lt;span style="color:#f92672"&gt;(&lt;/span&gt;4, strace: Process &lt;span style="color:#ae81ff"&gt;198337&lt;/span&gt; detached
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;detached ...&amp;gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol&gt;
&lt;li&gt;Receive the UNION ALL SQL from fd=9 socket&lt;/li&gt;
&lt;li&gt;&lt;code&gt;brk&lt;/code&gt; allocates memory: the program break rises from 0x34d5000 (about 53MB) to 0x894d6000 (about 2.1GB)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;lseek&lt;/code&gt; on table &lt;code&gt;lzl1&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Memory grows a little further: in the excerpt shown, the break moves from 0x89cd6000 to 0x8a556000 (about 8.5MB)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;write&lt;/code&gt; to fd=2 (stderr, which feeds the server log); memory grows ~48MB&lt;/li&gt;
&lt;li&gt;&lt;code&gt;lseek&lt;/code&gt; on table &lt;code&gt;lzl1&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Memory peaks at 0x8dcb1000 (about 2.2GB), then &lt;code&gt;brk&lt;/code&gt; releases memory back down to 0x34d5000 (about 53MB), exactly matching the starting value&lt;/li&gt;
&lt;li&gt;Send result via socket&lt;/li&gt;
&lt;li&gt;&lt;code&gt;recvfrom&lt;/code&gt; on fd=9 returns EAGAIN (no further data from the client), and the backend goes back to waiting in &lt;code&gt;epoll_wait&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The strace doesn&amp;rsquo;t reveal much beyond the OS allocating and releasing memory for the process.&lt;/p&gt;
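&lt;p&gt;Reading hexadecimal break addresses by eye is error-prone, so a small script helps when summarising the &lt;code&gt;brk&lt;/code&gt; calls. The following is a rough Python sketch with a hypothetical helper name. It assumes the plain &lt;code&gt;strace&lt;/code&gt; line format shown above (no timestamps or PID prefixes), and the sample lines are copied from that trace.&lt;/p&gt;

```python
import re

# Matches e.g. "brk(NULL) = 0x34d5000" and captures the returned break address.
BRK_CALL = re.compile(r"^brk\((?:NULL|0x[0-9a-f]+)\)\s*=\s*(0x[0-9a-f]+)")


def brk_summary(strace_lines):
    """Return (first, peak, last) program-break addresses seen in strace output."""
    breaks = []
    for line in strace_lines:
        match = BRK_CALL.match(line.strip())
        if match:
            breaks.append(int(match.group(1), 16))
    if not breaks:
        return None
    return breaks[0], max(breaks), breaks[-1]


# A few lines copied from the trace above.
sample = [
    "brk(NULL) = 0x34d5000",
    "brk(0x3cd5000) = 0x3cd5000",
    "brk(0x894d6000) = 0x894d6000",
    "lseek(37, 0, SEEK_END) = 0",
    "brk(NULL) = 0x8dcb1000",
    "brk(0x34d5000) = 0x34d5000",
]
first, peak, last = brk_summary(sample)
print(f"heap growth at peak: {(peak - first) / 2**20:.0f} MiB")
print("fully released:", first == last)
```

&lt;p&gt;The difference between the peak and the first break value is the heap growth caused by the statement; the break address by itself also includes whatever lies below the heap.&lt;/p&gt;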

&lt;h4 class="relative group"&gt;Memory Dump Analysis
 &lt;div id="memory-dump-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-dump-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;&lt;code&gt;pmap&lt;/code&gt; of the process during the memory spike:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@lzl pg_log&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ pmap -x &lt;span style="color:#ae81ff"&gt;76207&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;76207: postgres: postgres lzldb &lt;span style="color:#f92672"&gt;[&lt;/span&gt;local&lt;span style="color:#f92672"&gt;]&lt;/span&gt; SELECT 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Address Kbytes RSS Dirty Mode Mapping
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000000400000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7984&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2192&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r-x-- postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0000000000dcc000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; r---- postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0000000000dcd000 &lt;span style="color:#ae81ff"&gt;60&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;60&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;60&lt;/span&gt; rw--- postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0000000000ddc000 &lt;span style="color:#ae81ff"&gt;200&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;60&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;60&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0000000001e49000 &lt;span style="color:#ae81ff"&gt;264&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;224&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;224&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0000000001e8b000 &lt;span style="color:#ae81ff"&gt;1812380&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1804400&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1804400&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ffffffffff600000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r-x-- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---------------- ------- ------- ------- 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total kB &lt;span style="color:#ae81ff"&gt;2089384&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1810232&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1807384&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;pmap&lt;/code&gt; doesn&amp;rsquo;t label the anonymous segments, but the largest one (about 1.8GB) starts at address 0x1e8b000, directly after a small 264KB segment at 0x1e49000. Checking &lt;code&gt;smaps&lt;/code&gt; from 0x1e49000 onward for more detail:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@lzl 76207&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ cat smaps |grep 1e49000 -A &lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;01e49000-01e8b000 rw-p &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:00 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;heap&lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;264&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;01e8b000-70872000 rw-p &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:00 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;heap&lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;1812380&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Rss: &lt;span style="color:#ae81ff"&gt;1804400&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Pss: &lt;span style="color:#ae81ff"&gt;1804400&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared_Clean: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared_Dirty: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Private_Clean: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Private_Dirty: &lt;span style="color:#ae81ff"&gt;1804400&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Referenced: &lt;span style="color:#ae81ff"&gt;1804400&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Anonymous: &lt;span style="color:#ae81ff"&gt;1804400&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;AnonHugePages: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Swap: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;KernelPageSize: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MMUPageSize: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; kB&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;This heap mapping holds about 1.8GB of private memory: Rss, Pss, and Private_Dirty are all 1804400 kB, and nothing is shared.&lt;/strong&gt;&lt;/p&gt;
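&lt;p&gt;Rather than grepping for a single address, the mappings can be ranked by private dirty memory. The following is a minimal sketch and not part of the original investigation: &lt;code&gt;top_private_mappings&lt;/code&gt; is a hypothetical helper name, and it assumes the &lt;code&gt;smaps&lt;/code&gt; layout shown above (one header line per mapping, followed by a &lt;code&gt;Private_Dirty:&lt;/code&gt; counter in kB).&lt;/p&gt;

```shell
# top_private_mappings [N]: read /proc/PID/smaps text on stdin and print the
# N mappings with the most Private_Dirty memory, one per line, as
# tab-separated "kB, address range, name". Hypothetical helper.
top_private_mappings() {
    tpm_limit="${1:-5}"
    awk '
        # Mapping header, e.g. "01e8b000-70872000 rw-p 00000000 00:00 0 [heap]"
        /^[0-9a-f]+-[0-9a-f]+ / {
            range = $1
            name = (NF >= 6) ? $6 : "[anon]"
            next
        }
        # Per-mapping counter, value in kB
        /^Private_Dirty:/ { printf "%s\t%s\t%s\n", $2, range, name }
    ' | sort -rn | head -n "$tpm_limit"
}

# Usage (PID 76207 is the backend from the listing above):
#   cat /proc/76207/smaps | top_private_mappings 3
```

&lt;p&gt;Sorting on the first column puts the largest private mapping on top, which is usually enough to tell whether the heap or some other region is growing.&lt;/p&gt;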
&lt;p&gt;&lt;em&gt;(I tried using gdb to dump the 0x1e8b000-0x70872000 segment but it failed — not sure why. Suggestions welcome!)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Using &lt;code&gt;gcore&lt;/code&gt; for a rough dump:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@lzl lzl&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ gcore -o /pgdata/lzl/gcore.dump &lt;span style="color:#ae81ff"&gt;76207&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@lzl lzl&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ strings gcore.dump.76207&amp;gt; text.dump.76207
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@lzl lzl&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ll -h
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw-r----- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres 2.0G Jan &lt;span style="color:#ae81ff"&gt;26&lt;/span&gt; 17:29 gcore.dump.76207
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw-r----- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres 5.2M Jan &lt;span style="color:#ae81ff"&gt;26&lt;/span&gt; 17:30 text.dump.76207&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
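&lt;p&gt;&lt;strong&gt;The core file is 2.0GB and the heap mapping occupies about 1.8GB of physical memory, yet &lt;code&gt;strings&lt;/code&gt; extracts only 5.2MB of printable text!&lt;/strong&gt; Since &lt;code&gt;strings&lt;/code&gt; only picks up runs of printable characters, this is a rough indicator of how little of the memory holds real data, not an exact measurement.&lt;/p&gt;
&lt;p&gt;A rough &lt;code&gt;hexdump&lt;/code&gt; check points to many memory holes. Of the first 10,000 output lines, 3,690 contain a run of eight zero bytes:&lt;/p&gt;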
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@lzl lzl&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ hexdump -C gcore.dump.76207 |head -10000 |grep &lt;span style="color:#e6db74"&gt;&amp;#34;00 00 00 00 00 00 00 00&amp;#34;&lt;/span&gt;|wc -l
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;3690&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
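&lt;p&gt;The line count above is only a sample: it looks at the first 10,000 lines, and &lt;code&gt;hexdump -C&lt;/code&gt; collapses repeated identical lines into a single &lt;code&gt;*&lt;/code&gt;, so long zero runs are undercounted. A minimal sketch of a whole-file check follows. The helper name &lt;code&gt;zero_byte_stats&lt;/code&gt; is hypothetical, and a zero byte is not necessarily a hole (legitimate data contains zeros too), so the percentage is an upper bound at best.&lt;/p&gt;

```shell
# zero_byte_stats FILE: print the file size, the number of NUL bytes, and
# the NUL share in percent, as "total=N zero=N pct=P". Hypothetical helper.
zero_byte_stats() {
    zbs_file="$1"
    zbs_total=$(cat "$zbs_file" | wc -c | awk '{print $1}')
    # tr deletes every NUL byte, so what remains is the non-zero byte count.
    zbs_nonzero=$(cat "$zbs_file" | tr -d '\000' | wc -c | awk '{print $1}')
    zbs_zero=$((zbs_total - zbs_nonzero))
    zbs_pct=$(awk -v t="$zbs_total" -v z="$zbs_zero" \
        'BEGIN { if (t == 0) print "0.0"; else printf "%.1f\n", 100 * z / t }')
    printf 'total=%s zero=%s pct=%s\n' "$zbs_total" "$zbs_zero" "$zbs_pct"
}

# Usage:
#   zero_byte_stats gcore.dump.76207
```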

&lt;h4 class="relative group"&gt;log_planner_stats and Other Info
 &lt;div id="log_planner_stats-and-other-info" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#log_planner_stats-and-other-info" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;To verify whether the plan cache is the culprit, enable logging for parse, planner, and executor phases:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; log_parser_stats &lt;span style="color:#f92672"&gt;=&lt;/span&gt; on
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; log_planner_stats &lt;span style="color:#f92672"&gt;=&lt;/span&gt; on
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; log_executor_stats &lt;span style="color:#f92672"&gt;=&lt;/span&gt; on&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The logs show the parse phase uses little memory, while the planner consumes significantly more.&lt;/p&gt;
&lt;p&gt;Planner stats log:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-01-26 18:01:41.592 CST &lt;span style="color:#f92672"&gt;[&lt;/span&gt;208503&lt;span style="color:#f92672"&gt;]&lt;/span&gt; LOG: PLANNER STATISTICS
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-01-26 18:01:41.592 CST &lt;span style="color:#f92672"&gt;[&lt;/span&gt;208503&lt;span style="color:#f92672"&gt;]&lt;/span&gt; DETAIL: ! system usage stats:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ! 0.048955 s user, 0.004996 s system, 0.054077 s elapsed
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ! &lt;span style="color:#f92672"&gt;[&lt;/span&gt;11.208034 s user, 1.313838 s system total&lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ! &lt;span style="color:#ae81ff"&gt;2255352&lt;/span&gt; kB max resident size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ! 0/0 &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0/352&lt;span style="color:#f92672"&gt;]&lt;/span&gt; filesystem blocks in/out
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ! 0/1315 &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0/563859&lt;span style="color:#f92672"&gt;]&lt;/span&gt; page faults/reclaims, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0&lt;span style="color:#f92672"&gt;]&lt;/span&gt; swaps
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ! &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0&lt;span style="color:#f92672"&gt;]&lt;/span&gt; signals rcvd, 0/0 &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0/0&lt;span style="color:#f92672"&gt;]&lt;/span&gt; messages rcvd/sent
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ! 0/0 &lt;span style="color:#f92672"&gt;[&lt;/span&gt;1/16&lt;span style="color:#f92672"&gt;]&lt;/span&gt; voluntary/involuntary context switches&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The max resident size of roughly 2.2GB (2255352 kB) is consistent with the RES observed from the OS. Combined with the low memory use in the parser stats, this answers the question &lt;em&gt;&amp;ldquo;Is the memory from parsing cache or plan cache?&amp;rdquo;&lt;/em&gt;: &lt;strong&gt;the planner phase consumes the memory.&lt;/strong&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;Inspecting TopMemoryContext
 &lt;div id="inspecting-topmemorycontext" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#inspecting-topmemorycontext" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;PostgreSQL manages backend private memory through MemoryContext. We can dump &lt;code&gt;TopMemoryContext&lt;/code&gt; via gdb:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TopMemoryContext: &lt;span style="color:#ae81ff"&gt;101488&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;48464&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;53024&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pgstat TabStatusArray lookup hash table: &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;1408&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;6784&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; TopTransactionContext: &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;7720&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;472&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; TableSpace cache: &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;2048&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;6144&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; RowDescriptionContext: &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;6880&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;1312&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; MessageContext: &lt;span style="color:#ae81ff"&gt;1854981336&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;235&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;7911304&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;1847070032&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Grand total: &lt;span style="color:#ae81ff"&gt;1856104056&lt;/span&gt; bytes in &lt;span style="color:#ae81ff"&gt;431&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;8226712&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;179&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;1847877344&lt;/span&gt; used&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;MessageContext&lt;/strong&gt; accounts for about 1.85GB of the 1.86GB grand total, which makes it by far the largest consumer.&lt;/p&gt;
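&lt;p&gt;The exact gdb command is not shown here. One commonly used way, assumed in this sketch, is to attach gdb to the backend and run &lt;code&gt;call MemoryContextStats(TopMemoryContext)&lt;/code&gt;, which writes the dump to the stderr of the backend (usually the server log). PostgreSQL 14 and later also provide &lt;code&gt;pg_log_backend_memory_contexts(pid)&lt;/code&gt;. Once the dump is saved as text, a small filter can rank the contexts by size. The helper name &lt;code&gt;top_contexts&lt;/code&gt; and the file name in the usage comment are hypothetical.&lt;/p&gt;

```shell
# top_contexts [N]: read a memory context dump on stdin and print the N
# largest contexts as tab-separated "total bytes, name", biggest first.
# Hypothetical helper; assumes the line layout of the dump shown above.
top_contexts() {
    tc_limit="${1:-5}"
    awk '
        # Context lines look like "  Name: 8192 total in 1 blocks; ...".
        # The "Grand total:" line has no " total in " and is skipped.
        / total in / {
            sep = index($0, ": ")
            if (sep == 0) next
            name = substr($0, 1, sep - 1)
            sub(/^ +/, "", name)
            split(substr($0, sep + 2), field, " ")
            printf "%s\t%s\n", field[1], name
        }
    ' | sort -rn | head -n "$tc_limit"
}

# Usage, assuming the dump was saved to a hypothetical file contexts.txt:
#   cat contexts.txt | top_contexts 3
```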
&lt;p&gt;From &lt;code&gt;src/backend/utils/mmgr/README&lt;/code&gt;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;MessageContext &amp;mdash; this context holds the current command message from the frontend, as well as any derived storage that need only live as long as the current message (for example, in simple-Query mode the parse and plan trees can live here). This context will be reset, and any children deleted, at the top of each cycle of the outer loop of PostgresMain. This is kept separate from per-transaction and per-portal contexts because a query string might need to live either a longer or shorter time than any single transaction or portal.&lt;/p&gt;
&lt;/blockquote&gt;&lt;blockquote&gt;&lt;p&gt;When creating a prepared statement, the parse and plan trees will be built in a temporary context that&amp;rsquo;s a child of MessageContext.&lt;/p&gt;
&lt;/blockquote&gt;&lt;ul&gt;
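&lt;li&gt;&lt;code&gt;MessageContext&lt;/code&gt; holds the current command message from the frontend, plus any derived data that only needs to live as long as that message. In simple-Query mode this can include the parse and plan trees.&lt;/li&gt;
&lt;li&gt;When a prepared statement is created, the parse and plan trees are built in a temporary context that is a &lt;strong&gt;child&lt;/strong&gt; of &lt;code&gt;MessageContext&lt;/code&gt;. Either way, when &lt;code&gt;MessageContext&lt;/code&gt; is reset and its children are deleted, the parse and plan trees are reclaimed too.&lt;/li&gt;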
&lt;/ul&gt;
&lt;p&gt;This also explains the private memory reclamation: the plan tree data produced during the planner phase lives in &lt;code&gt;MessageContext&lt;/code&gt; or in one of its children. After the current command has finished, &lt;code&gt;MessageContext&lt;/code&gt; is reset and all its children are deleted at the top of the next cycle of the &lt;code&gt;PostgresMain&lt;/code&gt; loop. This matches the strace observation, where memory after release is exactly the same as memory before allocation.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Answering the final question: &lt;em&gt;&amp;ldquo;Why did a 5MB SQL consume 70GB of memory?&amp;rdquo;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The overwhelming majority of memory was consumed during plan creation.&lt;/strong&gt; The planner allocated enormous amounts of memory. &lt;code&gt;work_mem&lt;/code&gt; and &lt;code&gt;hash_mem_multiplier&lt;/code&gt; can only constrain sort and hash operations — they cannot limit other memory operations during planning. The plan tree itself isn&amp;rsquo;t that large, but the allocation process creates massive &lt;strong&gt;memory holes&lt;/strong&gt;: megabyte-scale data (metadata, parse tree, plan tree, etc.) ends up stored in gigabyte-scale memory regions.&lt;/p&gt;
&lt;p&gt;These SQL, parse tree, and plan tree structures are all cached in &lt;code&gt;MessageContext&lt;/code&gt; and its children. Once the result is sent back to the client, all memory from this phase is reclaimed.&lt;/p&gt;</content:encoded></item><item><title>Case Study: Analyzing Occasional Slow INSERT VALUES</title><link>https://lastdba.com/en/2024/08/12/case-study-analyzing-occasional-slow-insert-values/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/case-study-analyzing-occasional-slow-insert-values/</guid><description>&lt;p&gt;The business team reported that INSERT VALUES occasionally became slow. By the time I checked the active sessions, the slow write problem had already subsided.&lt;/p&gt;
&lt;p&gt;Later, I discovered that the slow write problem lasted less than half a minute, with INSERT VALUES taking 1-2 seconds. I wrote a script to capture active session information and managed to get the session data:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; WALRead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; DataFileRead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; BgWriterMain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; AutoVacuumMain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ClientRead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;385&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; LogicalLauncherMain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The most abnormal wait event was WALWrite with 40 sessions.&lt;/p&gt;</description><content:encoded>&lt;p&gt;The business team reported that INSERT VALUES occasionally became slow. By the time I checked the active sessions, the slow write problem had already subsided.&lt;/p&gt;
&lt;p&gt;Later, I discovered that the slow write problem lasted less than half a minute, with INSERT VALUES taking 1-2 seconds. I wrote a script to capture active session information and managed to get the session data:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; WALRead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; DataFileRead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; BgWriterMain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; AutoVacuumMain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ClientRead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;385&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; LogicalLauncherMain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The most abnormal wait event was WALWrite with 40 sessions.&lt;/p&gt;
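&lt;p&gt;The capture script itself is not shown. A minimal sketch of what such a sampler could look like follows. The function names and the one-second default interval are assumptions, and connection settings are expected to come from the usual libpq environment variables.&lt;/p&gt;

```shell
# wait_event_sql: print a query that counts sessions per wait event.
# Hypothetical helper, not the script used in the article.
wait_event_sql() {
    printf '%s\n' "SELECT wait_event, count(*) FROM pg_stat_activity GROUP BY wait_event ORDER BY count(*) DESC;"
}

# sample_wait_events [INTERVAL_SECONDS]: run the query in a loop and print a
# timestamp before each sample. Connection settings come from the libpq
# environment variables (PGHOST, PGUSER, ...). Stop it with Ctrl-C.
sample_wait_events() {
    swe_interval="${1:-1}"
    while :; do
        date '+%Y-%m-%d %H:%M:%S'
        psql -X -c "$(wait_event_sql)"
        sleep "$swe_interval"
    done
}

# Usage (keep a copy of every sample in a log file):
#   sample_wait_events 1 | tee -a wait_events.log
```

&lt;p&gt;Because the slow period lasts less than half a minute, a short interval matters more than a detailed query: the goal is to have at least a few samples from inside the window.&lt;/p&gt;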
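&lt;p&gt;Two of the WALWrite-waiting sessions looked like this. Note that they share the wait event name but differ in wait event type: one is an &lt;code&gt;LWLock&lt;/code&gt; wait and the other an &lt;code&gt;IO&lt;/code&gt; wait:&lt;/p&gt;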
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xact_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; state_change &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; partofquery
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+----------+-------------------------------+-------------------------------+---------------+-----------------+--------+--------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;144955&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzluser11 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;07&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;58&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;516574&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;07&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;58&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;516588&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; table1( 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;179869&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzluser11 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;07&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;58&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;116371&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;07&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;58&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;116386&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IO &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; table1( &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Let&amp;rsquo;s search the source code for WALWrite-related content:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; WALWriteLock: must be held to write WAL buffers to &lt;span style="color:#a6e22e"&gt;disk&lt;/span&gt; (XLogWrite or
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; XLogFlush).&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * LWLockAcquireOrWait - Acquire lock, or wait until it&amp;#39;s free
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * The semantics of this function are a bit funky. If the lock is currently
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * free, it is acquired in the given mode, and the function returns true. If
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * the lock isn&amp;#39;t immediately free, the function waits until it is released
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * and returns false, but does not acquire the lock.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * This is currently used for WALWriteLock: when a backend flushes the WAL,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * holding WALWriteLock, it can flush the commit records of many other
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * backends as a side-effect. Those other backends need to wait until the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * flush finishes, but don&amp;#39;t need to acquire the lock anymore. They can just
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * wake up, observe that their records have already been flushed, and return.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When WAL is written from WAL buffers to disk, the WALWriteLock must be held.&lt;/p&gt;
&lt;p&gt;When a backend flushes WAL while holding WALWriteLock, it can also flush the commit records of other backends. Those other backends need to wait for this flush to finish, but they don&amp;rsquo;t need to acquire the lock afterward. If their WAL has been flushed, they can return directly (rather than flushing WAL again).&lt;/p&gt;
&lt;p&gt;&lt;code&gt;XLogFlush&lt;/code&gt; matters most here, because it is the function that ensures WAL up to a given position has been flushed to disk. Its key logic is in the &lt;code&gt;for&lt;/code&gt; loop:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Ensure that all XLOG data through the given position is flushed to disk.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * NOTE: this differs from XLogWrite mainly in that the WALWriteLock is not
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * already held, and we try to avoid acquiring it if possible.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;XLogFlush&lt;/span&gt;(XLogRecPtr record)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Now wait until we get the write lock, or someone else does the flush
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * for us.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (;;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		XLogRecPtr	insertpos;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* read LogwrtResult and update local state */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;SpinLockAcquire&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;XLogCtl&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;info_lck);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (WriteRqstPtr &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; XLogCtl&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;LogwrtRqst.Write)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			WriteRqstPtr &lt;span style="color:#f92672"&gt;=&lt;/span&gt; XLogCtl&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;LogwrtRqst.Write;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		LogwrtResult &lt;span style="color:#f92672"&gt;=&lt;/span&gt; XLogCtl&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;LogwrtResult;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;SpinLockRelease&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;XLogCtl&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;info_lck);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* done already? */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (record &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; LogwrtResult.Flush)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * Before actually performing the write, wait for all in-flight
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * insertions to the pages we&amp;#39;re about to write to finish.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		insertpos &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;WaitXLogInsertionsToFinish&lt;/span&gt;(WriteRqstPtr);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * Try to get the write lock. If we can&amp;#39;t get it immediately, wait
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * until it&amp;#39;s released, and recheck if we still need to do the flush
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * or if the backend that held the lock did it for us already. This
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * helps to maintain a good rate of group committing when the system
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * is bottlenecked by the speed of fsyncing.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;LWLockAcquireOrWait&lt;/span&gt;(WALWriteLock, LW_EXCLUSIVE))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * The lock is now free, but we didn&amp;#39;t acquire it yet. Before we
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * do, loop back to check if someone else flushed the record for
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * us already.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;continue&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Got the lock; recheck whether request is satisfied */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		LogwrtResult &lt;span style="color:#f92672"&gt;=&lt;/span&gt; XLogCtl&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;LogwrtResult;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (record &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; LogwrtResult.Flush)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;LWLockRelease&lt;/span&gt;(WALWriteLock);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * Sleep before flush! By adding a delay here, we may give further
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * backends the opportunity to join the backlog of group commit
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * followers; this can significantly improve transaction throughput,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * at the risk of increasing transaction latency.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * We do not sleep if enableFsync is not turned on, nor if there are
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * fewer than CommitSiblings other backends with active transactions.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (CommitDelay &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; enableFsync &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;MinimumActiveBackends&lt;/span&gt;(CommitSiblings))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;pg_usleep&lt;/span&gt;(CommitDelay);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * Re-check how far we can now flush the WAL. It&amp;#39;s generally not
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * safe to call WaitXLogInsertionsToFinish while holding
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * WALWriteLock, because an in-progress insertion might need to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * also grab WALWriteLock to make progress. But we know that all
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * the insertions up to insertpos have already finished, because
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * that&amp;#39;s what the earlier WaitXLogInsertionsToFinish() returned.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * We&amp;#39;re only calling it again to allow insertpos to be moved
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * further forward, not to actually wait for anyone.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			insertpos &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;WaitXLogInsertionsToFinish&lt;/span&gt;(insertpos);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* try to write/flush later additions to XLOG as well */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		WriteRqst.Write &lt;span style="color:#f92672"&gt;=&lt;/span&gt; insertpos;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		WriteRqst.Flush &lt;span style="color:#f92672"&gt;=&lt;/span&gt; insertpos;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;XLogWrite&lt;/span&gt;(WriteRqst, false);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;LWLockRelease&lt;/span&gt;(WALWriteLock);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* done */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;XLogFlush&lt;/code&gt; function is the main function for flushing dirty WAL:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Check if the dirty WAL that needs to be flushed has already been flushed by someone else. If so, return directly.&lt;/li&gt;
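&lt;li&gt;After waiting for in-flight insertions on the affected pages to finish, call &lt;code&gt;LWLockAcquireOrWait&lt;/code&gt; to take &lt;code&gt;WALWriteLock&lt;/code&gt; in exclusive mode. If the lock is busy, the backend sleeps until it is released and then loops back to step 1 without holding the lock. It repeats this until it either acquires the lock or finds its record already flushed.&lt;/li&gt;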
&lt;li&gt;Once the lock is acquired, check again if the dirty WAL that needs to be flushed has already been flushed by someone else. If so, release &lt;code&gt;WALWriteLock&lt;/code&gt; and return (during the lock acquisition wait, someone else might have flushed the dirty WAL — if so, there&amp;rsquo;s nothing to do).&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;commit_delay&lt;/code&gt; is greater than 0, &lt;code&gt;fsync&lt;/code&gt; is enabled, and at least &lt;code&gt;commit_siblings&lt;/code&gt; other backends have active transactions, sleep for &lt;code&gt;commit_delay&lt;/code&gt; microseconds and then advance the WAL write position so that more commits can join the same flush (group commit). This step does not apply here because &lt;code&gt;CommitDelay&lt;/code&gt; defaults to 0, so the deliberate group-commit delay is disabled.&lt;/li&gt;
&lt;li&gt;Call &lt;code&gt;XLogWrite&lt;/code&gt; to write the log, release &lt;code&gt;WALWriteLock&lt;/code&gt; after completion.&lt;/li&gt;
&lt;/ol&gt;
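&lt;p&gt;To confirm whether step 4 applies on a given instance, the relevant settings can be read from &lt;code&gt;pg_settings&lt;/code&gt;. This is a minimal sketch, assuming a psql session on the instance in question; the three parameter names are the standard settings behind &lt;code&gt;CommitDelay&lt;/code&gt;, &lt;code&gt;CommitSiblings&lt;/code&gt;, and &lt;code&gt;enableFsync&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Settings that decide whether XLogFlush sleeps before flushing.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT name, setting, short_desc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM pg_settings
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WHERE name IN (&amp;#39;commit_delay&amp;#39;, &amp;#39;commit_siblings&amp;#39;, &amp;#39;fsync&amp;#39;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ORDER BY name;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;commit_delay&lt;/code&gt; is expressed in microseconds. A value of 0 means the sleep in step 4 is skipped.&lt;/p&gt;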
&lt;p&gt;In short, &lt;code&gt;XLogFlush&lt;/code&gt; first checks whether the requested WAL has already been flushed. If not, it holds &lt;code&gt;WALWriteLock&lt;/code&gt; until &lt;code&gt;XLogWrite&lt;/code&gt; finishes. &lt;code&gt;XLogWrite&lt;/code&gt; is the function that does the actual writing, including working out which page and which position within it to write.&lt;/p&gt;
&lt;p&gt;Returning to the wait events from active sessions, the &lt;code&gt;IO:WALWrite&lt;/code&gt; wait is relatively easy to understand, but how do we confirm whether &lt;code&gt;LWLock:WALWrite&lt;/code&gt; is a problem?&lt;/p&gt;
&lt;p&gt;From the &lt;code&gt;XLogFlush&lt;/code&gt; function logic, we know that &lt;code&gt;WALWriteLock&lt;/code&gt; is an exclusive LWLock that PostgreSQL acquires when writing dirty WAL (this makes sense — WAL commit information is written sequentially and can only be written in exclusive mode; you can&amp;rsquo;t let whoever writes fastest write first, as that could easily corrupt data). It&amp;rsquo;s a serialized write of WAL commit information.&lt;/p&gt;
&lt;p&gt;With this logic in mind, looking back at &lt;code&gt;pg_stat_activity&lt;/code&gt; shows that there was &lt;strong&gt;only 1&lt;/strong&gt; &lt;code&gt;IO:WALWrite&lt;/code&gt; wait, while there were dozens of &lt;code&gt;LWLock:WALWrite&lt;/code&gt; waits.&lt;/p&gt;
&lt;p&gt;Although we can&amp;rsquo;t directly see the LWLock blocking chain, we can infer from the source code that &lt;strong&gt;LWLock:WALWrite is waiting on IO:WALWrite&lt;/strong&gt;.&lt;/p&gt;
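&lt;p&gt;The one-writer, many-waiters pattern can be checked by grouping sessions by wait event. This is a sketch rather than the exact query used in this case, and it assumes the lock is reported under the name &lt;code&gt;WALWrite&lt;/code&gt;, as it is in the output here:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Count sessions per wait type for the WALWrite event only.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- The IO row is the backend doing the write; LWLock rows are queued behind it.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT wait_event_type, wait_event, count(*) AS sessions
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM pg_stat_activity
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WHERE wait_event = &amp;#39;WALWrite&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GROUP BY wait_event_type, wait_event
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ORDER BY sessions DESC;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Because &lt;code&gt;pg_stat_activity&lt;/code&gt; is a point-in-time snapshot, the query is best run several times during the slow period.&lt;/p&gt;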
&lt;p&gt;The &lt;a href="https://www.postgresql.org/docs/16/wal-configuration.html" target="_blank" rel="noreferrer"&gt;official documentation&lt;/a&gt; has a section about &lt;code&gt;XLogFlush&lt;/code&gt; and adjusting WAL buffers:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Normally, WAL buffers should be written and flushed by an XLogFlush request, which is made, for the most part, at transaction commit time to ensure that transaction records are flushed to permanent storage. On systems with high WAL output, XLogFlush requests might not occur often enough to prevent XLogInsertRecord from having to do writes. On such systems one should increase the number of WAL buffers by modifying the wal_buffers parameter. When full_page_writes is set and the system is very busy, setting wal_buffers higher will help smooth response times during the period immediately following each checkpoint.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Under normal circumstances, WAL buffers are written and flushed by &lt;code&gt;XLogFlush&lt;/code&gt;, mostly at transaction commit. If WAL volume is high but &lt;code&gt;XLogFlush&lt;/code&gt; is not called often enough (typically a workload dominated by large transactions), the buffers fill up and &lt;code&gt;XLogInsertRecord&lt;/code&gt; has to write out WAL that has not been committed yet to make room. That is the sign of an undersized WAL buffer, and in that situation increasing &lt;code&gt;wal_buffers&lt;/code&gt; may slightly help with system response time.&lt;/p&gt;
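&lt;p&gt;One way to check whether backends are actually running out of WAL buffer space is the &lt;code&gt;wal_buffers_full&lt;/code&gt; counter in &lt;code&gt;pg_stat_wal&lt;/code&gt;. This is a sketch, assuming PostgreSQL 14 or later, since the view does not exist in earlier releases:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Current WAL buffer size.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SHOW wal_buffers;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Number of times WAL was written out because the buffers were full,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- counted since stats_reset.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT wal_buffers_full, stats_reset
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM pg_stat_wal;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Comparing the counter before and after a slow period is more telling than its absolute value. If it barely moves, &lt;code&gt;wal_buffers&lt;/code&gt; is probably not the constraint.&lt;/p&gt;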
&lt;blockquote&gt;&lt;p&gt;There are two commonly used internal WAL functions: XLogInsertRecord and XLogFlush. XLogInsertRecord is used to place a new record into the WAL buffers in shared memory. If there is no space for the new record, XLogInsertRecord will have to write (move to kernel cache) a few filled WAL buffers&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Combined with a description from the &lt;code&gt;XLogInsertRecord&lt;/code&gt; function:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt; We have now done all the preparatory work we can without holding a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt; lock or modifying shared state. From here on, inserting the new WAL
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt; record to the shared WAL buffer cache is a two&lt;span style="color:#f92672"&gt;-&lt;/span&gt;step process:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1.&lt;/span&gt; Reserve the right amount of space from the WAL. The current head of
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;	 reserved space is kept in Insert&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;CurrBytePos, and is protected by
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;	 insertpos_lck.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2.&lt;/span&gt; Copy the record to the reserved WAL space. This involves finding the
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;	 correct WAL buffer containing the reserved space, and copying the
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;	 record in place. This can be done concurrently in multiple processes.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;XLogInsertRecord&lt;/code&gt; function is used to place new WAL records into the WAL buffer:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Writing requires reserving a certain amount of space.&lt;/li&gt;
&lt;li&gt;Copy the WAL record to the reserved WAL space (presumably the reserved space in the WAL buffer). &lt;strong&gt;Multiple processes can execute this in parallel.&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Copying WAL records to the WAL buffer can be done in parallel. This is unlikely to be a bottleneck since it&amp;rsquo;s an in-memory copy with parallelism.&lt;/p&gt;
&lt;p&gt;But &lt;code&gt;XLogFlush&lt;/code&gt; is different — it holds an exclusive LWLock throughout the write. So, in scenarios with high concurrency and small transactions, increasing WAL buffers theoretically won&amp;rsquo;t be very effective.&lt;/p&gt;
&lt;p&gt;At this point, we can rule out &lt;code&gt;wal_buffers&lt;/code&gt; memory tuning and focus our attention on I/O. Looking at the I/O-related wait counts in &lt;code&gt;pg_stat_activity&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DataFileRead	&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DataFileExtend	&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WALWrite		&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WALRead			&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The INSERT VALUES slowness lasted less than a minute and did not occur during normal operation. However, session samples taken during normal operation almost always contained &lt;code&gt;IO:WALWrite&lt;/code&gt; waits:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xact_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; state_change &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; partofquery
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+----------+-------------------------------+-------------------------------+---------------+-----------------+--------+--------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;72668&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzluser11 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;828394&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;82841&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IO &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; table1( &lt;span style="color:#f92672"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;77215&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzluser11 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;33&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;342541&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;33&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;342552&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IO &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; table1 &lt;span style="color:#f92672"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;94904&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzluser11 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;442309&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;442323&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IO &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; table1 &lt;span style="color:#f92672"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;88024&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzluser11 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;36&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;779086&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;36&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;779311&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IO &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; table2 &lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt; &lt;span style="color:#f92672"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;103236&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzluser11 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;144283&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;144302&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IO &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; table1 &lt;span style="color:#f92672"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;47342&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzluser11 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;192683&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;192699&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IO &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; table1 &lt;span style="color:#f92672"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;75399&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzluser11 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;45&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;743023&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;45&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;743024&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IO &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; table1 &lt;span style="color:#f92672"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;221993&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzluser11 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;46&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;184532&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;46&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;184541&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IO &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; table1 &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;However, the I/O statistics for that period show a write rate of only about 15 MB/s (&lt;code&gt;wkB/s&lt;/code&gt; = 15344), which is low compared with other periods, and &lt;code&gt;w_await&lt;/code&gt; was also very low at 0.18 ms:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Device: rrqm&lt;span style="color:#f92672"&gt;/&lt;/span&gt;s wrqm&lt;span style="color:#f92672"&gt;/&lt;/span&gt;s r&lt;span style="color:#f92672"&gt;/&lt;/span&gt;s w&lt;span style="color:#f92672"&gt;/&lt;/span&gt;s rkB&lt;span style="color:#f92672"&gt;/&lt;/span&gt;s wkB&lt;span style="color:#f92672"&gt;/&lt;/span&gt;s avgrq&lt;span style="color:#f92672"&gt;-&lt;/span&gt;sz avgqu&lt;span style="color:#f92672"&gt;-&lt;/span&gt;sz await r_await w_await svctm &lt;span style="color:#f92672"&gt;%&lt;/span&gt;util
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;dm&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;322&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0.00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0.00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;187.00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1515.00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3572.00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;15344.00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22.23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2.05&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1.20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;9.39&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0.18&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0.15&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;25.70&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;There was no strong evidence pointing to a storage performance issue.&lt;/p&gt;
&lt;p&gt;The most likely explanation so far is transient lock contention on WAL flush, caused by many concurrent small &lt;code&gt;INSERT ... VALUES&lt;/code&gt; transactions. The following tuning options can be ruled out:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The workload consists of concurrent small transactions, so there is no need to &lt;a href="https://www.postgresql.org/docs/16/wal-configuration.html" target="_blank" rel="noreferrer"&gt;adjust WAL buffers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;The WAL volume is not large, so there is no need to enable &lt;a href="https://dba.stackexchange.com/questions/338319/postgres-walwrite-waits-whats-the-bottleneck" target="_blank" rel="noreferrer"&gt;log compression&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;There are few FPIs (full-page images), so there is no need to adjust checkpoint settings&lt;/li&gt;
&lt;li&gt;I/O pressure is not high, so there is no need to &lt;a href="https://docs.dbmarlin.com/docs/kb/wait-events/postgresql/walwritelock/" target="_blank" rel="noreferrer"&gt;improve I/O performance&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
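&lt;p&gt;To tell short bursts of contention from a sustained backlog, active sessions can be sampled by wait event several times in a row. This is a minimal sketch that assumes only the standard &lt;code&gt;pg_stat_activity&lt;/code&gt; view; the grouping and the &lt;code&gt;sessions&lt;/code&gt; alias are illustrative and were not part of the original investigation:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Count active sessions per wait event at this instant.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Run it repeatedly to see whether WALWrite waits come in bursts.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select wait_event_type, wait_event, count(*) as sessions
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;from pg_stat_activity
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;where state = &amp;#39;active&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;group by wait_event_type, wait_event
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;order by sessions desc;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If the &lt;code&gt;WALWrite&lt;/code&gt; count spikes briefly and then falls back while &lt;code&gt;w_await&lt;/code&gt; stays low, that would be consistent with contention rather than slow storage.&lt;/p&gt;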
&lt;p&gt;At minimum, the following optimizations can be made:&lt;/p&gt;
&lt;ol&gt;
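&lt;li&gt;Enable group commit in the database through &lt;code&gt;commit_delay&lt;/code&gt; and &lt;code&gt;commit_siblings&lt;/code&gt; (this requires testing first and can be deferred if the risk is a concern)&lt;/li&gt;
&lt;li&gt;Batch multiple &lt;code&gt;INSERT ... VALUES&lt;/code&gt; statements at the application level to reduce WALWriteLock contention&lt;/li&gt;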
&lt;/ol&gt;</content:encoded></item><item><title>Case Study: Logical Replication Deadlocks Checkpoint, Walsender, and Backup</title><link>https://lastdba.com/en/2024/08/12/case-study-logical-replication-deadlocks-checkpoint-walsender-and-backup/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/case-study-logical-replication-deadlocks-checkpoint-walsender-and-backup/</guid><description>&lt;h2 class="relative group"&gt;Problem Symptoms
 &lt;div id="problem-symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The backup process (&lt;code&gt;pg_start_backup()&lt;/code&gt;) was blocked by the checkpointer, and the checkpointer was blocked by the logical replication walsender. The database was still serving queries, but backup, checkpoint, and logical replication were all completely hung.&lt;/p&gt;
&lt;p&gt;Two processes in &lt;code&gt;pg_stat_activity&lt;/code&gt; showed an unusual wait event: &lt;code&gt;replication_slot_io&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;hostlzl:&lt;span style="color:#ae81ff"&gt;6666&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;postgres][&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;]&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_activity &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; pid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;173038&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;gx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;----+------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;datid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;17630&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;173038&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usesysid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;35157&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; repuser
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;application_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; PostgreSQL JDBC Driver
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_addr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;88&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;75&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;58&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_hostname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_port &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;37623&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;07&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;75022&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;xact_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;query_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;state_change &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;07&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;764475&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wait_event_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; replication_slot_io
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_xid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;query &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; walsender
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;658&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;hostlzl:&lt;span style="color:#ae81ff"&gt;6666&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;postgres][&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;]&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_activity &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; pid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12729&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;gx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;----+------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;datid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12729&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usesysid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;application_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_addr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_hostname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_port &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;03&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;343116&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;xact_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;query_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;state_change &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wait_event_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; replication_slot_io
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_xid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;query &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; checkpointer&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The two waiting processes are one walsender and the checkpointer, both started on April 2. Let&amp;rsquo;s check the logs for walsender 173038:&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Problem Symptoms
 &lt;div id="problem-symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The backup process (&lt;code&gt;pg_start_backup()&lt;/code&gt;) was blocked by the checkpointer, and the checkpointer was blocked by the logical replication walsender. The database was still serving queries, but backup, checkpoint, and logical replication were all completely hung.&lt;/p&gt;
&lt;p&gt;Two processes in &lt;code&gt;pg_stat_activity&lt;/code&gt; showed an unusual wait event: &lt;code&gt;replication_slot_io&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;hostlzl:&lt;span style="color:#ae81ff"&gt;6666&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;postgres][&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;]&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_activity &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; pid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;173038&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;gx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;----+------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;datid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;17630&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;173038&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usesysid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;35157&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; repuser
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;application_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; PostgreSQL JDBC Driver
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_addr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;88&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;75&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;58&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_hostname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_port &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;37623&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;07&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;75022&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;xact_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;query_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;state_change &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;07&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;764475&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wait_event_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; replication_slot_io
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_xid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;query &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; walsender
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;658&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;hostlzl:&lt;span style="color:#ae81ff"&gt;6666&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;postgres][&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;]&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_activity &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; pid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12729&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;gx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;----+------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;datid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12729&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usesysid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;application_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_addr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_hostname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_port &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;03&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;343116&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;xact_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;query_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;state_change &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wait_event_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; replication_slot_io
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_xid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;query &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; checkpointer&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;One walsender and one checkpointer. Both were started on April 2. Let&amp;rsquo;s check the walsender 173038 logs:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--repuser log
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-02 11:40:07.750 CST,,,173038,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.75.58:37623&amp;#34;&lt;/span&gt;,660b7e17.2a3ee,1,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,2024-04-02 11:40:07 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;connection received: host=30.88.75.58 port=37623&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-02 11:40:07.756 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;repuser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,173038,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.75.58:37623&amp;#34;&lt;/span&gt;,660b7e17.2a3ee,2,&lt;span style="color:#e6db74"&gt;&amp;#34;authentication&amp;#34;&lt;/span&gt;,2024-04-02 11:40:07 CST,32/30,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;replication connection authorized: user=repuser&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-02 11:40:07.765 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;repuser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,173038,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.75.58:37623&amp;#34;&lt;/span&gt;,660b7e17.2a3ee,3,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-04-02 11:40:07 CST,32/0,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;starting logical decoding for slot &amp;#34;&amp;#34;pg_lzldb_lzldb_ora_pgdb_pgdb&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;Streaming transactions committing after 4263/42E6EF88, reading WAL from 4263/41DAEBD0.&amp;#34;&lt;/span&gt;,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;PostgreSQL JDBC Driver&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-02 11:40:07.765 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;repuser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,173038,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.75.58:37623&amp;#34;&lt;/span&gt;,660b7e17.2a3ee,4,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-04-02 11:40:07 CST,32/0,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;logical decoding found consistent point at 4263/41DAEBD0&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;There are no running transactions.&amp;#34;&lt;/span&gt;,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;PostgreSQL JDBC Driver&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Walsender 173038 only shows startup information. After that, no more log output — it likely hung from the very start.&lt;/p&gt;
&lt;p&gt;Scrolling back a bit, we can find an earlier walsender for the same replication slot (different PID, same slot name):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--84918 earlier startup logs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-02 11:30:07.498 CST,,,84918,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.75.58:54898&amp;#34;&lt;/span&gt;,660b7bbf.14bb6,1,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,2024-04-02 11:30:07 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;connection received: host=30.88.75.58 port=54898&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-02 11:30:07.504 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;repuser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,84918,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.75.58:54898&amp;#34;&lt;/span&gt;,660b7bbf.14bb6,2,&lt;span style="color:#e6db74"&gt;&amp;#34;authentication&amp;#34;&lt;/span&gt;,2024-04-02 11:30:07 CST,30/3,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;replication connection authorized: user=repuser&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-02 11:30:07.514 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;repuser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,84918,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.75.58:54898&amp;#34;&lt;/span&gt;,660b7bbf.14bb6,3,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-04-02 11:30:07 CST,30/0,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;starting logical decoding for slot &amp;#34;&amp;#34;pg_lzldb_lzldb_ora_pgdb_pgdb&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;Streaming transactions committing after 4263/41DADE38, reading WAL from 4263/358F1340.&amp;#34;&lt;/span&gt;,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;PostgreSQL JDBC Driver&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-02 11:30:07.516 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;repuser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,84918,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.75.58:54898&amp;#34;&lt;/span&gt;,660b7bbf.14bb6,4,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-04-02 11:30:07 CST,30/0,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;logical decoding found consistent point at 4263/358F1340&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;There are no running transactions.&amp;#34;&lt;/span&gt;,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;PostgreSQL JDBC Driver&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-02 11:36:07.061 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;repuser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,86630,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.75.58:45227&amp;#34;&lt;/span&gt;,660b7bca.15266,5,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-04-02 11:30:18 CST,30/0,0,ERROR,XX000,&lt;span style="color:#e6db74"&gt;&amp;#34;could not write to file &amp;#34;&amp;#34;pg_replslot/pg_lzldb_lzldb_ora_pgdb_pgdb/state.tmp&amp;#34;&amp;#34;: Cannot allocate memory&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;PostgreSQL JDBC Driver&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-02 11:36:40.151 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;repuser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,86630,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.75.58:45227&amp;#34;&lt;/span&gt;,660b7bca.15266,6,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-04-02 11:30:18 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;disconnection: session time: 0:06:21.760 user=repuser database=lzldb host=30.88.75.58 port=45227&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;PostgreSQL JDBC Driver&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Walsender 84918 started logical decoding on the same slot at 11:30:07. Six minutes later, at 11:36:07, a walsender on that slot (logged as PID 86630, session start 11:30:18) failed to write &lt;code&gt;state.tmp&lt;/code&gt; with &lt;code&gt;Cannot allocate memory&lt;/code&gt;, and its session disconnected at 11:36:40.&lt;/p&gt;
&lt;p&gt;The checkpointer process 12729 also reported an error on the same &lt;code&gt;state.tmp&lt;/code&gt; file, but a different one: &lt;code&gt;could not create file &amp;quot;pg_replslot/pg_lzldb_lzldb_ora_pgdb_pgdb/state.tmp&amp;quot;: File exists&lt;/code&gt;. It appeared about 33 seconds after the walsender error:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--checkpoint log
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-02 11:36:39.925 CST,,,12729,,660b7a17.31b9,4,,2024-04-02 11:23:03 CST,,0,LOG,58P02,&lt;span style="color:#e6db74"&gt;&amp;#34;could not create file &amp;#34;&amp;#34;pg_replslot/pg_lzldb_lzldb_ora_pgdb_pgdb/state.tmp&amp;#34;&amp;#34;: File exists&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-02 11:36:40.151 CST,,,12729,,660b7a17.31b9,5,,2024-04-02 11:23:03 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;checkpoint complete: wrote 334 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.108 s, sync=0.082 s, total=217.083 s; sync files=139, longest=0.004 s, average=0.000 s; distance=2295 kB, estimate=2295 kB&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-02 11:48:03.414 CST,,,12729,,660b7a17.31b9,6,,2024-04-02 11:23:03 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;checkpoint starting: time&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After this, the checkpointer produced no more log output — it hung, just like the walsender.&lt;/p&gt;
&lt;p&gt;Searching for &lt;code&gt;pg_replslot/pg_lzldb_lzldb_ora_pgdb_pgdb/state.tmp&amp;quot;&amp;quot;: File exists&amp;quot;&lt;/code&gt; quickly leads to a community thread: &lt;a href="https://www.postgresql.org/message-id/14b3454f-2d68-c637-68e4-2b42ff976168%40postgrespro.ru" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/message-id/14b3454f-2d68-c637-68e4-2b42ff976168%40postgrespro.ru&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The actual fix landed in &lt;a href="https://www.postgresql.org/docs/release/12.3/" target="_blank" rel="noreferrer"&gt;PG 12.3&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Ensure that a replication slot&amp;rsquo;s io_in_progress_lock is released in failure code paths (Pavan Deolasee)
This could result in a walsender later becoming stuck waiting for the lock.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 class="relative group"&gt;Deep Dive
 &lt;div id="deep-dive" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#deep-dive" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;We found the bug, but several questions remain:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Why did the walsender and checkpointer hang?&lt;/li&gt;
&lt;li&gt;Who is blocking whom — the walsender or the checkpointer?&lt;/li&gt;
&lt;li&gt;How was this triggered?&lt;/li&gt;
&lt;li&gt;What are the solutions?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Source Code Analysis
 &lt;div id="source-code-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#source-code-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The affected instance runs PostgreSQL 11.5.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pstack&lt;/code&gt; output for both processes:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@hostlzl:lzldb:6666: /pg/pg6666/data/pg_log&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ pstack &lt;span style="color:#ae81ff"&gt;173038&lt;/span&gt; &lt;span style="color:#75715e"&gt;##walsender&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#0 0x00002b9eec171a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#1 0x00002b9eec171a9f in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#2 0x00002b9eec171b3b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#3 0x00000000006b2512 in PGSemaphoreLock (sema=0x2b9ef5fdb0b8) at pg_sema.c:316&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#4 0x000000000071e94c in LWLockAcquire (lock=lock@entry=0x2babd8cee5b8, mode=mode@entry=LW_EXCLUSIVE) at lwlock.c:1243&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#5 0x00000000006ef7cb in SaveSlotToPath (slot=0x2babd8cee500, dir=dir@entry=0x7ffcaffd79f0 &amp;#34;pg_replslot/pg_lzldb_lzldb_ora_pgdb_pgdb&amp;#34;, elevel=elevel@entry=20) at slot.c:1249&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#6 0x00000000006f0515 in ReplicationSlotSave () at slot.c:653&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#7 0x00000000006d75d8 in LogicalConfirmReceivedLocation (lsn=&amp;lt;optimized out&amp;gt;) at logical.c:1049&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#8 0x00000000006d774d in LogicalIncreaseXminForSlot (current_lsn=current_lsn@entry=72994075200640, xmin=xmin@entry=1241611955) at logical.c:914&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#9 0x00000000006e0fb3 in SnapBuildProcessRunningXacts (builder=builder@entry=0x23146c0, lsn=72994075200640, running=running@entry=0x22e8978) at snapbuild.c:1146&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#10 0x00000000006d484c in DecodeStandbyOp (buf=0x7ffcaffd7eb0, buf=0x7ffcaffd7eb0, ctx=0x2209820) at decode.c:318&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#11 LogicalDecodingProcessRecord (ctx=0x2209820, record=&amp;lt;optimized out&amp;gt;) at decode.c:121&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#12 0x00000000006e50e0 in XLogSendLogical () at walsender.c:2799&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#13 0x00000000006e7122 in WalSndLoop (send_data=send_data@entry=0x6e5080 &amp;lt;XLogSendLogical&amp;gt;) at walsender.c:2162&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#14 0x00000000006e7d91 in StartLogicalReplication (cmd=0x22eedd8) at walsender.c:1109&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#15 exec_replication_command (cmd_string=cmd_string@entry=0x2210c48 &amp;#34;START_REPLICATION SLOT pg_lzldb_lzldb_ora_pgdb_pgdb LOGICAL 4263/42E6EF88 (\&amp;#34;add-tables\&amp;#34; &amp;#39;public.acr_finance_coa_partition_17_01,public.acr_finance_coa_partition_17_02,public.acr_finance_coa_part&amp;#34;...) at walsender.c:1541&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#16 0x000000000072c899 in PostgresMain (argc=&amp;lt;optimized out&amp;gt;, argv=argv@entry=0x2216f78, dbname=0x2216c98 &amp;#34;lzldb&amp;#34;, username=&amp;lt;optimized out&amp;gt;) at postgres.c:4178&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#17 0x000000000047e481 in BackendRun (port=0x20eda0) at postmaster.c:4358&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#18 BackendStartup (port=0x20eda0) at postmaster.c:4030&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#19 ServerLoop () at postmaster.c:1707&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#20 0x00000000006c4359 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x21dbe90) at postmaster.c:1380&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#21 0x000000000047eefb in main (argc=3, argv=0x21dbe90) at main.c:228&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@hostlzl:lzldb:6666: /pg/pg6666/data/pg_wal&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ pstack &lt;span style="color:#ae81ff"&gt;12729&lt;/span&gt; &lt;span style="color:#75715e"&gt;##checkpointer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#0 0x00002b9eec171a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#1 0x00002b9eec171a9f in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#2 0x00002b9eec171b3b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#3 0x00000000006b2512 in PGSemaphoreLock (sema=0x2b9ef5fdcd38) at pg_sema.c:316&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#4 0x000000000071e94c in LWLockAcquire (lock=lock@entry=0x2babd8cee5b8, mode=mode@entry=LW_EXCLUSIVE) at lwlock.c:1243&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#5 0x00000000006ef7cb in SaveSlotToPath (slot=slot@entry=0x2babd8cee500, dir=dir@entry=0x7ffcaffd6ee0 &amp;#34;pg_replslot/pg_lzldb_lzldb_ora_pgdb_pgdb&amp;#34;, elevel=elevel@entry=15) at slot.c:1249&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#6 0x00000000006f11a7 in CheckPointReplicationSlots () at slot.c:1100&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#7 0x00000000004f674f in CheckPointGuts (checkPointRedo=72994093982360, flags=flags@entry=128) at xlog.c:9146&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#8 0x00000000004fcc77 in CreateCheckPoint (flags=flags@entry=128) at xlog.c:8937&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#9 0x00000000006b8312 in CheckpointerMain () at checkpointer.c:491&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#10 0x000000000050ba15 in AuxiliaryProcessMain (argc=argc@entry=2, argv=argv@entry=0x7ffcaffd7540) at bootstrap.c:451&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#11 0x00000000006c1cb9 in StartChildProcess (type=CheckpointerProcess) at postmaster.c:5337&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#12 0x00000000006c2f5a in reaper (postgres_signal_arg=&amp;lt;optimized out&amp;gt;) at postmaster.c:2867&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#13 &amp;lt;signal handler called&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#14 0x00002b9eed5ba783 in __select_nocancel () from /lib64/libc.so.6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#15 0x000000000047db38 in ServerLoop () at postmaster.c:1671&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#16 0x00000000006c4359 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x21dbe90) at postmaster.c:1380&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#17 0x000000000047eefb in main (argc=3, argv=0x21dbe90) at main.c:228&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
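&lt;p&gt;The key observation is the &lt;code&gt;LWLockAcquire&lt;/code&gt; frame (#4 in both stacks). The walsender and the checkpointer are both waiting to acquire the &lt;strong&gt;same LWLock address in exclusive mode&lt;/strong&gt;: &lt;code&gt;lock=lock@entry=0x2babd8cee5b8, mode=mode@entry=LW_EXCLUSIVE&lt;/code&gt;. Neither wait ever returns.&lt;/p&gt;
&lt;p&gt;In both stacks, the caller of &lt;code&gt;LWLockAcquire&lt;/code&gt; (frame #5) is &lt;code&gt;SaveSlotToPath&lt;/code&gt;, invoked with the same slot pointer (&lt;code&gt;slot=0x2babd8cee500&lt;/code&gt;). The lock address lies 0xb8 bytes past that pointer, so the lock is a member of the slot&amp;rsquo;s own in-memory struct.&lt;/p&gt;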
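&lt;p&gt;When the stacks are long, the address comparison can be done mechanically. A minimal sketch, assuming the dumps use the frame format shown above; &lt;code&gt;lwlock_addr&lt;/code&gt; is a hypothetical helper name, not a PostgreSQL or pstack tool:&lt;/p&gt;

```shell
# Hypothetical helper: print the lock address from the first LWLockAcquire
# frame of a stack dump read on stdin.
lwlock_addr() {
    grep 'LWLockAcquire' |
        sed -n 's/.*lock=lock@entry=\(0x[0-9a-f]*\).*/\1/p' |
        head -n 1
}

# Frame #4 copied from each of the two dumps above.
walsender_frame='#4 0x000000000071e94c in LWLockAcquire (lock=lock@entry=0x2babd8cee5b8, mode=mode@entry=LW_EXCLUSIVE) at lwlock.c:1243'
checkpointer_frame='#4 0x000000000071e94c in LWLockAcquire (lock=lock@entry=0x2babd8cee5b8, mode=mode@entry=LW_EXCLUSIVE) at lwlock.c:1243'

a=$(printf '%s\n' "$walsender_frame" | lwlock_addr)
b=$(printf '%s\n' "$checkpointer_frame" | lwlock_addr)

if [ "$a" = "$b" ]; then
    echo "same lock: $a"
else
    echo "different locks: $a vs $b"
fi
```

&lt;p&gt;On a live system the helper could read from &lt;code&gt;pstack 173038 | lwlock_addr&lt;/code&gt; directly instead of a saved frame.&lt;/p&gt;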
&lt;p&gt;Looking at the source in &lt;code&gt;src/backend/replication/slot.c&lt;/code&gt;, the critical function &lt;code&gt;SaveSlotToPath&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;//SaveSlotToPath stores slot state
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;SaveSlotToPath&lt;/span&gt;(ReplicationSlot &lt;span style="color:#f92672"&gt;*&lt;/span&gt;slot, &lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;dir, &lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; elevel)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{	&lt;span style="color:#75715e"&gt;//11.5 code
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt;		tmppath[MAXPGPATH];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt;		path[MAXPGPATH];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			fd;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	ReplicationSlotOnDisk cp;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		was_dirty;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* and don&amp;#39;t do anything if there&amp;#39;s nothing to write */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;was_dirty)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;//Acquire LWLock in exclusive mode at function entry
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;LWLockAcquire&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;slot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;io_in_progress_lock, LW_EXCLUSIVE);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;//Note the fd logic — the error matches the second walsender error
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	fd &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;OpenTransientFile&lt;/span&gt;(tmppath, O_CREAT &lt;span style="color:#f92672"&gt;|&lt;/span&gt; O_EXCL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; O_WRONLY &lt;span style="color:#f92672"&gt;|&lt;/span&gt; PG_BINARY);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (fd &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(elevel,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				(&lt;span style="color:#a6e22e"&gt;errcode_for_file_access&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 &lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;could not create file &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;: %m&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						tmppath)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;//The logic for writing to fd — the error matches the first walsender error
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; ((&lt;span style="color:#a6e22e"&gt;write&lt;/span&gt;(fd, &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;cp, &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(cp))) &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(cp))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			save_errno &lt;span style="color:#f92672"&gt;=&lt;/span&gt; errno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;pgstat_report_wait_end&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;CloseTransientFile&lt;/span&gt;(fd);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* if write didn&amp;#39;t set errno, assume problem is no disk space */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		errno &lt;span style="color:#f92672"&gt;=&lt;/span&gt; save_errno &lt;span style="color:#f92672"&gt;?&lt;/span&gt; save_errno : ENOSPC;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(elevel,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				(&lt;span style="color:#a6e22e"&gt;errcode_for_file_access&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 &lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;could not write to file &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;: %m&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						tmppath)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;LWLockRelease&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;slot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;io_in_progress_lock);	&lt;span style="color:#75715e"&gt;//Release LWLock at end of function
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;SaveSlotToPath&lt;/code&gt; calls &lt;code&gt;LWLockAcquire&lt;/code&gt; on the slot&amp;rsquo;s &lt;code&gt;io_in_progress_lock&lt;/code&gt; in &lt;code&gt;LW_EXCLUSIVE&lt;/code&gt; mode. The lock name closely matches the wait event we observed: &lt;code&gt;io_in_progress_lock&lt;/code&gt; ↔ &lt;code&gt;replication_slot_io&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;At the end of the function, &lt;code&gt;LWLockRelease&lt;/code&gt; releases the lock.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;But neither &lt;code&gt;if&lt;/code&gt; branch calls &lt;code&gt;LWLockRelease&lt;/code&gt;; the function simply returns!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The PostgreSQL log reports both &amp;ldquo;could not write to file&amp;rdquo; and &amp;ldquo;could not create file&amp;rdquo; for &lt;code&gt;tmppath&lt;/code&gt;, so the code went through exactly these two &lt;code&gt;if&lt;/code&gt; branches: the &lt;strong&gt;write to state.tmp failed&lt;/strong&gt; branch and the &lt;strong&gt;create state.tmp failed&lt;/strong&gt; branch.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s reconstruct the timeline from the logs:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;11:36:07&lt;/strong&gt;: First logical replication error: &amp;ldquo;could not write to file &amp;hellip; state.tmp&amp;rdquo;. The replication link dies.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;11:36:39&lt;/strong&gt;: Checkpointer error: &amp;ldquo;could not create file &amp;hellip; state.tmp&amp;rdquo;. One second later, the checkpoint &amp;ldquo;completes&amp;rdquo; with 0 dirty buffers, 0 WAL.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;11:40:07&lt;/strong&gt;: Logical replication starts again. No further output.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;11:48:03&lt;/strong&gt;: The checkpointer logs a checkpoint &lt;code&gt;start&lt;/code&gt; again. No further output.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Important: the first and second logical replication connections belong to &lt;strong&gt;different&lt;/strong&gt; walsender PIDs; the first and second checkpoint entries belong to the &lt;strong&gt;same&lt;/strong&gt; checkpointer PID.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fault mechanism reconstructed:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Under memory pressure, the logical replication walsender fails to write &lt;code&gt;state.tmp&lt;/code&gt; and leaves a residual &lt;code&gt;state.tmp&lt;/code&gt; file behind.&lt;/li&gt;
&lt;li&gt;The checkpointer takes the LWLock in exclusive mode in &lt;code&gt;SaveSlotToPath&lt;/code&gt;, then trips over the residual &lt;code&gt;state.tmp&lt;/code&gt;: the &lt;code&gt;O_CREAT | O_EXCL&lt;/code&gt; open fails, the code enters the &lt;code&gt;if (fd &amp;lt; 0)&lt;/code&gt; branch, and the function returns &lt;strong&gt;without releasing the LWLock&lt;/strong&gt;. An &lt;code&gt;ereport&lt;/code&gt; below ERROR level returns to the caller instead of going through error recovery, so nothing releases the lock on the checkpointer&amp;rsquo;s behalf.&lt;/li&gt;
&lt;li&gt;A new walsender starts for logical replication, tries to acquire the LWLock at the top of &lt;code&gt;SaveSlotToPath&lt;/code&gt;, and waits indefinitely.&lt;/li&gt;
&lt;li&gt;The checkpointer triggers a new checkpoint, also tries to acquire the LWLock at the top of &lt;code&gt;SaveSlotToPath&lt;/code&gt;, and waits indefinitely.&lt;/li&gt;
&lt;/ol&gt;
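&lt;p&gt;The four steps above can be sketched as a toy shell model. This is only an illustration under loud assumptions, not PostgreSQL code: a &lt;code&gt;lock&lt;/code&gt; directory stands in for &lt;code&gt;io_in_progress_lock&lt;/code&gt;, GNU &lt;code&gt;dd&lt;/code&gt; with &lt;code&gt;conv=excl&lt;/code&gt; stands in for the exclusive create of &lt;code&gt;state.tmp&lt;/code&gt;, and the function names are hypothetical. A real backend would wait on the lock forever; the model prints &lt;code&gt;blocked&lt;/code&gt; instead so that it terminates.&lt;/p&gt;

```shell
#!/bin/sh
# Toy model, not PostgreSQL code: a "lock" directory stands in for the slot's
# io_in_progress_lock, and GNU dd's conv=excl stands in for the exclusive open.
# Every name below (try_lock, unlock, save_slot_leaky) is hypothetical.

# Single-process try-lock: fail instead of waiting if the lock is already held.
try_lock() {
    if [ -d "$1/lock" ]; then
        return 1
    fi
    mkdir "$1/lock"
}

unlock() {
    rmdir "$1/lock"
}

# Mirrors the pre-fix shape: the failure branch returns without unlocking.
save_slot_leaky() {
    if ! try_lock "$1"; then
        echo "blocked"
        return 1
    fi
    # conv=excl makes dd fail if the output file already exists
    if ! dd if=/dev/null of="$1/state.tmp" conv=excl status=none; then
        echo "create failed"
        return 1    # early return: the lock directory is left behind
    fi
    mv "$1/state.tmp" "$1/state"
    unlock "$1"
    echo "saved"
}

slot_dir=$(mktemp -d)                # scratch directory, not a real data directory
touch "$slot_dir/state.tmp"          # the residual temp file
save_slot_leaky "$slot_dir" || true  # first caller hits the failure branch
save_slot_leaky "$slot_dir" || true  # next caller can no longer take the lock
rm -rf "$slot_dir"
```

&lt;p&gt;In this model the first call reports &lt;code&gt;create failed&lt;/code&gt; and the second reports &lt;code&gt;blocked&lt;/code&gt;, which corresponds to the checkpointer leaking the lock and every later caller queuing up behind it.&lt;/p&gt;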
&lt;p&gt;With the mechanism clear, the answers follow:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Why did the walsender and checkpointer hang?&lt;/strong&gt; Residual &lt;code&gt;state.tmp&lt;/code&gt;. The checkpointer returned while still holding the LWLock, so the walsender and the checkpointer both wait on it indefinitely.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Who blocks whom?&lt;/strong&gt; The checkpointer blocks the walsender, and also its own next checkpoint.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How was it triggered?&lt;/strong&gt; The previous walsender ran out of memory and left an uncleaned &lt;code&gt;state.tmp&lt;/code&gt; behind.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Solutions?&lt;/strong&gt; Force restart the database.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Reproduction
 &lt;div id="reproduction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#reproduction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;For background on PostgreSQL logical replication, refer to: &lt;a href="https://blog.csdn.net/qq_40687433/article/details/129291207" target="_blank" rel="noreferrer"&gt;PG inner workings: Logical Replication&lt;/a&gt;. Key commands:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_create_logical_replication_slot(&lt;span style="color:#e6db74"&gt;&amp;#39;logical_test&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;test_decoding&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_recvlogical &lt;span style="color:#f92672"&gt;-&lt;/span&gt;h &lt;span style="color:#ae81ff"&gt;127&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;p &lt;span style="color:#ae81ff"&gt;5558&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;d lzldb &lt;span style="color:#f92672"&gt;-&lt;/span&gt;U lzl &lt;span style="color:#75715e"&gt;--slot=logical_test --start -f recv.sql &amp;amp;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The slot and replication link are ready:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pid,usename,xact_start,state_change,wait_event,&lt;span style="color:#66d9ef"&gt;state&lt;/span&gt;,query &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_activity &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;state&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;lt;&amp;gt;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;idle&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; xact_start ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xact_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; state_change &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; query 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+----------+-------------------------------+-------------------------------+---------------------+--------+----------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;59916&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; postgres &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;015534&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;015545&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pid,usename,xact_start,state_change,wait_event,&lt;span style="color:#66d9ef"&gt;state&lt;/span&gt;,query &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_activity wher
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;e &lt;span style="color:#66d9ef"&gt;state&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;lt;&amp;gt;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;idle&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; xact_start ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;59791&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;566112&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WalSenderWaitForWAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; pg_catalog.set_config(&lt;span style="color:#e6db74"&gt;&amp;#39;search_path&amp;#39;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;&amp;#39;&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;false&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pid,usename,application_name,backend_start,&lt;span style="color:#66d9ef"&gt;state&lt;/span&gt;,pg_walfile_name_offset(sent_lsn) sentoffset,pg_walfile_name_offset(write_lsn) writeoffset,pg_walfile_name_offset(flush_lsn) flushoffset &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_replication;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; application_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; backend_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; sentoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; writeoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; flushoffset 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+---------+------------------+------------------------------+-----------+------------------------------------+------------------------------------+------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;59791&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_recvlogical &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;56364&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; streaming &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;000000010000000000000001&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6612032&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;000000010000000000000001&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6612032&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;000000010000000000000001&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6612032&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Since a leftover &lt;code&gt;state.tmp&lt;/code&gt; is the trigger, simply &lt;code&gt;touch&lt;/code&gt; one in the slot&amp;rsquo;s directory under &lt;code&gt;pg_replslot&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;testhost logical_test]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; pwd
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;pgdata&lt;span style="color:#f92672"&gt;/&lt;/span&gt;lzl&lt;span style="color:#f92672"&gt;/&lt;/span&gt;data11&lt;span style="color:#f92672"&gt;/&lt;/span&gt;pg_replslot&lt;span style="color:#f92672"&gt;/&lt;/span&gt;logical_test&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
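&lt;p&gt;The listing shows only the slot directory. As a rough sketch of why a planted file is enough, the snippet below repeats the step in a scratch directory rather than a real data directory. It assumes GNU &lt;code&gt;dd&lt;/code&gt;, whose &lt;code&gt;conv=excl&lt;/code&gt; refuses to write to a file that already exists, as a stand-in for the exclusive create in &lt;code&gt;SaveSlotToPath&lt;/code&gt;; the helper name &lt;code&gt;try_exclusive_create&lt;/code&gt; is hypothetical.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: try to create a file that must not already exist.
try_exclusive_create() {
    # conv=excl makes GNU dd fail if the output file already exists
    if dd if=/dev/null of="$1" conv=excl status=none; then
        echo "created"
    else
        echo "create failed"
    fi
}

# Scratch directory standing in for pg_replslot/logical_test
slot_dir=$(mktemp -d)
touch "$slot_dir/state.tmp"                 # plant the leftover temp file
try_exclusive_create "$slot_dir/state.tmp"  # the exclusive create now fails
rm -rf "$slot_dir"
```

&lt;p&gt;On a system with GNU coreutils, &lt;code&gt;dd&lt;/code&gt; should complain on stderr that the file exists, and the helper prints &lt;code&gt;create failed&lt;/code&gt;.&lt;/p&gt;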
&lt;p&gt;&lt;code&gt;pg_recvlogical&lt;/code&gt; immediately errors:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_recvlogical: unexpected termination &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; replication stream: ERROR: could &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; file &lt;span style="color:#e6db74"&gt;&amp;#34;pg_replslot/logical_test/state.tmp&amp;#34;&lt;/span&gt;: File &lt;span style="color:#66d9ef"&gt;exists&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Manual &lt;code&gt;CHECKPOINT&lt;/code&gt; hangs:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;checkpoint&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--hang&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Now check the walsender and session states:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_activity ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; datid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; usesysid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; application_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; client_addr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; client_hostname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; client_port &lt;span style="color:#f92672"&gt;|&lt;/span&gt; backend_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xact_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; query_start 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; state_change &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; backend_xid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; backend_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; query &lt;span style="color:#f92672"&gt;|&lt;/span&gt; backend_type 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+----------+-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-+&lt;/span&gt;&lt;span style="color:#75715e"&gt;-------------------------------+-----------------+---------------------+--------+-------------+--------------+--------------------------------------------------------+------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;... 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Activity &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LogicalLauncherMain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; logical replication launcher
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;25&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;55&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;058523&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;checkpoint&lt;/span&gt;; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; client backend
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;16384&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;77638&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16385&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_recvlogical &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;127&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;56928&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;25&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;495833&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;25&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;.&lt;span 
style="color:#ae81ff"&gt;497754&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;25&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;498329&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; replication_slot_io &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; pg_catalog.set_config(&lt;span style="color:#e6db74"&gt;&amp;#39;search_path&amp;#39;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;&amp;#39;&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;false&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; walsender
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; replication_slot_io &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; checkpointer&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Reproduced: the walsender and the checkpointer are both stuck on the &lt;code&gt;replication_slot_io&lt;/code&gt; wait event.&lt;/p&gt;

&lt;h3 class="relative group"&gt;PG 12.3 Code Fix
 &lt;div id="pg-123-code-fix" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg-123-code-fix" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Shown here is the 15.3 source, which has an extra save_errno compared with 12.3
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;SaveSlotToPath&lt;/span&gt;(ReplicationSlot &lt;span style="color:#f92672"&gt;*&lt;/span&gt;slot, &lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;dir, &lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; elevel)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	fd &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;OpenTransientFile&lt;/span&gt;(tmppath, O_CREAT &lt;span style="color:#f92672"&gt;|&lt;/span&gt; O_EXCL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; O_WRONLY &lt;span style="color:#f92672"&gt;|&lt;/span&gt; PG_BINARY);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (fd &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * If not an ERROR, then release the lock before returning. In case
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * of an ERROR, the error recovery path automatically releases the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * lock, but no harm in explicitly releasing even in that case. Note
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * that LWLockRelease() could affect errno.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			save_errno &lt;span style="color:#f92672"&gt;=&lt;/span&gt; errno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;LWLockRelease&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;slot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;io_in_progress_lock);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		errno &lt;span style="color:#f92672"&gt;=&lt;/span&gt; save_errno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(elevel,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				(&lt;span style="color:#a6e22e"&gt;errcode_for_file_access&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 &lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;could not create file &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;: %m&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						tmppath)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;LWLockRelease&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;slot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;io_in_progress_lock);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}	
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In &lt;strong&gt;every &lt;code&gt;if&lt;/code&gt; branch&lt;/strong&gt;, &lt;code&gt;LWLockRelease&lt;/code&gt; is now called before returning, with &lt;code&gt;errno&lt;/code&gt; saved and restored around it so that &lt;code&gt;%m&lt;/code&gt; still reports the original failure. This closes the code paths on which the LWLock stayed held after an early return.&lt;/p&gt;
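&lt;p&gt;The effect of releasing on every path can be illustrated with a toy shell model. It is a sketch under stated assumptions rather than PostgreSQL code: a &lt;code&gt;lock&lt;/code&gt; directory stands in for &lt;code&gt;io_in_progress_lock&lt;/code&gt;, GNU &lt;code&gt;dd&lt;/code&gt; with &lt;code&gt;conv=excl&lt;/code&gt; stands in for the exclusive create, and all function names are hypothetical.&lt;/p&gt;

```shell
#!/bin/sh
# Toy model, not PostgreSQL code. All names here are hypothetical.

# Single-process try-lock: fail instead of waiting if the lock is already held.
try_lock() {
    if [ -d "$1/lock" ]; then
        return 1
    fi
    mkdir "$1/lock"
}

unlock() {
    rmdir "$1/lock"
}

# Mirrors the post-fix shape: the failure branch unlocks before returning.
save_slot_fixed() {
    if ! try_lock "$1"; then
        echo "blocked"
        return 1
    fi
    # conv=excl makes dd fail if the output file already exists
    if ! dd if=/dev/null of="$1/state.tmp" conv=excl status=none; then
        unlock "$1"    # release on the failure path too
        echo "create failed"
        return 1
    fi
    mv "$1/state.tmp" "$1/state"
    unlock "$1"
    echo "saved"
}

slot_dir=$(mktemp -d)                # scratch directory, not a real data directory
touch "$slot_dir/state.tmp"          # the residual temp file
save_slot_fixed "$slot_dir" || true  # fails, but gives the lock back
save_slot_fixed "$slot_dir" || true  # fails again, yet is not blocked
rm "$slot_dir/state.tmp"             # once the leftover file is gone...
save_slot_fixed "$slot_dir"          # ...the save goes through
rm -rf "$slot_dir"
```

&lt;p&gt;The model prints &lt;code&gt;create failed&lt;/code&gt; twice and then &lt;code&gt;saved&lt;/code&gt;: the error still surfaces, but no caller is left waiting on a lock that will never be released.&lt;/p&gt;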

&lt;h3 class="relative group"&gt;Solution Analysis
 &lt;div id="solution-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#solution-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Deleting &lt;code&gt;state.tmp&lt;/code&gt; won&amp;rsquo;t help: the LWLock is already held, and the file was only the trigger.&lt;/li&gt;
&lt;li&gt;Restarting the replication link or killing the downstream won&amp;rsquo;t help: the checkpointer is the one holding the LWLock.&lt;/li&gt;
&lt;li&gt;The checkpointer cannot be killed directly. The only way out of this state is a &lt;strong&gt;force restart&lt;/strong&gt; followed by instance recovery. A normal shutdown is impossible because &lt;code&gt;CHECKPOINT&lt;/code&gt; is blocked.&lt;/li&gt;
&lt;li&gt;The lasting fix: upgrade to PG 12.3 or later.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;(I also tried using gdb to call &lt;code&gt;LWLockRelease&lt;/code&gt; with the LWLock address from pstack; it crashed the test instance immediately. Not recommended.)&lt;/em&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Logical replication is one of the most significant feature enhancements in recent PostgreSQL releases. Early versions did have many issues and pitfalls. PostgreSQL&amp;rsquo;s &lt;a href="https://blog.csdn.net/qq_40687433/article/details/136405862?spm=1001.2014.3001.5501" target="_blank" rel="noreferrer"&gt;ambitious logical replication approach&lt;/a&gt; shows genuine innovation, and the community continuously refines and strengthens it — nearly every minor release includes many logical replication updates. This case is a real-world example: the logical replication code is clearly becoming more robust.&lt;/p&gt;
&lt;p&gt;Logical replication has a lot of depth. Recommended reading: &lt;a href="https://blog.csdn.net/qq_40687433/article/details/129291207" target="_blank" rel="noreferrer"&gt;PG Inner Workings: Logical Replication&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>Case Study: Predicate Out-of-Bounds and Prepared Statement Issues in PostgreSQL</title><link>https://lastdba.com/en/2024/08/12/case-study-predicate-out-of-bounds-and-prepared-statement-issues-in-postgresql/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/case-study-predicate-out-of-bounds-and-prepared-statement-issues-in-postgresql/</guid><description>&lt;h2 class="relative group"&gt;The Phenomenon
 &lt;div id="the-phenomenon" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-phenomenon" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Case: The execution plan changed and chose the wrong index, causing SQL performance to degrade from milliseconds to seconds. After collecting statistics, the business SQL was still slow. Ultimately, the problem was resolved by dropping the &lt;code&gt;DAILY_DATE&lt;/code&gt; time index and creating a composite index on &lt;code&gt;(DAILY_DATE, A_ID)&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Questions:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Why did the optimizer choose the &lt;code&gt;DAILY_DATE&lt;/code&gt; index instead of the more selective &lt;code&gt;A_ID&lt;/code&gt; index?&lt;/li&gt;
&lt;li&gt;Why did collecting statistics have no effect?&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 class="relative group"&gt;Stale Statistics
 &lt;div id="stale-statistics" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#stale-statistics" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Simplified SQL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tablzl
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; A_ID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; IS_DELETE &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;N&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DAILY_DATE &lt;span style="color:#f92672"&gt;=&lt;/span&gt; to_date(&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;yyyyMMdd&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; PARTITION_KEY &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; PARTITION_KEY &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The optimizer chose the &lt;code&gt;DAILY_DATE&lt;/code&gt; index instead of the more selective &lt;code&gt;A_ID&lt;/code&gt; index:&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;The Phenomenon
 &lt;div id="the-phenomenon" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-phenomenon" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Case: The execution plan changed and chose the wrong index, causing SQL performance to degrade from milliseconds to seconds. After collecting statistics, the business SQL was still slow. Ultimately, the problem was resolved by dropping the &lt;code&gt;DAILY_DATE&lt;/code&gt; time index and creating a composite index on &lt;code&gt;(DAILY_DATE, A_ID)&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Questions:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Why did the optimizer choose the &lt;code&gt;DAILY_DATE&lt;/code&gt; index instead of the more selective &lt;code&gt;A_ID&lt;/code&gt; index?&lt;/li&gt;
&lt;li&gt;Why did collecting statistics have no effect?&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 class="relative group"&gt;Stale Statistics
 &lt;div id="stale-statistics" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#stale-statistics" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Simplified SQL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tablzl
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; A_ID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; IS_DELETE &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;N&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DAILY_DATE &lt;span style="color:#f92672"&gt;=&lt;/span&gt; to_date(&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;yyyyMMdd&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; PARTITION_KEY &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; PARTITION_KEY &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The optimizer chose the &lt;code&gt;DAILY_DATE&lt;/code&gt; index instead of the more selective &lt;code&gt;A_ID&lt;/code&gt; index:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;83&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;204&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; tablzl_p202401_DAILY_DATE_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tablzl_p202401 tablzl_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;47&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;203&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (DAILY_DATE &lt;span style="color:#f92672"&gt;=&lt;/span&gt; to_date(&lt;span style="color:#e6db74"&gt;&amp;#39;20240223&amp;#39;&lt;/span&gt;::text, &lt;span style="color:#e6db74"&gt;&amp;#39;yyyyMMdd&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((partition_key &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;202401&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (partition_key &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;202402&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((A_ID)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;ID1234567890987654321&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_delete)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;N&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; tablzl_p202402_DAILY_DATE_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tablzl_p202402 tablzl_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;35&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;204&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (DAILY_DATE &lt;span style="color:#f92672"&gt;=&lt;/span&gt; to_date(&lt;span style="color:#e6db74"&gt;&amp;#39;20240223&amp;#39;&lt;/span&gt;::text, &lt;span style="color:#e6db74"&gt;&amp;#39;yyyyMMdd&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((partition_key &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;202401&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (partition_key &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;202402&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((A_ID)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;ID1234567890987654321&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_delete)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;N&amp;#39;&lt;/span&gt;::text))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;For the &lt;code&gt;p202401&lt;/code&gt; partition, whether it uses the &lt;code&gt;DAILY_DATE&lt;/code&gt; or &lt;code&gt;A_ID&lt;/code&gt; index doesn&amp;rsquo;t make much difference, because the January partition has no data for February 23.&lt;/li&gt;
&lt;li&gt;For the &lt;code&gt;p202402&lt;/code&gt; partition, whether it uses the &lt;code&gt;DAILY_DATE&lt;/code&gt; or &lt;code&gt;A_ID&lt;/code&gt; index makes a huge difference. Using the &lt;code&gt;DAILY_DATE&lt;/code&gt; index, its estimated cost is 3.35 with rows=1, but in reality there are millions of rows, causing it to run for 2 seconds.&lt;/li&gt;
&lt;/ul&gt;
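&lt;p&gt;One way to confirm the misestimate on the February partition is to run the date predicate on its own and compare the planner&amp;rsquo;s estimated &lt;code&gt;rows&lt;/code&gt; with the actual count. This is only a sketch that reuses the partition, column, and date from the plan above; note that &lt;code&gt;EXPLAIN (ANALYZE)&lt;/code&gt; really executes the query.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Estimated vs. actual rows for the DAILY_DATE predicate alone.
-- EXPLAIN (ANALYZE) runs the statement, so expect it to take as long as the slow scan.
explain (analyze, buffers)
select *
  from tablzl_p202402
 where DAILY_DATE = to_date(&amp;#39;20240223&amp;#39;, &amp;#39;yyyyMMdd&amp;#39;);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A large gap between the estimated and actual row counts on the index scan node points at the statistics rather than at the index itself.&lt;/p&gt;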
&lt;p&gt;The statistics for the &lt;code&gt;DAILY_DATE&lt;/code&gt; column of &lt;code&gt;p202402&lt;/code&gt; include an MCV (Most Common Values) list:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stats &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; tablename&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;tablzl_p202402&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; attname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;DAILY_DATE&amp;#39;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;gx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;most_common_vals &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span 
style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;07&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;06&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span 
style="color:#f92672"&gt;-&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;03&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;31&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span 
style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;most_common_freqs &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0481&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;047766667&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0466&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0449&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0441&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;043833334&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;043733332&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;043466665&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;043133333&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;043066666&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;042366665&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;041866668&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;041366667&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;041366667&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;039766666&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0394&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;039333332&lt;/span&gt;,&lt;span 
style="color:#ae81ff"&gt;0&lt;/span&gt;..
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;038766667&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;03863333&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0381&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;038066667&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;037966665&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;037566666&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;036733333&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Calculate the sum of MCV frequencies:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0481&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;047766667&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0466&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0449&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0441&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;043833334&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;043733332&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;043466665&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;043133333&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;043066666&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span 
style="color:#ae81ff"&gt;042366665&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;041866668&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;041366667&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;041366667&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;039766666&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0394&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;039333332&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;038766667&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;03863333&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0381&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;038066667&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;037966665&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;037566666&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span 
style="color:#ae81ff"&gt;036733333&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;?&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;column&lt;/span&gt;&lt;span style="color:#f92672"&gt;?&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;999999990&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The sum is effectively 1 (0.99999999), which tells the planner that the dates in the MCV list, the latest being February 22, account for all rows in this partition. Almost no frequency is left over for any other date, so a lookup for February 23 is estimated at close to 0 rows, which the planner rounds up to rows=1. With that estimate the &lt;code&gt;DAILY_DATE&lt;/code&gt; index looks like the cheapest choice. In reality, February 23 had millions of rows.&lt;/p&gt;
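&lt;p&gt;The frequencies do not have to be pasted by hand. As a sketch, the same sum can be computed directly from &lt;code&gt;pg_stats&lt;/code&gt; by unnesting &lt;code&gt;most_common_freqs&lt;/code&gt;. The table and column names are copied from the query above; the identifier case may need adjusting to match the catalog.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Sum the MCV frequencies for one column.
-- A sum near 1 means the planner believes the listed values cover the whole table,
-- so any value outside the list gets an estimate close to zero rows.
select tablename,
       attname,
       (select sum(f) from unnest(most_common_freqs) as f) as mcv_freq_sum,
       null_frac,
       n_distinct
  from pg_stats
 where tablename = &amp;#39;tablzl_p202402&amp;#39;
   and attname = &amp;#39;DAILY_DATE&amp;#39;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;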
&lt;p&gt;Essentially, this is a problem caused by stale statistics. Why were the first 22 days fine, and why didn&amp;rsquo;t day 23 trigger automatic collection?&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; relname,reloptions &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;tablzl&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; reloptions 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------+------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tablzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;show&lt;/span&gt; autovacuum_analyze_scale_factor;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; autovacuum_analyze_scale_factor 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The scale factor is at its default of 0.1, so auto-ANALYZE only triggers once the rows changed since the last ANALYZE reach about 1/10 of the table (the additional base threshold of 50 rows is negligible here). This is a monthly partition, with data inserted and updated daily. Early in the month, writing 2 million rows per day would trigger multiple ANALYZEs, but later in the month, for example on day 23, writing 2 million rows would not trigger ANALYZE on its own because that is only about 1/23 of the data. In this scenario, rows were also updated after insertion (2 million inserts and 2 million updates), so the change on day 23 was about 1/11 of the table, just short of the threshold. &lt;strong&gt;This also explains why the first 20 days ran stably.&lt;/strong&gt;&lt;/p&gt;
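&lt;p&gt;The arithmetic can be sketched with a self-contained query. It is only a rough model and rests on assumptions: a flat 2 million inserts plus 2 million updates per day, the default base threshold of 50, the scale factor of 0.1 shown above, and &lt;code&gt;reltuples&lt;/code&gt; equal to the row count at the start of each day. The column aliases are made up for illustration.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Rough model: changes needed to trigger auto-ANALYZE vs. one day of writes
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Assumed: 2,000,000 inserts + 2,000,000 updates per day (4,000,000 changes)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT d AS day_of_month,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       (d - 1) * 2000000 AS rows_at_start_of_day,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       50 + 0.1 * (d - 1) * 2000000 AS changes_needed,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       round((50 + 0.1 * (d - 1) * 2000000) / 4000000, 2) AS days_of_writes_needed
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM generate_series(1, 31) AS d;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Under these assumptions &lt;code&gt;days_of_writes_needed&lt;/code&gt; passes 1 at around day 21, which matches the observation that roughly the first 20 days were fine: from then on, a full day of new data can arrive before the next auto-ANALYZE is due.&lt;/p&gt;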
&lt;p&gt;Additionally, since the data change threshold is a ratio, as long as the daily data change volume is relatively uniform, this month-end statistics inaccuracy problem will always occur!&lt;/p&gt;
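&lt;p&gt;One common way to soften this effect, shown here only as a sketch and not necessarily what was done in this case, is a per-table override of the scale factor. The value 0.02 is an arbitrary illustration, and &lt;code&gt;tablzl&lt;/code&gt; is the table name from the query above.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Hypothetical per-table override; 0.02 is an illustrative value, not a recommendation
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ALTER TABLE tablzl SET (autovacuum_analyze_scale_factor = 0.02);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- reloptions should now list the override instead of being null
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT relname, reloptions FROM pg_class WHERE relname = &amp;#39;tablzl&amp;#39;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A scheduled manual &lt;code&gt;ANALYZE&lt;/code&gt; after the daily load would be another option; which one fits depends on the workload.&lt;/p&gt;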

&lt;h2 class="relative group"&gt;Execution Plan Caching
 &lt;div id="execution-plan-caching" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#execution-plan-caching" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Since this was a stale statistics problem, manually collecting statistics should have resolved it. In practice, however, after collection, the business SQL was still slow.&lt;/p&gt;
&lt;p&gt;After running ANALYZE, manual &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; showed the correct execution plan.&lt;/p&gt;
&lt;p&gt;This indicated that ANALYZE should have helped, but it didn&amp;rsquo;t affect the business sessions. Since the SQL execution used long-lived sessions, I suspected that the JDBC driver was using prepared statements to cache execution plans (&lt;a href="https://jenkov.com/tutorials/jdbc/preparedstatement.html#:~:text=JDBC%20PreparedStatement%201%20Creating%20a%20PreparedStatement%20Before%20you,Reusing%20a%20PreparedStatement%20...%205%20PreparedStatement%20Performance%20" target="_blank" rel="noreferrer"&gt;JDBC PreparedStatement&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;In PostgreSQL 13 (RasesQL 1.3), collecting statistics does not invalidate prepared statements; re-parsing only happens by reconnecting the session.&lt;/p&gt;
&lt;p&gt;Prepared statements generate a generic execution plan. Due to inaccurate statistics, the generic execution plan, like the parameter-specific execution plan, could choose the wrong index.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Characteristics of Prepared Statements
 &lt;div id="characteristics-of-prepared-statements" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#characteristics-of-prepared-statements" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;PostgreSQL supports prepared statements, and the choice between custom and generic plans is controlled by the &lt;code&gt;plan_cache_mode&lt;/code&gt; parameter:&lt;/p&gt;
&lt;ul&gt;
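&lt;li&gt;&lt;code&gt;auto&lt;/code&gt;: the default; the first five executions use custom plans, after which the generic plan is used if its estimated cost is not much higher than the average custom plan cost&lt;/li&gt;
&lt;li&gt;&lt;code&gt;force_custom_plan&lt;/code&gt;: always replans with the actual parameter values (hard parsing), generating a custom plan&lt;/li&gt;
&lt;li&gt;&lt;code&gt;force_generic_plan&lt;/code&gt;: always uses the generic plan, which is built without knowledge of the parameter values&lt;/li&gt;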
&lt;/ul&gt;
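&lt;p&gt;As a quick illustration, the parameter can be changed per session or attached to a role. This is a sketch: &lt;code&gt;app_user&lt;/code&gt; is a hypothetical role name, and whether forcing custom plans is appropriate depends on how expensive planning is for the workload.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SHOW plan_cache_mode;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Replan on every execution in the current session
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SET plan_cache_mode = force_custom_plan;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Apply to new sessions of one role (app_user is a hypothetical name)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ALTER ROLE app_user SET plan_cache_mode = force_custom_plan;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Back to the default behavior for this session
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;RESET plan_cache_mode;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;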
&lt;p&gt;Syntax for creating, executing, and dropping a prepared statement:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; plan1(text,integer) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tlzl1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;month&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; plan1(&lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;11&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;DEALLOCATE&lt;/span&gt; plan1; &lt;span style="color:#75715e"&gt;-- or DEALLOCATE ALL; drops the prepared statement (disconnecting also works)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;View (of limited use in production, because it only lists the prepared statements of the current session):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_prepared_statements;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
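&lt;p&gt;A minimal sketch of what the view shows, assuming the column set of PostgreSQL 13 and a throwaway statement named &lt;code&gt;demo_stmt&lt;/code&gt; (a made-up name). Run from a second session, the same query would not list this statement, which is why the view is of little use for inspecting application sessions.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- demo_stmt is a hypothetical statement used only for this illustration
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PREPARE demo_stmt(integer) AS SELECT $1 + 1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Lists only statements prepared in the current session
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT name, statement, parameter_types, from_sql FROM pg_prepared_statements;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DEALLOCATE demo_stmt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;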

&lt;h3 class="relative group"&gt;How Generic Plans Are Generated
 &lt;div id="how-generic-plans-are-generated" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#how-generic-plans-are-generated" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Normally, a prepared statement can generate a generic plan after running 5 times. There are many demonstrations online, so I won&amp;rsquo;t demonstrate the normal case here. Below are the &amp;ldquo;magical&amp;rdquo; phenomena I observed during testing:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Prepare data
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tlzl1(id varchar(&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;),&lt;span style="color:#66d9ef"&gt;month&lt;/span&gt; int);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; tlzl1 &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; md5(&lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;::text),&lt;span style="color:#66d9ef"&gt;EXTRACT&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;month&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; generate_series(&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01&amp;#39;&lt;/span&gt;::date, &lt;span style="color:#e6db74"&gt;&amp;#39;2023-11-30&amp;#39;&lt;/span&gt;::date, &lt;span style="color:#e6db74"&gt;&amp;#39;1 minute&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_id &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl1(id);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_month &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl1(&lt;span style="color:#66d9ef"&gt;month&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; tlzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Execute prepared statement
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; plan1(text,integer) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tlzl1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;month&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; plan1(&lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;11&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; plan1(&lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;11&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Note that only data before December was inserted — December has no data. At this point, querying December data can use the &lt;code&gt;month&lt;/code&gt; index:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; plan1(&lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;12&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_month &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;94&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;035&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;036&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (&lt;span style="color:#66d9ef"&gt;month&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((id)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;170&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;058&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;551&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; plan1(&lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;12&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_month &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;94&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;021&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;021&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (&lt;span style="color:#66d9ef"&gt;month&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((id)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;168&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;046&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;488&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; plan1(&lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;12&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_month &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;94&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;017&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;018&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (&lt;span style="color:#66d9ef"&gt;month&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((id)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;157&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;040&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;419&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; plan1(&lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;12&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_month &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;94&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;019&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;020&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (&lt;span style="color:#66d9ef"&gt;month&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((id)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;160&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;044&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;479&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; plan1(&lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;12&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_month &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;94&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;018&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;018&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (&lt;span style="color:#66d9ef"&gt;month&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((id)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;155&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;041&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;426&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Sixth execution
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; plan1(&lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;12&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_id &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;044&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;045&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((id)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (&lt;span style="color:#66d9ef"&gt;month&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Rows&lt;/span&gt; Removed &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; Filter: &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;023&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;079&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;On the sixth execution, the generic plan was bound, but it was not the plan used by the first five executions: it scanned the index on &lt;code&gt;id&lt;/code&gt; (&lt;code&gt;idx_id&lt;/code&gt;). If &lt;code&gt;id&lt;/code&gt; had even higher cardinality, you could also observe cases where the generic plan simply could not be bound (not shown here).&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s look at the source code:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;choose_custom_plan&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;choose_custom_plan&lt;/span&gt;(CachedPlanSource &lt;span style="color:#f92672"&gt;*&lt;/span&gt;plansource, ParamListInfo boundParams)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Generate custom plans until we have done at least 5 (arbitrary) */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;num_custom_plans &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	avg_custom_cost &lt;span style="color:#f92672"&gt;=&lt;/span&gt; plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;total_custom_cost &lt;span style="color:#f92672"&gt;/&lt;/span&gt; plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;num_custom_plans;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Prefer generic plan if it&amp;#39;s less expensive than the average custom
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * plan. (Because we include a charge for cost of planning in the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * custom-plan costs, this means the generic plan only has to be less
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * expensive than the execution cost plus replan cost of the custom
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * plans.)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Note that if generic_cost is -1 (indicating we&amp;#39;ve not yet determined
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * the generic plan cost), we&amp;#39;ll always prefer generic at this point.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;generic_cost &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; avg_custom_cost)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}		
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
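&lt;p&gt;As long as the generic plan&amp;rsquo;s estimated cost is less than the average cost of the custom plans built so far (the first five, at the time of the first comparison), the generic plan is used. As the comment notes, the custom-plan costs include a charge for planning.&lt;/p&gt;
&lt;p&gt;While the five-execution mechanism is well known, how the generic plan is generated deserves attention. The first five executions each return early with a custom plan. On the sixth execution, &lt;code&gt;num_custom_plans&lt;/code&gt; has reached 5 but there is no generic plan yet: &lt;code&gt;generic_cost&lt;/code&gt; still holds its initial value of -1, which is always less than the average custom cost. &lt;code&gt;choose_custom_plan&lt;/code&gt; therefore returns false, and control goes straight to the &lt;code&gt;!customplan&lt;/code&gt; branch in &lt;code&gt;GetCachedPlan&lt;/code&gt;:&lt;/p&gt;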
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CachedPlan &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;GetCachedPlan&lt;/span&gt;(CachedPlanSource &lt;span style="color:#f92672"&gt;*&lt;/span&gt;plansource, ParamListInfo boundParams,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			 &lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt; useResOwner, QueryEnvironment &lt;span style="color:#f92672"&gt;*&lt;/span&gt;queryEnv)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	customplan &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;choose_custom_plan&lt;/span&gt;(plansource, boundParams);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;customplan)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;CheckCachedPlan&lt;/span&gt;(plansource))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* We want a generic plan, and we already have a valid one */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			plan &lt;span style="color:#f92672"&gt;=&lt;/span&gt; plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;gplan;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(plan&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;magic &lt;span style="color:#f92672"&gt;==&lt;/span&gt; CACHEDPLAN_MAGIC);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Build a new generic plan */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			plan &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;BuildCachedPlan&lt;/span&gt;(plansource, qlist, NULL, queryEnv);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Just make real sure plansource-&amp;gt;gplan is clear */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ReleaseGenericPlan&lt;/span&gt;(plansource);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Link the new generic plan into the plansource */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;gplan &lt;span style="color:#f92672"&gt;=&lt;/span&gt; plan;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			plan&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;refcount&lt;span style="color:#f92672"&gt;++&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Immediately reparent into appropriate context */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;is_saved)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/* saved plans all live under CacheMemoryContext */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;MemoryContextSetParent&lt;/span&gt;(plan&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;context, CacheMemoryContext);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				plan&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;is_saved &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/* otherwise, it should be a sibling of the plansource */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;MemoryContextSetParent&lt;/span&gt;(plan&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;context,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;									 &lt;span style="color:#a6e22e"&gt;MemoryContextGetParent&lt;/span&gt;(plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;context));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Update generic_cost whenever we make a new generic plan */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;generic_cost &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;cached_plan_cost&lt;/span&gt;(plan, false);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * If, based on the now-known value of generic_cost, we&amp;#39;d not have
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * chosen to use a generic plan, then forget it and make a custom
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * plan. This is a bit of a wart but is necessary to avoid a
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * glitch in behavior when the custom plans are consistently big
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * winners; at some point we&amp;#39;ll experiment with a generic plan and
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * find it&amp;#39;s a loser, but we don&amp;#39;t want to actually execute that
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * plan.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			customplan &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;choose_custom_plan&lt;/span&gt;(plansource, boundParams);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * If we choose to plan again, we need to re-copy the query_list,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * since the planner probably scribbled on it. We can force
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * BuildCachedPlan to do that by passing NIL.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			qlist &lt;span style="color:#f92672"&gt;=&lt;/span&gt; NIL;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; plan;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}	
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
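&lt;p&gt;In the &lt;code&gt;!customplan&lt;/code&gt; branch, a valid existing generic plan is used directly. Otherwise one is generated via &lt;code&gt;BuildCachedPlan&lt;/code&gt;, the main plan-generation routine that turns a query tree into a plan tree. Once &lt;code&gt;generic_cost&lt;/code&gt; is known, &lt;code&gt;choose_custom_plan&lt;/code&gt; is called a second time; if the generic plan turns out to be the more expensive option, it is not executed and a custom plan is built instead.&lt;/p&gt;
&lt;p&gt;What about parameters? The call above passes NULL as the third argument of &lt;code&gt;BuildCachedPlan&lt;/code&gt;, and the source comment explains why: NULL for &lt;code&gt;boundParams&lt;/code&gt; requests a generic plan, while actual parameter values request a custom plan:&lt;/p&gt;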
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;To build a generic, parameter&lt;span style="color:#f92672"&gt;-&lt;/span&gt;value&lt;span style="color:#f92672"&gt;-&lt;/span&gt;independent plan, pass NULL &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; boundParams. To build a custom plan, pass the actual parameter values via
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; boundParams&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;What execution plan does the optimizer prefer when NULL is passed? This part of the code logic is somewhat complex. From the optimizer&amp;rsquo;s perspective, there may be multiple plans to choose from, but one must be selected as the generic plan.&lt;/p&gt;
&lt;p&gt;That selected generic plan is what gets compared against the average cost of the first five custom plans.&lt;/p&gt;
&lt;p&gt;Why didn&amp;rsquo;t repeatedly executing a lower-cost plan produce the desired generic plan?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What the generic plan looks like has nothing to do with the first five execution plans — the first five only determine whether this generic plan gets bound.&lt;/strong&gt;&lt;/p&gt;
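&lt;p&gt;A quick way to check this is to request the generic plan directly, before any custom plan has been built. The following is a minimal sketch, assuming PostgreSQL 12 or later (where the &lt;code&gt;plan_cache_mode&lt;/code&gt; setting exists) and a session in which &lt;code&gt;plan1&lt;/code&gt; has just been prepared; the parameter values are simply the ones used elsewhere in this test:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Assumption: plan1 was prepared in this session and has not been executed yet
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Skip the five custom plans and ask for the generic plan right away
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SET plan_cache_mode = force_generic_plan;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXPLAIN EXECUTE plan1(&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;, &amp;#39;11&amp;#39;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Return to the default behaviour (auto)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;RESET plan_cache_mode;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The plan printed this way should reference &lt;code&gt;$1&lt;/code&gt; and &lt;code&gt;$2&lt;/code&gt; rather than literal values, which marks it as generic. Because no custom plan has run yet, its shape cannot have been influenced by earlier executions.&lt;/p&gt;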
&lt;p&gt;From an optimizer design perspective, generic plans are meant to reduce planning overhead and improve SQL execution efficiency, which suits workloads with many small queries. The problem is that generic plans themselves are crude, and PostgreSQL introduced the five-execution mechanism precisely to reduce the likelihood of a generic plan being terrible.&lt;/p&gt;
&lt;p&gt;Even with the five-execution mechanism, the reasons a bad generic plan still gets bound are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Generic plans are plans too, and they can inherently be bad&lt;/li&gt;
&lt;li&gt;Statistics are inaccurate, so the generic plan&amp;rsquo;s cost estimate is very low&lt;/li&gt;
&lt;li&gt;The first five executions used poorly selective parameter values that match many rows (or hit other unfavorable factors), which inflated the custom plan costs&lt;/li&gt;
&lt;/ul&gt;
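&lt;p&gt;When a bad generic plan has been bound, one session-level way to investigate and work around it is sketched below. It assumes PostgreSQL 14 or later for the &lt;code&gt;generic_plans&lt;/code&gt; and &lt;code&gt;custom_plans&lt;/code&gt; columns, and PostgreSQL 12 or later for &lt;code&gt;plan_cache_mode&lt;/code&gt;; whether forcing custom plans is acceptable depends on how much planning time the workload can afford:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- PostgreSQL 14+: how many times each kind of plan was chosen in this session
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT name, generic_plans, custom_plans FROM pg_prepared_statements;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- PostgreSQL 12+: always plan with the actual parameter values
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SET plan_cache_mode = force_custom_plan;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;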

&lt;h3 class="relative group"&gt;Prepared Statement Invalidation
 &lt;div id="prepared-statement-invalidation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#prepared-statement-invalidation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Besides DDL, &lt;code&gt;DEALLOCATE&lt;/code&gt;, and disconnecting sessions, collecting statistics can also invalidate prepared statements, but according to the documentation this applies only from PostgreSQL 14 onward.&lt;/p&gt;
&lt;p&gt;PostgreSQL 13:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;PostgreSQL will force re-analysis and re-planning of the statement before using it whenever database objects used in the statement have undergone definitional (DDL) changes since the previous use of the prepared statement&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;PostgreSQL 14:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;PostgreSQL will force re-analysis and re-planning of the statement before using it whenever database objects used in the statement have undergone definitional (DDL) changes or their planner statistics have been updated since the previous use of the prepared statement&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Test confirming that in PostgreSQL 13, collecting statistics does not invalidate prepared statements:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; plan1(&lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;11&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_id &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;033&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;033&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((id)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (&lt;span style="color:#66d9ef"&gt;month&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Rows&lt;/span&gt; Removed &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; Filter: &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;098&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;050&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_prepared_statements;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;statement&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; prepare_time &lt;span style="color:#f92672"&gt;|&lt;/span&gt; parameter_types &lt;span style="color:#f92672"&gt;|&lt;/span&gt; from_sql 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+-----------------------------------------------+-------------------------------+-----------------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plan1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; plan1(text,integer) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#f92672"&gt;+|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;966733&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;text,integer&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tlzl1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;month&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; tlzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ANALYZE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_prepared_statements;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;statement&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; prepare_time &lt;span style="color:#f92672"&gt;|&lt;/span&gt; parameter_types &lt;span style="color:#f92672"&gt;|&lt;/span&gt; from_sql 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+-----------------------------------------------+-------------------------------+-----------------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plan1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; plan1(text,integer) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#f92672"&gt;+|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;966733&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;text,integer&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tlzl1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;month&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; plan1(&lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;11&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_id &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;051&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;052&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((id)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (&lt;span style="color:#66d9ef"&gt;month&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Rows&lt;/span&gt; Removed &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; Filter: &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;022&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;098&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;JDBC Prepared Statements
 &lt;div id="jdbc-prepared-statements" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#jdbc-prepared-statements" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Prepared statements are not unique to PostgreSQL; other databases offer similar pre-parsing features. Oracle, for example, provides comparable functionality.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://jenkov.com/tutorials/jdbc/preparedstatement.html#:~:text=JDBC%20PreparedStatement%201%20Creating%20a%20PreparedStatement%20Before%20you,Reusing%20a%20PreparedStatement%20...%205%20PreparedStatement%20Performance%20" target="_blank" rel="noreferrer"&gt;JDBC&lt;/a&gt; itself can call the database&amp;rsquo;s pre-parsing interface and directly use prepared statements.&lt;/p&gt;
&lt;p&gt;Example JDBC usage:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;String &lt;span style="color:#66d9ef"&gt;sql&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;select * from people where id=?&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PreparedStatement preparedStatement &lt;span style="color:#f92672"&gt;=&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;connection&lt;/span&gt;.prepareStatement(&lt;span style="color:#66d9ef"&gt;sql&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Recommendations
 &lt;div id="recommendations" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#recommendations" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Reduce the table-level &lt;code&gt;autovacuum_analyze_scale_factor&lt;/code&gt; to &lt;code&gt;0.02&lt;/code&gt; (why 0.02? Because 0.02 &amp;lt; 1/31, which is about 0.032). Since data is written and queried simultaneously, manual statistics collection is hard to time correctly; lowering &lt;code&gt;autovacuum_analyze_scale_factor&lt;/code&gt; only mitigates the problem rather than eliminating it.&lt;/li&gt;
&lt;li&gt;Consider removing the PREPARE setting in JDBC, or set &lt;code&gt;plan_cache_mode&lt;/code&gt; to &lt;code&gt;force_custom_plan&lt;/code&gt; (available in PostgreSQL 12 and later).&lt;/li&gt;
&lt;li&gt;Adjust the SQL logic.&lt;/li&gt;
&lt;li&gt;Adjust indexes: 4.1 Remove unnecessary time indexes; 4.2 Rebuild the index that gets chosen after predicate out-of-bounds as a composite index that includes the &lt;code&gt;id&lt;/code&gt; field (a good suggestion).&lt;/li&gt;
&lt;li&gt;Emergency procedure: If business performance doesn&amp;rsquo;t recover after statistics collection, and you&amp;rsquo;ve confirmed the execution plan has changed via manual EXPLAIN, consider killing sessions (for versions before 13).&lt;/li&gt;
&lt;/ol&gt;
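&lt;p&gt;Recommendations 1 and 2 could be applied roughly as in the sketch below. It assumes the affected table is &lt;code&gt;tlzl1&lt;/code&gt; from the plan shown earlier and that the server is PostgreSQL 12 or later; the role name &lt;code&gt;app_user&lt;/code&gt; is hypothetical and only illustrates a per-role setting.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Recommendation 1: trigger autoanalyze once roughly 2% of the rows have changed
-- (table-level storage parameter; 0.02 is below 1/31)
ALTER TABLE tlzl1 SET (autovacuum_analyze_scale_factor = 0.02);

-- Recommendation 2: plan every execution with the actual parameter values
-- instead of switching to a cached generic plan (PostgreSQL 12 and later)
SET plan_cache_mode = force_custom_plan;                      -- current session only
ALTER ROLE app_user SET plan_cache_mode = force_custom_plan;  -- hypothetical role; applies to new sessions&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Forcing custom plans trades a little extra planning time on every execution for plans that always reflect the current parameter values, which is usually a reasonable trade for short index lookups like the one above.&lt;/p&gt;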
&lt;p&gt;Finally, predicate out-of-bounds problems exist in essentially all databases, especially on time-based fields. There is currently no simple yet perfectly effective solution. Oracle&amp;rsquo;s SPM (SQL Plan Management) gains another point in my favorability&amp;hellip;&lt;/p&gt;</content:encoded></item><item><title>Incorrect Execution Plan Caused by Partition Permission Issues</title><link>https://lastdba.com/en/2024/08/12/incorrect-execution-plan-caused-by-partition-permission-issues/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/incorrect-execution-plan-caused-by-partition-permission-issues/</guid><description>&lt;h2 class="relative group"&gt;Problem Overview
 &lt;div id="problem-overview" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-overview" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Last night, the business team released a change to an UPDATE statement. Previously, the statement ran very fast without a condition on the &lt;code&gt;DATE_CREATED&lt;/code&gt; field (the partition key). In the release, a condition on the partition key was added to reduce the number of partitions accessed. However, after adding it, the UPDATE actually became slower.&lt;/p&gt;
&lt;p&gt;Before:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; TABLE_RECORD
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; IS_DELETED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;, DATE_UPDATED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LOCALTIMESTAMP&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; APPL_NO &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; IS_DELETED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; TABLE_RECORD
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; IS_DELETED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;, DATE_UPDATED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LOCALTIMESTAMP&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; APPL_NO &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; IS_DELETED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DATE_CREATED &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; interval &lt;span style="color:#e6db74"&gt;&amp;#39;31&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DATE_CREATED &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Before the release, access time was in milliseconds. After the release, access time was 10 seconds. The SQL runs frequently, and the business found this unacceptable.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Problem Overview
 &lt;div id="problem-overview" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-overview" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Last night, the business team released a change to an UPDATE statement. Previously, the statement ran very fast without a condition on the &lt;code&gt;DATE_CREATED&lt;/code&gt; field (the partition key). In the release, a condition on the partition key was added to reduce the number of partitions accessed. However, after adding it, the UPDATE actually became slower.&lt;/p&gt;
&lt;p&gt;Before:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; TABLE_RECORD
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; IS_DELETED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;, DATE_UPDATED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LOCALTIMESTAMP&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; APPL_NO &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; IS_DELETED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; TABLE_RECORD
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; IS_DELETED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;, DATE_UPDATED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LOCALTIMESTAMP&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; APPL_NO &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; IS_DELETED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DATE_CREATED &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; interval &lt;span style="color:#e6db74"&gt;&amp;#39;31&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DATE_CREATED &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Before the release, access time was in milliseconds. After the release, access time was 10 seconds. The SQL runs frequently, and the business found this unacceptable.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Problem Analysis
 &lt;div id="problem-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;The Execution Plan Appeared Correct
 &lt;div id="the-execution-plan-appeared-correct" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-execution-plan-appeared-correct" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Table structure:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;##&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; TABLE_RECORD
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Partitioned &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.TABLE_RECORD&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------+-----------------------------+-----------+----------+---------------------------------------------------+----------+--------------+--------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id_TABLE_RECORD &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nextval(&lt;span style="color:#e6db74"&gt;&amp;#39;seq_TABLE_RECORD&amp;#39;&lt;/span&gt;::regclass) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; appl_no &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; r_appl_no &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; created_by &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sys&amp;#39;&lt;/span&gt;::character varying &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; date_created &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; updated_by &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sys&amp;#39;&lt;/span&gt;::character varying &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; date_updated &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;: RANGE (date_created)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;date_TABLE_RECORD&amp;#34;&lt;/span&gt; btree (date_created)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;idx_dateupdated&amp;#34;&lt;/span&gt; btree (date_updated)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;idx_applnodeleted&amp;#34;&lt;/span&gt; btree (appl_no, is_deleted)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;nk_TABLE_RECORD&amp;#34;&lt;/span&gt; btree (appl_no)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partitions: TABLE_RECORD_202211 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2022-11-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2022-12-01 00:00:00&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; TABLE_RECORD_202303 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-04-01 00:00:00&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; TABLE_RECORD_202304 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-04-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-05-01 00:00:00&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; TABLE_RECORD_202305 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-05-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-06-01 00:00:00&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; TABLE_RECORD_202306 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-06-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-07-01 00:00:00&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; TABLE_RECORD_202512 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2025-12-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2026-01-01 00:00:00&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; TABLE_RECORD_other &lt;span style="color:#66d9ef"&gt;DEFAULT&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With its 31-day window, this SQL should access only the partitions for the last 2 months, both of which contained data. The UPDATE above should modify only one row.&lt;/p&gt;
&lt;p&gt;At first the problem was confusing: when we ran EXPLAIN, the execution plan looked fine.&lt;/p&gt;
&lt;p&gt;Partition scan portion of the EXPLAIN output:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; TABLE_RECORD_202302_date_created_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202302 TABLE_RECORD_4 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;47&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;485&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; TABLE_RECORD_202303_date_created_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202303 TABLE_RECORD_5 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;47&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;482&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; TABLE_RECORD_202304_date_created_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202304 TABLE_RECORD_6 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;47&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;481&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_applnodeleted_25 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202305 TABLE_RECORD_7 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;49&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;483&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_applnodeleted_14 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202306 TABLE_RECORD_8 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;45&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;18&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;485&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_applnodeleted_38 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202307 TABLE_RECORD_9 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3502&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_applnodeleted_1 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202308 TABLE_RECORD_10 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3502&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Partition data distribution:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;),tableoid::regclass &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; TABLE_RECORD &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tableoid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+---------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;56558&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; TABLE_RECORD_202303
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4436&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; TABLE_RECORD_202211
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;6929&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; TABLE_RECORD_202306
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;945&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; TABLE_RECORD_202305
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1413&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; TABLE_RECORD_202304
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;5499&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; TABLE_RECORD_202212
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1486&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; TABLE_RECORD_202301
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4722&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; TABLE_RECORD_202302&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The execution plan appeared to access different indexes for different partitions:&lt;/p&gt;
&lt;ul&gt;
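&lt;li&gt;&lt;code&gt;date_TABLE_RECORD&lt;/code&gt;: index on the partition key, shown in the plan under per-partition names such as &lt;code&gt;TABLE_RECORD_202304_date_created_idx&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;idx_applnodeleted&lt;/code&gt;: composite index on &lt;code&gt;appl_no, is_deleted&lt;/code&gt;, shown in the plan under per-partition names such as &lt;code&gt;idx_applnodeleted_25&lt;/code&gt;&lt;/li&gt;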
&lt;/ul&gt;
&lt;p&gt;The SQL can prune partitions on the &lt;code&gt;DATE_CREATED&lt;/code&gt; field (the last 31 days). Within the partitions that remain, however, an index on that field has almost no selectivity, because most of their rows fall inside the date range anyway. The composite index &lt;code&gt;idx_applnodeleted&lt;/code&gt; on &lt;code&gt;appl_no, is_deleted&lt;/code&gt; is far more selective within a partition, so the correct execution plan should choose the &lt;code&gt;idx_applnodeleted&lt;/code&gt; composite index.&lt;/p&gt;
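&lt;p&gt;One way to check this reasoning is to look at the catalog. The queries below are a minimal sketch, not taken from the original investigation: they assume the partitions are stored under the lower-case names &lt;code&gt;table_record_YYYYMM&lt;/code&gt; and use the standard &lt;code&gt;pg_indexes&lt;/code&gt; and &lt;code&gt;pg_stats&lt;/code&gt; views to list the indexes on each partition and the planner&amp;rsquo;s distinct-value estimates for the three columns in the WHERE clause.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Assumption: partition names follow the pattern table_record_YYYYMM.
-- List every index defined on the 2023 partitions.
SELECT tablename, indexname, indexdef
  FROM pg_indexes
 WHERE tablename ILIKE &amp;#39;table_record_2023%&amp;#39;
 ORDER BY tablename, indexname;

-- Planner statistics for the filter columns of one partition.
-- A negative n_distinct is a fraction of the row count
-- (for example, -1 means every value is distinct).
SELECT tablename, attname, n_distinct
  FROM pg_stats
 WHERE tablename ILIKE &amp;#39;table_record_202306&amp;#39;
   AND attname IN (&amp;#39;appl_no&amp;#39;, &amp;#39;is_deleted&amp;#39;, &amp;#39;date_created&amp;#39;);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If &lt;code&gt;appl_no&lt;/code&gt; turns out to be close to unique while &lt;code&gt;date_created&lt;/code&gt; covers the whole month, that would support preferring the composite index inside each partition.&lt;/p&gt;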
&lt;p&gt;The EXPLAIN output above is only the estimated plan, not the plan that was actually executed. Even so, it shows the May and June partitions using the correct index, the &lt;code&gt;appl_no, is_deleted&lt;/code&gt; composite index.&lt;/p&gt;
&lt;p&gt;To see the actual execution plan, the SQL has to be executed, so we rewrote the UPDATE as a SELECT:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers,timing,&lt;span style="color:#66d9ef"&gt;verbose&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; TABLE_RECORD &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; APPL_NO &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; IS_DELETED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DATE_CREATED &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; interval &lt;span style="color:#e6db74"&gt;&amp;#39;31&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DATE_CREATED &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now() ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;266&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;266&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;565&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;566&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;48&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;265&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;95&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;388&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;558&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;48&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Subplans Removed: &lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_applnodeleted_25 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.TABLE_RECORD_202305 TABLE_RECORD_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;39&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;059&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;059&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (((TABLE_RECORD_1.appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((TABLE_RECORD_1.is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((TABLE_RECORD_1.date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (TABLE_RECORD_1.date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_applnodeleted_14 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.TABLE_RECORD_202306 TABLE_RECORD_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;52&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;328&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;498&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (((TABLE_RECORD_2.appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((TABLE_RECORD_2.is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((TABLE_RECORD_2.date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (TABLE_RECORD_2.date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;45&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5867&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;195&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;654&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;18&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The SELECT accessed only the May and June partitions (&lt;code&gt;Subplans Removed: 37&lt;/code&gt;), which indicates that partition pruning worked correctly. Both partitions used the &lt;code&gt;idx_applnodeleted&lt;/code&gt; index, so index selection was also correct.&lt;/p&gt;
&lt;p&gt;Direct execution of the SELECT statement returned results in milliseconds:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;##&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; TABLE_RECORD &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; APPL_NO &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; IS_DELETED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DATE_CREATED &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; interval &lt;span style="color:#e6db74"&gt;&amp;#39;31&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DATE_CREATED &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now() ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;946&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;At this point in the analysis, both the execution plan and the execution time looked normal.&lt;/p&gt;
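&lt;p&gt;Rewriting the statement as a SELECT is not the only way to get an actual plan. PostgreSQL also accepts &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; on the UPDATE itself, and wrapping it in a transaction that is rolled back discards the changes. The following is a minimal sketch under stated assumptions, not a step from the original investigation: the literal from the SELECT above stands in for the application&amp;rsquo;s bind parameter, and the statement really runs, so it takes row locks and fires any triggers until the ROLLBACK.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Open a transaction so the UPDATE can be undone afterwards.
BEGIN;

-- ANALYZE executes the statement and reports actual rows and timings.
-- Assumption: the literal below replaces the $1 bind parameter.
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
UPDATE TABLE_RECORD
   SET IS_DELETED = &amp;#39;1&amp;#39;, DATE_UPDATED = LOCALTIMESTAMP(0)
 WHERE APPL_NO = &amp;#39;LZLMATH20230132302302&amp;#39;
   AND IS_DELETED = &amp;#39;0&amp;#39;
   AND DATE_CREATED &amp;gt; now() - interval &amp;#39;31&amp;#39; day
   AND DATE_CREATED &amp;lt; now();

-- Discard the row changes made by the UPDATE.
ROLLBACK;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This keeps the statement type identical to what the application runs, which matters when an UPDATE and a SELECT with the same WHERE clause are planned differently.&lt;/p&gt;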

&lt;h3 class="relative group"&gt;The Business SQL Was Still Slow
 &lt;div id="the-business-sql-was-still-slow" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-business-sql-was-still-slow" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;However, the slow SQL kept appearing in the PostgreSQL logs. The UPDATE took about 10 seconds:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;06&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;06&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;45&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;077&lt;/span&gt; CST,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldbopr&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;116286&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.78.90:51871&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;649&lt;/span&gt;cdebf.&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;c63e,&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;UPDATE&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;06&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;39&lt;/span&gt; CST,&lt;span style="color:#ae81ff"&gt;759&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12440291&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4002354803&lt;/span&gt;,LOG,&lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;duration: 10287.105 ms &amp;#34;&lt;/span&gt; plan:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Query Text: &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; TABLE_RECORD
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; IS_DELETED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;, DATE_UPDATED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LOCALTIMESTAMP&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; APPL_NO &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; IS_DELETED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DATE_CREATED &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; interval &lt;span style="color:#e6db74"&gt;&amp;#39;31&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DATE_CREATED &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Update&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;203&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;79&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;39&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2960&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Update&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202211 TABLE_RECORD_1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; TABLE_RECORD_202304_date_created_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202304 TABLE_RECORD_6 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;47&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;481&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; TABLE_RECORD_202305_date_created_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202305 TABLE_RECORD_7 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;47&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;483&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; TABLE_RECORD_202306_date_created_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202306 TABLE_RECORD_8 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;47&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;485&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_applnodeleted_38 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202307 TABLE_RECORD_9 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3502&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The May and June partitions were still using the &lt;code&gt;date_created&lt;/code&gt; index on the partition key. The execution plan estimated only 1 row, but in reality these two partitions each had millions of rows.&lt;/p&gt;
&lt;p&gt;This was confusing: when the DBA ran EXPLAIN, the optimizer chose the better index, yet the business SQL recorded in the logs was not using it.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Updating Statistics
 &lt;div id="updating-statistics" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#updating-statistics" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Since this was a PostgreSQL execution plan issue, the first thought was to collect statistics.&lt;/p&gt;
&lt;p&gt;After the problem occurred, we re-collected statistics (&lt;code&gt;ANALYZE&lt;/code&gt;) on both the parent partitioned table and its child partitions. Because sessions might have been holding cached execution plans (&lt;code&gt;plan_cache_mode=auto&lt;/code&gt;), we also killed every session that had connected before the statistics were collected.&lt;/p&gt;
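&lt;p&gt;A minimal sketch of that step, assuming the table name shown in the plans above. The cutoff timestamp is a placeholder, not a value from the incident; it would be set to the time the statistics collection finished.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Refresh planner statistics; a manual ANALYZE on a partitioned table
-- also recurses into each of its partitions.
ANALYZE VERBOSE TABLE_RECORD;

-- Terminate client sessions that connected before the ANALYZE, to drop
-- any plans those sessions may have cached. Placeholder timestamp.
SELECT pid, pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE backend_type = &amp;#39;client backend&amp;#39;
  AND pid &amp;lt;&amp;gt; pg_backend_pid()
  AND backend_start &amp;lt; timestamptz &amp;#39;2023-07-01 10:00:00&amp;#39;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;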
&lt;p&gt;The logs still showed the SQL taking 10 seconds, indicating it wasn&amp;rsquo;t a statistics issue.&lt;/p&gt;
&lt;p&gt;At this point the problem remained unsolved. We seemed to have exhausted all options.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Root Cause
 &lt;div id="root-cause" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#root-cause" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Earlier, the EXPLAIN output the DBA obtained differed from the plan the application actually got. The difference was that we had been running everything as the PostgreSQL superuser. When we switched to the application user and ran EXPLAIN again, the execution plan matched what was in the logs.&lt;/p&gt;
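&lt;p&gt;One way to make this comparison without reconnecting is &lt;code&gt;SET ROLE&lt;/code&gt;, which makes privilege checks run as the named role. The sketch below assumes a superuser session. The SELECT is only a stand-in built from the predicates visible in the plans, not the actual business SQL.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Plan as the superuser sees it (stand-in statement, see note above)
EXPLAIN
SELECT * FROM TABLE_RECORD
WHERE appl_no = &amp;#39;LZLMATH20230132302302&amp;#39;
  AND is_deleted = &amp;#39;0&amp;#39;
  AND date_created &amp;gt; now() - interval &amp;#39;31 days&amp;#39;
  AND date_created &amp;lt; now();

-- Same statement, planned with the application role&amp;#39;s privileges
SET ROLE lzldbopr;
EXPLAIN
SELECT * FROM TABLE_RECORD
WHERE appl_no = &amp;#39;LZLMATH20230132302302&amp;#39;
  AND is_deleted = &amp;#39;0&amp;#39;
  AND date_created &amp;gt; now() - interval &amp;#39;31 days&amp;#39;
  AND date_created &amp;lt; now();
RESET ROLE;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If the two plans differ, the difference comes from the role rather than from the statistics, which points toward privileges.&lt;/p&gt;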
&lt;p&gt;Since we had previously encountered issues with native partitioned table permissions causing abnormal execution plans, we immediately checked partition permissions.&lt;/p&gt;
&lt;p&gt;Parent table permissions:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;##&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;dp&lt;span style="color:#f92672"&gt;+&lt;/span&gt; TABLE_RECORD
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Schema&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Policies 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+--------------------------+-------------------+-------------------------------------+-------------------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; TABLE_RECORD &lt;span style="color:#f92672"&gt;|&lt;/span&gt; partitioned &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldbdata&lt;span style="color:#f92672"&gt;=&lt;/span&gt;arwdDxt&lt;span style="color:#f92672"&gt;/&lt;/span&gt;lzldbdata &lt;span style="color:#f92672"&gt;+|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r_lzldbdata_qry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r&lt;span style="color:#f92672"&gt;/&lt;/span&gt;lzldbdata &lt;span style="color:#f92672"&gt;+|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r_lzldbdata_dml&lt;span style="color:#f92672"&gt;=&lt;/span&gt;arwd&lt;span style="color:#f92672"&gt;/&lt;/span&gt;lzldbdata &lt;span style="color:#f92672"&gt;+|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Child partition permissions:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;##&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;dp&lt;span style="color:#f92672"&gt;+&lt;/span&gt; TABLE_RECORD_202505
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Schema&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Policies 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+---------------------------------+-------+------------------------------------+-------------------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; TABLE_RECORD_202505 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldbdata&lt;span style="color:#f92672"&gt;=&lt;/span&gt;arwdDxt&lt;span style="color:#f92672"&gt;/&lt;/span&gt;lzldbdata &lt;span style="color:#f92672"&gt;+|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The partition&amp;rsquo;s access privileges had no entry for the &lt;code&gt;r_lzldbdata_dml&lt;/code&gt; role, the role granted to the business user, even though the parent table had one.&lt;/p&gt;
&lt;p&gt;We immediately granted the permissions, and the problem was resolved:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202305 &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; r_lzldbdata_dml;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202306 &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; r_lzldbdata_dml;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
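&lt;p&gt;After switching to the application user (&lt;code&gt;lzldbopr&lt;/code&gt;) again and running EXPLAIN, the execution plan was correct: the May and June partitions now used the &lt;code&gt;idx_applnodeleted_*&lt;/code&gt; indexes instead of the &lt;code&gt;date_created&lt;/code&gt; index:&lt;/p&gt;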
&lt;p&gt;&lt;code&gt;\c - lzldbopr&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; TABLE_RECORD_202303_date_created_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202303 TABLE_RECORD_5 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;47&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;482&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; TABLE_RECORD_202304_date_created_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202304 TABLE_RECORD_6 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;47&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;481&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_applnodeleted_25 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202305 TABLE_RECORD_7 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;39&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;483&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_applnodeleted_14 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202306 TABLE_RECORD_8 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;485&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_applnodeleted_38 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202307 TABLE_RECORD_9 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3502&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_applnodeleted_1 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202308 TABLE_RECORD_10 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3502&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;No more slow UPDATE statements were observed in the PostgreSQL logs.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Testing (Not Reproduced)
 &lt;div id="testing-not-reproduced" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#testing-not-reproduced" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Initial table creation script:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Switch to non-superuser
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt; lzldbdata
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- create table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PUBLIC&lt;/span&gt;.LZLPARTITION
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; APPL_NO varchar(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	IS_DELETED varchar(&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; DATE_CREATED &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DEFAULT&lt;/span&gt; now(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; DATE_UPDATED &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DEFAULT&lt;/span&gt; now()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;) PARTITION &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; RANGE(DATE_CREATED);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- indexes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; DATE_LZLPARTITION &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PUBLIC&lt;/span&gt;.LZLPARTITION (DATE_CREATED);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; NK_LZLPARTITION &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PUBLIC&lt;/span&gt;.LZLPARTITION (APPL_NO);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- privs
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;GRANT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.LZLPARTITION &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; r_lzldbdata_qry;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;GRANT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;DELETE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.LZLPARTITION &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; r_lzldbdata_dml;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- partition
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION_202301 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; LZLPARTITION &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION_202302 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; LZLPARTITION &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION_202303 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; LZLPARTITION &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-04-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION_202304 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; LZLPARTITION &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-04-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-05-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION_202305 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; LZLPARTITION &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-05-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-06-01 00:00:00&amp;#39;&lt;/span&gt;);
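&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION_202306 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; LZLPARTITION &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-06-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-07-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION_202307 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; LZLPARTITION &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-07-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-08-01 00:00:00&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;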
&lt;p&gt;Generate data:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.LZLPARTITION
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; n &lt;span style="color:#f92672"&gt;+&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;N&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; to_char(to_date(&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01&amp;#39;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;YYYY-MM-DD&amp;#39;&lt;/span&gt;) &lt;span style="color:#f92672"&gt;+&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;&amp;#39;&lt;/span&gt; &lt;span style="color:#f92672"&gt;||&lt;/span&gt; n &lt;span style="color:#f92672"&gt;||&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39; minute&amp;#39;&lt;/span&gt;) ::interval, &lt;span style="color:#e6db74"&gt;&amp;#39;YYYY-MM-DD&amp;#39;&lt;/span&gt;)::&lt;span style="color:#e6db74"&gt;&amp;#34;date&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; now()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;from&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; generate_series(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;300000&lt;/span&gt;) n;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Data distribution:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;),tableoid::regclass &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tableoid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+---------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;44640&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition_202301
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;40320&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition_202302
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;44640&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition_202303
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;43200&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition_202304
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;44640&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition_202305
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;43200&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition_202306
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;39361&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition_202307&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Permissions not inherited:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;##&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;dp&lt;span style="color:#f92672"&gt;+&lt;/span&gt; lzlpartition
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Schema&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Policies 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+--------------+-------------------+-------------------------------------+-------------------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition &lt;span style="color:#f92672"&gt;|&lt;/span&gt; partitioned &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldbdata&lt;span style="color:#f92672"&gt;=&lt;/span&gt;arwdDxt&lt;span style="color:#f92672"&gt;/&lt;/span&gt;lzldbdata &lt;span style="color:#f92672"&gt;+|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r_lzldbdata_qry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r&lt;span style="color:#f92672"&gt;/&lt;/span&gt;lzldbdata &lt;span style="color:#f92672"&gt;+|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r_lzldbdata_dml&lt;span style="color:#f92672"&gt;=&lt;/span&gt;arwd&lt;span style="color:#f92672"&gt;/&lt;/span&gt;lzldbdata &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;##&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;dp&lt;span style="color:#f92672"&gt;+&lt;/span&gt; lzlpartition_202306
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Schema&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Policies 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+---------------------+-------+-------------------+-------------------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition_202306 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Execution plan (correct):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; APPL_NO &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;217450&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; IS_DELETED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;N&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DATE_CREATED &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; interval &lt;span style="color:#e6db74"&gt;&amp;#39;31&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DATE_CREATED &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;36&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;76&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;36&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;77&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;36&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;74&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Subplans Removed: &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzlpartition_202305_appl_no_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition_202305 lzlpartition_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;217450&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzlpartition_202306_appl_no_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition_202306 lzlpartition_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;217450&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The partitions still did not inherit the parent table&amp;rsquo;s permissions. We ran the same test on other PostgreSQL versions and saw the same result, so this appears to be general behavior rather than something specific to one version.&lt;/p&gt;
&lt;p&gt;Even so, we could not reproduce the issue: the test environment chose the correct index, whereas production chose the wrong one.&lt;/p&gt;
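&lt;p&gt;Rather than running &lt;code&gt;\dp+&lt;/code&gt; on each partition, the permissions can be compared in one query. The following is a minimal sketch, assuming single-level partitioning (so &lt;code&gt;pg_inherits&lt;/code&gt; lists every partition directly under &lt;code&gt;public.lzlpartition&lt;/code&gt;) and the two roles from the GRANT statements above:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- List each direct partition and whether the two roles can read it.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Assumption: one level of partitioning under public.lzlpartition.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT i.inhrelid::regclass AS partition_name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       has_table_privilege(&amp;#39;r_lzldbdata_qry&amp;#39;, i.inhrelid, &amp;#39;SELECT&amp;#39;) AS qry_can_select,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       has_table_privilege(&amp;#39;r_lzldbdata_dml&amp;#39;, i.inhrelid, &amp;#39;SELECT&amp;#39;) AS dml_can_select
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM pg_inherits AS i
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WHERE i.inhparent = &amp;#39;public.lzlpartition&amp;#39;::regclass
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ORDER BY 1;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A partition that shows &lt;code&gt;f&lt;/code&gt; for a role that can read the parent table corresponds to the empty &lt;code&gt;Access privileges&lt;/code&gt; column seen for &lt;code&gt;lzlpartition_202306&lt;/code&gt;.&lt;/p&gt;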

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Because we had already collected statistics and killed the existing sessions, stale statistics or a cached execution plan are unlikely explanations. After we executed GRANT, the partition&amp;rsquo;s execution plan became correct immediately (granting on a single partition fixed the plan for that partition alone), so we are fairly confident that the partition permissions caused the abnormal execution plan.&lt;/p&gt;
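&lt;p&gt;One way to bring existing partitions in line with the parent table is to loop over them and repeat the grants. This is only a sketch, not a record of the exact commands used in production; it assumes single-level partitioning and reuses the role names from the parent table&amp;rsquo;s grants:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DO $$
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DECLARE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    part regclass;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;BEGIN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    -- Assumption: every partition is a direct child of public.lzlpartition.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    FOR part IN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        SELECT inhrelid::regclass
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        FROM pg_inherits
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        WHERE inhparent = &amp;#39;public.lzlpartition&amp;#39;::regclass
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    LOOP
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        -- regclass output is already quoted and schema-qualified where needed.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        EXECUTE format(&amp;#39;GRANT SELECT ON TABLE %s TO r_lzldbdata_qry&amp;#39;, part);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;        EXECUTE format(&amp;#39;GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE %s TO r_lzldbdata_dml&amp;#39;, part);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    END LOOP;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;END
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$$;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Partitions created later start without these grants unless default privileges cover them, so the block would need to be rerun after new partitions are added.&lt;/p&gt;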
&lt;p&gt;The analysis and resolution process can be summarized as follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;View the execution plan as the application user. Checking plans as a superuser is common practice, but the plan a superuser sees may differ from the one the application actually gets.&lt;/li&gt;
&lt;li&gt;Check the permissions on child partitions. The root cause was that permissions on the child partitions were inconsistent with those on the parent table, which made the execution plan abnormal. In other words, permissions affected PostgreSQL&amp;rsquo;s choice of execution plan.&lt;/li&gt;
&lt;li&gt;The issue is difficult to reproduce and appears to occur very rarely.&lt;/li&gt;
&lt;li&gt;Execution plan anomalies caused by permissions are subtle and hard to diagnose.&lt;/li&gt;
&lt;/ol&gt;
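&lt;p&gt;For step 1, the plan can be checked under another role without opening a new connection. A minimal sketch, where &lt;code&gt;app_user&lt;/code&gt; is a hypothetical name standing in for the application&amp;rsquo;s role and the current session is assumed to be allowed to switch to it:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;BEGIN;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- app_user is a placeholder for the real application role.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SET LOCAL ROLE app_user;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Plain EXPLAIN only plans the query; it does not execute it.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXPLAIN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT count(*)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM lzlpartition
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WHERE appl_no = &amp;#39;217450&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  AND is_deleted = &amp;#39;N&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  AND date_created &amp;gt; now() - interval &amp;#39;31 days&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  AND date_created &amp;lt; now();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Ending the transaction restores the original role.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ROLLBACK;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Comparing this output with the plan obtained as a superuser shows whether the two roles are being planned differently.&lt;/p&gt;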
&lt;p&gt;Two questions worth deeper discussion:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Permission issues shouldn&amp;rsquo;t affect execution plans. Why do permissions affect execution plans?&lt;/li&gt;
&lt;li&gt;Child partition permissions are inconsistent with parent table permissions. Why don&amp;rsquo;t child partitions fully inherit parent table permissions?&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;We have submitted a bug report and are waiting to hear back from the PostgreSQL developers.&lt;/p&gt;</content:encoded></item><item><title>ORDER BY LIMIT 10 Slower Than ORDER BY LIMIT 100</title><link>https://lastdba.com/en/2024/08/12/order-by-limit-10-slower-than-order-by-limit-100/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/order-by-limit-10-slower-than-order-by-limit-100/</guid><description>&lt;h2 class="relative group"&gt;Problem Analysis
 &lt;div id="problem-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;When executing SQL in a PostgreSQL database, &lt;code&gt;ORDER BY LIMIT 10&lt;/code&gt; runs slower than &lt;code&gt;ORDER BY LIMIT 100&lt;/code&gt;.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Execution Plan Analysis
 &lt;div id="execution-plan-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#execution-plan-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#f92672"&gt;*&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; cl.ITEM_DESC &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tablelzl2 cl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; cl.ITEM_NAME &lt;span style="color:#f92672"&gt;=&lt;/span&gt; RI.MANUAL_SIGN &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; cl.ITEM_NO&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;manualSign&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;manualSign&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tablelzl1 RI
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; RI.column1&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;AAAA&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; RI.column2 &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;applyno20231112&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; RI.column3 &lt;span style="color:#66d9ef"&gt;DESC&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Limit (cost=0.43..1522.66 rows=10 width=990)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Index Scan Backward using idx_tablelzl1_column3 on tablelzl1 ri (cost=0.43..158007.45 rows=1038 width=990)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((column1)::text = &amp;#39;AAAA&amp;#39;::text) AND ((column2)::text = &amp;#39;applyno20231112&amp;#39;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; SubPlan 1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Index Scan using uk_tablelzl2_ii on tablelzl2 cl (cost=0.27..5.29 rows=1 width=18)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Index Cond: (((item_no)::text = &amp;#39;manualSign&amp;#39;::text) AND ((item_name)::text = (ri.manual_sign)::text))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The main table does not use the &lt;code&gt;column2&lt;/code&gt; index. Instead, the planner chooses an &lt;strong&gt;Index Scan Backward&lt;/strong&gt; on &lt;code&gt;idx_tablelzl1_column3&lt;/code&gt;, the index on the sort column. The index scan itself has a very high total cost (158007.45), yet the &lt;code&gt;Limit&lt;/code&gt; node above it is costed at only 1522.66. Actual execution takes 9 seconds.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Problem Analysis
 &lt;div id="problem-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;When executing SQL in a PostgreSQL database, &lt;code&gt;ORDER BY LIMIT 10&lt;/code&gt; runs slower than &lt;code&gt;ORDER BY LIMIT 100&lt;/code&gt;.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Execution Plan Analysis
 &lt;div id="execution-plan-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#execution-plan-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#f92672"&gt;*&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; cl.ITEM_DESC &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tablelzl2 cl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; cl.ITEM_NAME &lt;span style="color:#f92672"&gt;=&lt;/span&gt; RI.MANUAL_SIGN &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; cl.ITEM_NO&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;manualSign&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;manualSign&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tablelzl1 RI
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; RI.column1&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;AAAA&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; RI.column2 &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;applyno20231112&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; RI.column3 &lt;span style="color:#66d9ef"&gt;DESC&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Limit (cost=0.43..1522.66 rows=10 width=990)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Index Scan Backward using idx_tablelzl1_column3 on tablelzl1 ri (cost=0.43..158007.45 rows=1038 width=990)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((column1)::text = &amp;#39;AAAA&amp;#39;::text) AND ((column2)::text = &amp;#39;applyno20231112&amp;#39;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; SubPlan 1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Index Scan using uk_tablelzl2_ii on tablelzl2 cl (cost=0.27..5.29 rows=1 width=18)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Index Cond: (((item_no)::text = &amp;#39;manualSign&amp;#39;::text) AND ((item_name)::text = (ri.manual_sign)::text))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The main table does not use the &lt;code&gt;column2&lt;/code&gt; index. Instead, the planner chooses an &lt;strong&gt;Index Scan Backward&lt;/strong&gt; on &lt;code&gt;idx_tablelzl1_column3&lt;/code&gt;, the index on the sort column. The index scan itself has a very high total cost (158007.45), yet the &lt;code&gt;Limit&lt;/code&gt; node above it is costed at only 1522.66. Actual execution takes 9 seconds.&lt;/p&gt;
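&lt;p&gt;The low cost on the &lt;code&gt;Limit&lt;/code&gt; node is consistent with how a LIMIT over an ordered scan is generally priced: the startup cost plus the fraction of the child node&amp;rsquo;s run cost that corresponds to the rows it expects to fetch. The query below is a back-of-the-envelope check using the numbers from the plan above; it is a sketch of the arithmetic, not the planner&amp;rsquo;s exact code path:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Numbers copied from the Limit and Index Scan Backward nodes in the plan.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- startup_cost + (total_cost - startup_cost) * limit_rows / estimated_rows
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT round(0.43 + (158007.45 - 0.43) * 10  / 1038, 2) AS limit_10_cost,   -- 1522.66
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;       round(0.43 + (158007.45 - 0.43) * 100 / 1038, 2) AS limit_100_cost;  -- 15222.69&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The first value matches the &lt;code&gt;Limit&lt;/code&gt; node&amp;rsquo;s cost of 1522.66. The formula assumes that the 1038 estimated matches are spread evenly along the &lt;code&gt;column3&lt;/code&gt; index, so ten of them should turn up after reading about 1% of it. If the matching rows actually sit far from the end where the backward scan starts, much more of the index is read than estimated, which would be consistent with the 9-second runtime. With &lt;code&gt;LIMIT 100&lt;/code&gt; the same path is priced about ten times higher, which presumably lets a different plan win.&lt;/p&gt;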
&lt;p&gt;Changing &lt;code&gt;LIMIT 10&lt;/code&gt; to &lt;code&gt;LIMIT 100&lt;/code&gt; yields a normal execution plan:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#f92672"&gt;*&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; cl.ITEM_DESC &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tablelzl2 cl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; cl.ITEM_NAME &lt;span style="color:#f92672"&gt;=&lt;/span&gt; RI.MANUAL_SIGN &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; cl.ITEM_NO&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;manualSign&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;manualSign&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tablelzl1 RI
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; RI.column1&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;AAAA&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; RI.column2 &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;applyno20231112&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; RI.column3 &lt;span style="color:#66d9ef"&gt;DESC&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-----------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Limit (cost=2632.28..3162.78 rows=100 width=990)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Result (cost=2632.28..8138.87 rows=1038 width=990)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Sort (cost=2632.28..2634.87 rows=1038 width=474)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort Key: ri.column3 DESC
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Index Scan using idx_cri_column2 on tablelzl1 ri (cost=0.43..2592.61 rows=1038 width=474)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Index Cond: ((column2)::text = &amp;#39;applyno20231112&amp;#39;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((column1)::text = &amp;#39;AAAA&amp;#39;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; SubPlan 1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Index Scan using uk_tablelzl2_ii on tablelzl2 cl (cost=0.27..5.29 rows=1 width=18)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Index Cond: (((item_no)::text = &amp;#39;manualSign&amp;#39;::text) AND ((item_name)::text = (ri.manual_sign)::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(10 rows)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The subquery plan remains unchanged. The main table now uses the &lt;code&gt;column2&lt;/code&gt; single-column index, fetches rows, sorts, then applies LIMIT — execution is extremely fast.&lt;/p&gt;
&lt;p&gt;This is not just about LIMIT values — changing only the &lt;code&gt;column2&lt;/code&gt; value in the original SQL can also produce a normal plan. In practice, only a few specific &lt;code&gt;column2&lt;/code&gt; values trigger the abnormal plan.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Execution plan comparison:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;column2&lt;/em&gt; is a filter column, &lt;em&gt;column3&lt;/em&gt; is a sort column. The two plans choose different indexes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Abnormal &lt;code&gt;LIMIT 10&lt;/code&gt; plan:&lt;/strong&gt; &lt;em&gt;Scan the sort-column index backward → fetch rows → limit&lt;/em&gt;. No extra sort is needed, and the backward scan can stop as soon as it has found enough matching rows to satisfy the LIMIT. The estimated cost of scanning the whole sort-column index is very high, but the top-level LIMIT cost estimate is very low.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Normal &lt;code&gt;LIMIT 100&lt;/code&gt; plan:&lt;/strong&gt; &lt;em&gt;Access filter-column index → fetch rows → sort by sort column → limit&lt;/em&gt;. Because sorting is required, all matching index entries must be retrieved. The filter-column index scan itself has a low cost estimate.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So the key issue is: the optimizer &lt;strong&gt;underestimates the cost of a partial backward scan on the sort index&lt;/strong&gt;.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Actual Execution
 &lt;div id="actual-execution" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#actual-execution" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
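&lt;p&gt;Let&amp;rsquo;s look at &lt;code&gt;explain (analyze,buffers)&lt;/code&gt; for both statements, first &lt;code&gt;LIMIT 10&lt;/code&gt; and then &lt;code&gt;LIMIT 100&lt;/code&gt;:&lt;/p&gt;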
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Limit (cost=0.43..1521.93 rows=10 width=990) (actual time=23.311..8122.516 rows=10 loops=1)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit=861100 read=42985 dirtied=7
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; I/O Timings: read=6741.003
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Index Scan Backward using idx_tablelzl1_column3 on tablelzl1 ri (cost=0.43..157932.45 rows=1038 width=990) (actual time=23.309..8122.505 rows=10 loops=1)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((column1)::text = &amp;#39;AAAA&amp;#39;::text) AND ((column2)::text = &amp;#39;applyno20231112&amp;#39;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Rows Removed by Filter: 1521796
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit=861100 read=42985 dirtied=7
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; I/O Timings: read=6741.003
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; SubPlan 1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Index Scan using uk_tablelzl2_ii on tablelzl2 cl (cost=0.27..5.29 rows=1 width=18) (actual time=0.005..0.005 rows=0 loops=10)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Index Cond: (((item_no)::text = &amp;#39;manualSign&amp;#39;::text) AND ((item_name)::text = (ri.manual_sign)::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit=6
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit=121 read=28
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; I/O Timings: read=1.476
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: 2.314 ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: 8122.658 ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Limit (cost=2632.28..3162.78 rows=100 width=990) (actual time=150.101..150.122 rows=14 loops=1)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit=700 read=274
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; I/O Timings: read=146.903
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Result (cost=2632.28..8138.87 rows=1038 width=990) (actual time=150.100..150.119 rows=14 loops=1)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit=700 read=274
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; I/O Timings: read=146.903
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Sort (cost=2632.28..2634.87 rows=1038 width=474) (actual time=150.072..150.073 rows=14 loops=1)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort Key: ri.column3 DESC
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort Method: quicksort Memory: 30kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit=694 read=274
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; I/O Timings: read=146.903
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Index Scan using idx_cri_column2 on tablelzl1 ri (cost=0.43..2592.61 rows=1038 width=474) (actual time=0.418..149.973 rows=14 loops=1)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Index Cond: ((column2)::text = &amp;#39;applyno20231112&amp;#39;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((column1)::text = &amp;#39;AAAA&amp;#39;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Rows Removed by Filter: 1218
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit=691 read=274
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; I/O Timings: read=146.903
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; SubPlan 1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Index Scan using uk_tablelzl2_ii on tablelzl2 cl (cost=0.27..5.29 rows=1 width=18) (actual time=0.002..0.002 rows=0 loops=14)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Index Cond: (((item_no)::text = &amp;#39;manualSign&amp;#39;::text) AND ((item_name)::text = (ri.manual_sign)::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit=6
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: 0.334 ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: 150.257 ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;LIMIT 10&lt;/code&gt; plan executes in 8 seconds: shared hit=861,100, disk read=42,985, &lt;strong&gt;1,521,796 rows removed by filter&lt;/strong&gt;.&lt;/p&gt;
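&lt;p&gt;The &lt;code&gt;LIMIT 100&lt;/code&gt; plan executes in 0.15 seconds: shared hit=700, read=274, 1,218 rows removed by filter. It returns only 14 rows, because just 14 rows satisfy both filter conditions, far fewer than the 1,038 the planner estimated.&lt;/p&gt;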
&lt;p&gt;The &lt;code&gt;LIMIT 10&lt;/code&gt; plan is clearly abnormal — it &lt;strong&gt;reads far too many rows before finding qualifying ones&lt;/strong&gt;, which is why the query is slow.&lt;/p&gt;
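&lt;p&gt;To put the two runs side by side, here is a small sketch that compares the top-level figures copied from the two plans above. The dictionary layout and the helper name &lt;code&gt;total_buffers&lt;/code&gt; are illustrative choices, not anything PostgreSQL provides.&lt;/p&gt;

```python
# Figures copied from the two EXPLAIN (ANALYZE, BUFFERS) outputs above.
limit10 = {"hit": 861100, "read": 42985, "ms": 8122.658, "removed": 1521796}
limit100 = {"hit": 700, "read": 274, "ms": 150.257, "removed": 1218}


def total_buffers(plan):
    """Shared buffers touched: cache hits plus blocks read from disk."""
    return plan["hit"] + plan["read"]


buffer_ratio = total_buffers(limit10) / total_buffers(limit100)
time_ratio = limit10["ms"] / limit100["ms"]
removed_ratio = limit10["removed"] / limit100["removed"]

print(f"buffers: {total_buffers(limit10)} vs {total_buffers(limit100)} ({buffer_ratio:.0f}x)")
print(f"execution time: {time_ratio:.0f}x slower")
print(f"rows removed by filter: {removed_ratio:.0f}x more")
```

&lt;p&gt;The abnormal plan touches roughly 928 times as many buffers and runs about 54 times longer, even though it returns fewer rows.&lt;/p&gt;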

&lt;h3 class="relative group"&gt;Statistics Analysis
 &lt;div id="statistics-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#statistics-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The estimated cost is low, but the actual scan touches many index rows. First, check whether the statistics are accurate.&lt;/p&gt;
&lt;p&gt;Table statistics:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres@cnsz381785:7169/(rasesql)phmamp][10-30.15:01:26]M=# select relpages,reltuples::bigint from pg_class where relname=&amp;#39;tablelzl1&amp;#39;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relpages | reltuples 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;----------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 91172 | 2280874 -- roughly matches actual count&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Column statistics:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[phmampopr@cnsz381785:7169/(rasesql)phmamp][10-27.17:08:48]M=&amp;gt; select * from pg_stats where tablename=&amp;#39;tablelzl1&amp;#39; and attname=&amp;#39;column2&amp;#39;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-[ RECORD 1 ]----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;schemaname | public
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tablename | tablelzl1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;attname | column2
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;inherited | f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;null_frac | 0
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;avg_width | 18
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_distinct | -0.11990886
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;most_common_vals | {applyno20231112,DY20190723006650,DY20200102012899,DY20180827000557,DY20190524001304,DY20190529001885,DY20190728002359}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;most_common_freqs | {0.0005,0.00026666667,0.00023333334,0.0002,0.0002,0.0002,0.0002}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;histogram_bounds | {CULZF0000121605605,DSNEW0000126854232,DSNEW0000137652871,DY20160516001057,DY20161104005509,DY20170306002677,DY20170703010428,DY20170928013517,DY20180410007383,DY20180615002936,DY20180
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;correlation | 0.3131596
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;most_common_elems | [null]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;most_common_elem_freqs | [null]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;elem_count_histogram | [null]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The value &lt;code&gt;applyno20231112&lt;/code&gt; happens to be the first entry in &lt;code&gt;most_common_vals&lt;/code&gt;, with an estimated frequency of 0.0005. Multiplying by the table&amp;rsquo;s row count gives 2,280,874 × 0.0005 ≈ 1,140, which is close to the real count of 1,232.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres@cnsz381785:7169/(rasesql)phmamp][10-30.15:05:28]M=# select count(*) from tablelzl1 where column2 = &amp;#39;applyno20231112&amp;#39;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; count 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 1232&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
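&lt;p&gt;The check above can be written out as a short calculation. This is only a sketch of the arithmetic for a value listed in &lt;code&gt;most_common_vals&lt;/code&gt;; the variable names are made up for illustration and the inputs are the figures shown in the outputs above.&lt;/p&gt;

```python
reltuples = 2_280_874   # pg_class.reltuples for tablelzl1
mcv_freq = 0.0005       # most_common_freqs entry for 'applyno20231112'
actual_rows = 1232      # result of the count(*) query

# For a most-common value, the row estimate is table rows times its frequency.
estimated_rows = reltuples * mcv_freq
relative_error = abs(actual_rows - estimated_rows) / actual_rows

print(f"estimated rows: {estimated_rows:.0f}")
print(f"relative error: {relative_error:.1%}")
```

&lt;p&gt;The estimate is off by well under 10 percent, which is good for sampled statistics.&lt;/p&gt;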
&lt;p&gt;The statistics for &lt;code&gt;column2&lt;/code&gt; are accurate, so running &lt;code&gt;ANALYZE&lt;/code&gt; to recollect them would not fix this.&lt;/p&gt;

&lt;h2 class="relative group"&gt;The Effect of Uneven Data Distribution
 &lt;div id="the-effect-of-uneven-data-distribution" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-effect-of-uneven-data-distribution" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Using the current statistics, the estimated number of matching rows is ~1,140. On average, finding the first matching row through the sort-column index would require scanning 2,280,874 / 1,140 ≈ 2,000 index entries. For 10 rows, about 20,000 entries; for 100 rows, about 200,000 entries.&lt;/p&gt;
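&lt;p&gt;The same arithmetic as a minimal sketch. It assumes, as the paragraph above does, that matching rows are spread evenly through the sort-column index; the function name &lt;code&gt;expected_entries_scanned&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
def expected_entries_scanned(table_rows, matching_rows, limit):
    """Entries an ordered index scan visits to collect `limit` matches,
    assuming the matches are spread evenly through the index."""
    return table_rows / matching_rows * limit


table_rows = 2_280_874   # reltuples for tablelzl1
matching_rows = 1_140    # estimated rows for column2 = 'applyno20231112'

for limit in (1, 10, 100):
    scanned = expected_entries_scanned(table_rows, matching_rows, limit)
    print(f"LIMIT {limit}: about {round(scanned)} index entries")
```

&lt;p&gt;The expected work grows linearly with the LIMIT value, which is why a small LIMIT makes the sort-column index look so cheap.&lt;/p&gt;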
&lt;p&gt;Let&amp;rsquo;s disable sort and force the &lt;code&gt;LIMIT 100&lt;/code&gt; statement to use the sort-column index:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; enable_sort&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--limit 100 execution plan
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Limit (cost=0.43..15222.69 rows=100 width=990)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Index Scan Backward using idx_tablelzl1_column3 on tablelzl1 ri (cost=0.43..158007.45 rows=1038 width=990)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((column1)::text = &amp;#39;AAAA&amp;#39;::text) AND ((column2)::text = &amp;#39;applyno20231112&amp;#39;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; SubPlan 1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Index Scan using uk_tablelzl2_ii on tablelzl2 cl (cost=0.27..5.29 rows=1 width=18)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Index Cond: (((item_no)::text = &amp;#39;manualSign&amp;#39;::text) AND ((item_name)::text = (ri.manual_sign)::text))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When &lt;code&gt;LIMIT 10&lt;/code&gt; becomes &lt;code&gt;LIMIT 100&lt;/code&gt;, the cost rises from 1522.66 to 15222.69, almost exactly 10×, because the &lt;code&gt;Limit&lt;/code&gt; node&amp;rsquo;s cost scales linearly with the fraction of the child node&amp;rsquo;s estimated rows that it needs. The &lt;code&gt;LIMIT 100&lt;/code&gt; cost of 15222.69 now exceeds the filter-column index plan&amp;rsquo;s cost of 3162.78, so the optimizer switches indexes.&lt;/p&gt;
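&lt;p&gt;The costs of the &lt;code&gt;Limit&lt;/code&gt; nodes in these plans can be reproduced from their child nodes with a simple linear interpolation. The sketch below is a simplified reading of that calculation, not the planner&amp;rsquo;s actual code; &lt;code&gt;limit_cost&lt;/code&gt; is a hypothetical helper, and the inputs are the child-node figures from the plans above.&lt;/p&gt;

```python
def limit_cost(startup, total, est_rows, limit):
    """Cost of a Limit node: the child's startup cost plus the share of its
    run cost needed to produce `limit` of its `est_rows` estimated rows."""
    fraction = min(limit, est_rows) / est_rows
    return startup + (total - startup) * fraction


# Child-node figures taken from the plans above.
backward_scan = dict(startup=0.43, total=158007.45, est_rows=1038)
sort_plan = dict(startup=2632.28, total=8138.87, est_rows=1038)

print("backward scan, LIMIT 10: ", round(limit_cost(limit=10, **backward_scan), 2))
print("backward scan, LIMIT 100:", round(limit_cost(limit=100, **backward_scan), 2))
print("sort plan,     LIMIT 100:", round(limit_cost(limit=100, **sort_plan), 2))
```

&lt;p&gt;The three results match the 1522.66, 15222.69 and 3162.78 shown in the plans. The sort plan has a high startup cost because all matching rows must be fetched and sorted first, while the backward scan starts almost for free but grows quickly with the LIMIT value.&lt;/p&gt;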
&lt;p&gt;&lt;strong&gt;The above estimates all assume data is evenly scattered across the sort-column index. In reality, the data could be at the very end (backward scan finds it quickly), or all concentrated in the first few leaf pages (requiring nearly a full index scan + fetch), making the cost extremely high.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The correlation between the two columns — how the data is distributed across the index — determines whether using the sort-column index is efficient.&lt;/p&gt;
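&lt;p&gt;A toy simulation makes the effect of distribution concrete. It models the sort-column index as a list of booleans (whether each entry matches the filter) and counts how many entries a backward scan visits before it has collected enough matches. The sizes used here are arbitrary illustration values, not the real table.&lt;/p&gt;

```python
def entries_scanned_backward(matches, limit):
    """Count entries visited, starting from the end of the index,
    until `limit` matching entries have been found."""
    found = 0
    for visited, is_match in enumerate(reversed(matches), start=1):
        if is_match:
            found += 1
            if found == limit:
                return visited
    return len(matches)  # not enough matches: the whole index was scanned


n_entries, n_matches, limit = 100_000, 50, 10
step = n_entries // n_matches  # one match every 2000 entries

evenly_spread = [i % step == 0 for i in range(n_entries)]
clustered_at_start = [True] * n_matches + [False] * (n_entries - n_matches)
clustered_at_end = [False] * (n_entries - n_matches) + [True] * n_matches

for name, layout in [
    ("evenly spread", evenly_spread),
    ("clustered at start", clustered_at_start),
    ("clustered at end", clustered_at_end),
]:
    print(f"{name}: {entries_scanned_backward(layout, limit)} entries visited")
```

&lt;p&gt;With the same number of matching rows, the backward scan visits 20,000 entries when they are evenly spread, 10 when they sit at the end of the index, and 99,960 when they sit at the start. The planner only ever prices the first case.&lt;/p&gt;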
&lt;p&gt;Let&amp;rsquo;s look at how many rows were actually scanned:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Index Scan Backward using idx_tablelzl1_column3 on tablelzl1 ri (cost=0.43..157932.45 rows=1038 width=990) (actual time=23.309..8122.505 rows=10 loops=1)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((column1)::text = &amp;#39;AAAA&amp;#39;::text) AND ((column2)::text = &amp;#39;applyno20231112&amp;#39;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Rows Removed by Filter: 1521796&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In reality, about &lt;strong&gt;1,521,796 rows&lt;/strong&gt; were scanned and discarded to find just 10 matching rows. The estimate under the even-distribution assumption was about 20,000, a &lt;strong&gt;76× discrepancy&lt;/strong&gt;.&lt;/p&gt;
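&lt;p&gt;For reference, the discrepancy can be worked out from the numbers already shown. The sketch below also expresses the scan as a fraction of the table, using &lt;code&gt;reltuples&lt;/code&gt; as an approximation of the number of index entries (an assumption, since the index size itself is not shown).&lt;/p&gt;

```python
table_rows = 2_280_874       # reltuples for tablelzl1
rows_removed = 1_521_796     # "Rows Removed by Filter" in the LIMIT 10 plan
rows_returned = 10
estimated_scanned = 20_000   # even-distribution estimate for 10 rows

actually_scanned = rows_removed + rows_returned
discrepancy = actually_scanned / estimated_scanned
index_fraction = actually_scanned / table_rows

print(f"actual vs estimated entries: {discrepancy:.0f}x")
print(f"share of the table visited:  {index_fraction:.1%}")
```

&lt;p&gt;Roughly two thirds of the table had to be visited through the index before the tenth matching row turned up.&lt;/p&gt;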

&lt;h2 class="relative group"&gt;Trigger Conditions
 &lt;div id="trigger-conditions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#trigger-conditions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Must involve &lt;code&gt;WHERE&lt;/code&gt; + &lt;code&gt;ORDER BY&lt;/code&gt; + &lt;code&gt;LIMIT&lt;/code&gt; clauses&lt;/li&gt;
&lt;li&gt;Both the sort column and filter column must have indexes&lt;/li&gt;
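&lt;li&gt;The LIMIT value is small compared with the estimated number of matching rows&lt;/li&gt;
&lt;li&gt;Matching rows are unevenly distributed along the sort-column index, clustered far from the end where the scan starts&lt;/li&gt;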
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Solution
 &lt;div id="solution" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#solution" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
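&lt;p&gt;Rewrite the SQL: wrap the &lt;code&gt;ORDER BY&lt;/code&gt; column in an expression that leaves its value unchanged. The planner can then no longer use the &lt;code&gt;column3&lt;/code&gt; index to supply the ordering, so it has to filter first and sort afterwards.&lt;/p&gt;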
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#f92672"&gt;*&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; cl.ITEM_DESC &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tablelzl2 cl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; cl.ITEM_NAME &lt;span style="color:#f92672"&gt;=&lt;/span&gt; RI.MANUAL_SIGN &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; cl.ITEM_NO&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;manualSign&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;manualSign&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tablelzl1 RI
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; RI.column1&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;AAAA&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; RI.column2 &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;applyno20231112&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; RI.column3 &lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DESC&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;How Oracle Handles This
 &lt;div id="how-oracle-handles-this" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#how-oracle-handles-this" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Cost Estimation Differences in Execution Plans
 &lt;div id="cost-estimation-differences-in-execution-plans" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cost-estimation-differences-in-execution-plans" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;From the analysis above, the costs in the PostgreSQL execution plan look unbalanced: the upper node&amp;rsquo;s cost is lower than its child node&amp;rsquo;s cost, unlike Oracle&amp;rsquo;s plans, where cost accumulates up the hierarchy.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s run an experiment on a table in which every row holds the value &lt;code&gt;'x'&lt;/code&gt; (the column is &lt;code&gt;col1&lt;/code&gt; in PostgreSQL and &lt;code&gt;a&lt;/code&gt; in Oracle), comparing how the two databases calculate costs:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres@cnsz381785:7169/(rasesql)dbmgr][10-31.14:32:19]M=# explain select * from testlzl where col1=&amp;#39;x&amp;#39; limit 1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-----------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Limit (cost=0.00..0.02 rows=1 width=2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Seq Scan on testlzl (cost=0.00..17747.20 rows=1048576 width=2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((col1)::text = &amp;#39;x&amp;#39;::text)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres@cnsz381785:7169/(rasesql)dbmgr][10-31.14:32:30]M=# explain select * from testlzl where col1=&amp;#39;xx&amp;#39; limit 1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-----------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Limit (cost=0.00..17747.20 rows=1 width=2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Seq Scan on testlzl (cost=0.00..17747.20 rows=1 width=2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((col1)::text = &amp;#39;xx&amp;#39;::text)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
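&lt;p&gt;When &lt;code&gt;col1='x'&lt;/code&gt;, the first row scanned already matches, so the &lt;code&gt;Limit&lt;/code&gt; node costs only 0.02. The LIMIT is not pushed down into the Seq Scan node, however: that node still shows 17747.20, the cost of scanning the whole table. When &lt;code&gt;col1='xx'&lt;/code&gt;, only one matching row is estimated, so the &lt;code&gt;Limit&lt;/code&gt; node carries the full 17747.20. In other words, the inner node&amp;rsquo;s cost ignores the LIMIT, and only the &lt;strong&gt;rows estimate&lt;/strong&gt; determines how much of that cost the &lt;code&gt;Limit&lt;/code&gt; node is charged.&lt;/p&gt;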
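&lt;p&gt;Both &lt;code&gt;Limit&lt;/code&gt; costs in this experiment follow from the Seq Scan figures alone. This is a rough sketch under the assumption that the cost is charged in proportion to the rows needed; &lt;code&gt;limit_total_cost&lt;/code&gt; is a hypothetical helper, and the startup cost is left out because it is 0.00 in both plans.&lt;/p&gt;

```python
def limit_total_cost(child_total, child_rows, limit):
    """Share of the child's total cost charged to the Limit node,
    in proportion to the rows it needs (startup cost is 0.00 here)."""
    fraction = min(limit, child_rows) / child_rows
    return child_total * fraction


seq_scan_total = 17747.20

# col1 = 'x': every one of the 1,048,576 estimated rows matches.
print(round(limit_total_cost(seq_scan_total, 1_048_576, 1), 2))

# col1 = 'xx': only one matching row is estimated.
print(round(limit_total_cost(seq_scan_total, 1, 1), 2))
```

&lt;p&gt;The first call gives 0.02 and the second gives 17747.2, matching the two plans above.&lt;/p&gt;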
&lt;p&gt;Now let&amp;rsquo;s see how Oracle handles the same case:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SYS@t8icss1&amp;gt; select * from dbmgr.testlzl where a=&amp;#39;x&amp;#39; and rownum&amp;lt;=1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;1 row selected.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Execution Plan
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;----------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Plan hash value: 2045386539
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| 0 | SELECT STATEMENT | | 1 | 2 | 2 (0)| 00:00:01 |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;|* 1 | COUNT STOPKEY | | | | | |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;|* 2 | TABLE ACCESS FULL| TESTLZL | 1 | 2 | 2 (0)| 00:00:01 |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Predicate Information (identified by operation id):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 1 - filter(ROWNUM&amp;lt;=1)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 2 - filter(&amp;#34;A&amp;#34;=&amp;#39;x&amp;#39;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SYS@t8icss1&amp;gt; select * from dbmgr.testlzl where a=&amp;#39;xx&amp;#39; and rownum&amp;lt;=1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;no rows selected
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Execution Plan
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;----------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Plan hash value: 2045386539
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| 0 | SELECT STATEMENT | | 1 | 2 | 302 (2)| 00:00:01 |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;|* 1 | COUNT STOPKEY | | | | | |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;|* 2 | TABLE ACCESS FULL| TESTLZL | 1 | 2 | 302 (2)| 00:00:01 |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Predicate Information (identified by operation id):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 1 - filter(ROWNUM&amp;lt;=1)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 2 - filter(&amp;#34;A&amp;#34;=&amp;#39;xx&amp;#39;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In Oracle, when the row with &lt;code&gt;a='x'&lt;/code&gt; is found immediately, the STOPKEY is factored into the inner node&amp;rsquo;s cost, which is only 2. When no matching row exists (&lt;code&gt;a='xx'&lt;/code&gt;), the cost is that of the full scan, 302.&lt;/p&gt;
&lt;p&gt;This is an important difference between Oracle and PostgreSQL cost calculation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In Oracle, the outer node cost is always ≥ the inner node cost; in PostgreSQL, this is not guaranteed.&lt;/li&gt;
&lt;li&gt;Oracle&amp;rsquo;s inner node cost incorporates outer operators (e.g., STOPKEY); PostgreSQL does not — it gives the full cost of the child path.&lt;/li&gt;
&lt;/ul&gt;
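&lt;p&gt;A rough sketch of why the PostgreSQL Limit node for &lt;code&gt;col1='xx'&lt;/code&gt; shows the same 17747.20 as its child, assuming the planner interpolates the Limit cost linearly between the child&amp;rsquo;s startup and total cost according to the fraction of rows it expects to fetch. The queries below only redo that arithmetic in PostgreSQL with the numbers from the plan; the 1000-row case is a made-up value for illustration, not output from this system.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Limit total cost = child startup + (child total - child startup) * limit rows / child rows
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Numbers from the plan: child estimated at 1 row, single-row LIMIT
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select 0.00 + (17747.20 - 0.00) * 1 / 1 as limit_total_cost;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Hypothetical: same scan, but the planner expects 1000 matching rows
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select 0.00 + (17747.20 - 0.00) * 1 / 1000 as limit_total_cost;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The first query gives the full scan cost, the second roughly 17.75. In both cases the Seq Scan line itself would still print 17747.20, because only the Limit node carries the reduced figure.&lt;/p&gt;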

&lt;h3 class="relative group"&gt;Oracle and Uneven Data Distribution
 &lt;div id="oracle-and-uneven-data-distribution" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#oracle-and-uneven-data-distribution" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Knowing the principle, we can reproduce the issue in Oracle by placing the target rows at the very beginning and the very end of the index used for sorting:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tlzl(a char(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,b char(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Insert bulk data
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; i &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;100000&lt;/span&gt; loop
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tlzl &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;test&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;test&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; loop;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;end&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Insert special data
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tlzl &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;aaaa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aaaa&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tlzl &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;zzzz&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;zzzz&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Create indexes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_a &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl(a);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_b &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl(b);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Collect statistics
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;EXEC&lt;/span&gt; DBMS_STATS.GATHER_TABLE_STATS(OWNNAME&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;SYS&amp;#39;&lt;/span&gt;,TABNAME&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;TLZL&amp;#39;&lt;/span&gt;,estimate_percent &lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;, degree&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,METHOD_OPT&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;FOR ALL COLUMNS SIZE AUTO&amp;#39;&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;cascade&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#75715e"&gt;/*+ index(tlzl idx_a)*/&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tlzl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;aaaa&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; a) &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; rownum&lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#75715e"&gt;/*+ index(tlzl idx_a)*/&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tlzl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;zzzz&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; a) &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; rownum&lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SYS@t8icss1&amp;gt; select * from (select /*+ index(tlzl idx_a)*/* from tlzl where b=&amp;#39;aaaa&amp;#39; order by a) where rownum&amp;lt;=1; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Execution Plan
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;----------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Plan hash value: 3674066029
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| 0 | SELECT STATEMENT | | 1 | 204 | 2210 (1)| 00:00:01 |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;|* 1 | COUNT STOPKEY | | | | | |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| 2 | VIEW | | 1 | 204 | 2210 (1)| 00:00:01 |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;|* 3 | TABLE ACCESS BY INDEX ROWID| TLZL | 1 | 202 | 2210 (1)| 00:00:01 |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| 4 | INDEX FULL SCAN | IDX_A | 98830 | | 779 (1)| 00:00:01 |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---------------------------------------------------------------------------------------&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SYS@t8icss1&amp;gt; select * from (select /*+ index(tlzl idx_a)*/* from tlzl where b=&amp;#39;zzzz&amp;#39; order by a) where rownum&amp;lt;=1; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Execution Plan
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;----------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Plan hash value: 3674066029
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| 0 | SELECT STATEMENT | | 1 | 204 | 2210 (1)| 00:00:01 |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;|* 1 | COUNT STOPKEY | | | | | |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| 2 | VIEW | | 1 | 204 | 2210 (1)| 00:00:01 |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;|* 3 | TABLE ACCESS BY INDEX ROWID| TLZL | 1 | 202 | 2210 (1)| 00:00:01 |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| 4 | INDEX FULL SCAN | IDX_A | 98830 | | 779 (1)| 00:00:01 |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---------------------------------------------------------------------------------------&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Oracle&amp;rsquo;s optimizer has the same limitation — it doesn&amp;rsquo;t know where the data actually sits within the index. Whether the data is at the first or last position in the index, the estimated cost is the same.&lt;/p&gt;
&lt;p&gt;However, Oracle provides more tools to address this: extended statistics, Automatic Column Group Detection, plan baselines, etc.&lt;/p&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;a href="http://www.postgres.cn/v2/news/viewone/1/717" target="_blank" rel="noreferrer"&gt;http://www.postgres.cn/v2/news/viewone/1/717&lt;/a&gt;
&lt;a href="https://oracle-base.com/articles/12c/automatic-column-group-detection-extended-statistics-12cr1" target="_blank" rel="noreferrer"&gt;https://oracle-base.com/articles/12c/automatic-column-group-detection-extended-statistics-12cr1&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>PG Error: attempted to delete invisible tuple</title><link>https://lastdba.com/en/2024/08/12/pg-error-attempted-to-delete-invisible-tuple/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/pg-error-attempted-to-delete-invisible-tuple/</guid><description>&lt;h2 class="relative group"&gt;Problem Description
 &lt;div id="problem-description" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-description" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;PostgreSQL DELETE was failing with &lt;code&gt;attempted to delete invisible tuple&lt;/code&gt;, but SELECT with the same conditions worked fine.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzltab1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzltab1;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Results of full-table delete and full-table select:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzltab1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;55000&lt;/span&gt;: attempted &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt; invisible tuple
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: heap_delete, heapam.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2500&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;511&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;050&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzltab1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;231187&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;DELETE found an invisible tuple, but SELECT was fine.&lt;/p&gt;
&lt;p&gt;This seemed very strange at first. In PostgreSQL, visibility is determined by the tuple&amp;rsquo;s xmin, xmax, and cid together with the snapshot&amp;rsquo;s xmin, xmax, and xip_list. The state and timing of a deleting transaction can affect visibility, but if the table is stable (no ongoing DML), every subsequent snapshot should see the same set of tuples. In this scenario, the snapshot taken by the SELECT and the one taken by the DELETE should not produce different results.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Problem Description
 &lt;div id="problem-description" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-description" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;PostgreSQL DELETE was failing with &lt;code&gt;attempted to delete invisible tuple&lt;/code&gt;, but SELECT with the same conditions worked fine.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzltab1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzltab1;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Results of full-table delete and full-table select:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzltab1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;55000&lt;/span&gt;: attempted &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt; invisible tuple
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: heap_delete, heapam.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2500&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;511&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;050&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzltab1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;231187&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;DELETE found an invisible tuple, but SELECT was fine.&lt;/p&gt;
&lt;p&gt;This seemed very strange at first. In PostgreSQL, visibility is determined by the tuple&amp;rsquo;s xmin, xmax, and cid together with the snapshot&amp;rsquo;s xmin, xmax, and xip_list. The state and timing of a deleting transaction can affect visibility, but if the table is stable (no ongoing DML), every subsequent snapshot should see the same set of tuples. In this scenario, the snapshot taken by the SELECT and the one taken by the DELETE should not produce different results.&lt;/p&gt;
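&lt;p&gt;To look at the inputs to that visibility check, the tuple header fields are exposed as hidden system columns, and the current snapshot can be printed. This is a minimal sketch, assuming the table from the example; &lt;code&gt;txid_current_snapshot()&lt;/code&gt; is the older function name, and newer releases also offer &lt;code&gt;pg_current_snapshot()&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Tuple side: physical location, inserting/deleting transaction IDs, command IDs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select ctid, xmin, xmax, cmin, cmax from lzltab1 limit 5;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Snapshot side, printed as xmin:xmax:xip_list
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select txid_current_snapshot();&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;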

&lt;h2 class="relative group"&gt;Analysis
 &lt;div id="analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Finding the Source Code
 &lt;div id="finding-the-source-code" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#finding-the-source-code" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Note the error location: &lt;code&gt;heapam.c:2500&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Find the source at &lt;code&gt;src/backend/access/heap/heapam.c&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Line 2500 is blank; nearby code is:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Before locking the buffer, pin the visibility map page if it appears to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * be necessary. Since we haven&amp;#39;t got the lock yet, someone else might be
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * in the middle of changing this, so we&amp;#39;ll need to recheck after we have
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * the lock.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;PageIsAllVisible&lt;/span&gt;(page))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;visibilitymap_pin&lt;/span&gt;(relation, block, &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;vmbuffer);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;LockBuffer&lt;/span&gt;(buffer, BUFFER_LOCK_EXCLUSIVE);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;From the source, this code pins the visibility map (VM) page before taking the exclusive lock on the heap buffer, so the problem appears to be related to the VM file.&lt;/p&gt;

&lt;h3 class="relative group"&gt;The VM File
 &lt;div id="the-vm-file" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-vm-file" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;What is the VM file?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The VM (Visibility Map) file exists to reduce the time vacuum spends scanning pages. If a page doesn&amp;rsquo;t need vacuuming, it can be skipped, greatly reducing the time spent finding pages that need cleaning. This is the original purpose of the VM file. (It&amp;rsquo;s also sometimes used by index-only scans, but that doesn&amp;rsquo;t apply here since we&amp;rsquo;re doing a sequential scan.)&lt;/p&gt;
&lt;p&gt;The VM file stores two pieces of information:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Whether all tuples on a page are visible to all transactions. Such a page has no dead tuples that need vacuuming.&lt;/li&gt;
&lt;li&gt;Whether all tuples on a page are frozen. Such a page does not need to be visited by vacuum freeze.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src="https://lastdba.com/img/csdn/1604296b876a.png" alt="Fig. 6.2. How the VM is used." /&gt;&lt;/p&gt;
&lt;p&gt;The VM helps vacuum find dead tuples while reducing the number of pages scanned. For example, in the diagram above (interdb ftw!), the first page contains no dead tuples, so vacuum can skip it.&lt;/p&gt;
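&lt;p&gt;The two bits can be read page by page with the &lt;code&gt;pg_visibility&lt;/code&gt; contrib extension (PostgreSQL 9.6 and later). This is only a sketch: it assumes the extension can be installed on the instance and that the session has sufficient privileges, and it reuses the table name from the example.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;create extension if not exists pg_visibility;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- One row per heap page: blkno, all_visible, all_frozen
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select * from pg_visibility_map(&amp;#39;lzltab1&amp;#39;) limit 5;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Number of pages marked all-visible and all-frozen
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select * from pg_visibility_map_summary(&amp;#39;lzltab1&amp;#39;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- TIDs of tuples that are not all-visible although their page is marked all-visible
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select * from pg_check_visible(&amp;#39;lzltab1&amp;#39;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If &lt;code&gt;pg_check_visible&lt;/code&gt; returns any rows, the VM disagrees with the heap for those tuples.&lt;/p&gt;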
&lt;p&gt;&lt;strong&gt;Finding the VM File&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Every table has a Visibility Map (VM) file (indexes don&amp;rsquo;t have VM files), stored alongside the table file. If a table&amp;rsquo;s filenode is &lt;code&gt;12345&lt;/code&gt;, its VM file is &lt;code&gt;12345_vm&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;First, find the data directory:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#75715e"&gt;# show data_directory;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data_directory 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;----------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; /pg/pg6666/data&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
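&lt;p&gt;Find the file storage location using the database OID and table OID (the file is actually named after the table&amp;rsquo;s filenode, which matches the OID here but can change when the table is rewritten, for example by &lt;code&gt;VACUUM FULL&lt;/code&gt;):&lt;/p&gt;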
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; oid,datname &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_database &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; datname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;sdp&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+----------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; oid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;17075&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; sdp
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; oid,relname &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;lzltab1&amp;#39;&lt;/span&gt;; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+----------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;17362&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzltab1&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Or:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;lzltab1&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_relation_filepath 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17075&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17362&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Find the data file and VM:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cd /pg/pg6666/data/base/17075
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ ll 17362*
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;86761472&lt;/span&gt; Jun &lt;span style="color:#ae81ff"&gt;15&lt;/span&gt; 17:43 &lt;span style="color:#ae81ff"&gt;17362&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;40960&lt;/span&gt; Jun &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; 21:09 17362_fsm
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; Nov &lt;span style="color:#ae81ff"&gt;14&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2022&lt;/span&gt; 17362_vm&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
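&lt;p&gt;The lookup above can be sketched in a few lines of Python, together with a sanity check on the VM size. This is an illustration under stated assumptions: an 8 kB block size, a 24-byte page header, and a table that fits in a single segment file. The helper names &lt;code&gt;fork_paths&lt;/code&gt; and &lt;code&gt;max_vm_bytes&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
from pathlib import PurePosixPath

BLCKSZ = 8192
# Heap pages covered by one VM page: (8192 - 24 header bytes) * 4 pages per byte.
HEAPBLOCKS_PER_VM_PAGE = (BLCKSZ - 24) * 4


def fork_paths(data_directory, relation_filepath):
    """Build the main, FSM and VM file paths for one relation."""
    main = PurePosixPath(data_directory) / relation_filepath
    return {"main": str(main), "fsm": f"{main}_fsm", "vm": f"{main}_vm"}


def max_vm_bytes(heap_bytes):
    """Return (heap_pages, largest VM size needed to cover them)."""
    heap_pages = heap_bytes // BLCKSZ
    vm_pages = -(-heap_pages // HEAPBLOCKS_PER_VM_PAGE)  # ceiling division
    return heap_pages, vm_pages * BLCKSZ


print(fork_paths("/pg/pg6666/data", "base/17075/17362")["vm"])
print(max_vm_bytes(86761472))
```

&lt;p&gt;For the 86761472-byte table file in the listing this gives 10591 heap pages and at most 8192 bytes of VM, which matches the size of &lt;code&gt;17362_vm&lt;/code&gt;. The VM is extended on demand, so a shorter file is not by itself a sign of corruption.&lt;/p&gt;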

&lt;h3 class="relative group"&gt;The pg_visibility Extension
 &lt;div id="the-pg_visibility-extension" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-pg_visibility-extension" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;pg_visibility&lt;/code&gt; provides page-level visibility information by inspecting VM files, and can detect VM corruption. Since the VM stores &amp;ldquo;are all tuples on this page visible; are all tuples on this page frozen&amp;rdquo; information, &lt;code&gt;pg_visibility&lt;/code&gt; can identify which pages are all-frozen and which are all-visible.&lt;/p&gt;
&lt;p&gt;pg_visibility extension reference: &lt;a href="https://www.postgresql.org/docs/current/pgvisibility.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/pgvisibility.html&lt;/a&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;Useful pg_visibility Functions
 &lt;div id="useful-pg_visibility-functions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#useful-pg_visibility-functions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;pg_visibility_map_summary()&lt;/strong&gt;: Shows the count of all-visible and all-frozen pages in the VM.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;pg_check_frozen()&lt;/strong&gt;: Returns the TIDs of tuples that are not frozen even though their page is marked all-frozen in the VM. If this function returns any rows, the VM file is corrupt.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;pg_check_visible()&lt;/strong&gt;: Returns the TIDs of tuples that are not visible to all transactions even though their page is marked all-visible in the VM. If this function returns any rows, the VM file is corrupt.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;pg_truncate_visibility_map()&lt;/strong&gt;: Clears the VM file. After clearing, the next vacuum on the table will scan all pages and rebuild the VM.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Repairing the VM File
 &lt;div id="repairing-the-vm-file" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#repairing-the-vm-file" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Check for VM corruption:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_visibility_map_summary(&lt;span style="color:#e6db74"&gt;&amp;#39;lzltab1&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_visibility_map_summary 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;472&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;472 all-visible pages, 0 all-frozen pages.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_check_frozen(&lt;span style="color:#e6db74"&gt;&amp;#39;lzltab1&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_check_frozen 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_check_visible(&lt;span style="color:#e6db74"&gt;&amp;#39;lzltab1&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_check_visible 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;6839&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;6839&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;7296&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1423&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;pg_check_visible()&lt;/code&gt; returning results means &lt;strong&gt;the VM is corrupted&lt;/strong&gt;.&lt;/p&gt;
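&lt;p&gt;Each row in that output is a TID, that is, a (block number, item offset) pair. A small, hypothetical Python helper like the one below can group such TIDs by heap page to show which pages are wrongly marked. It is fed only the three rows visible in the output above, since the elided rows are unknown.&lt;/p&gt;

```python
from collections import Counter


def bad_tuples_per_page(tids):
    """Count reported TIDs per heap block; input strings look like '(6839,1)'."""
    counts = Counter()
    for tid in tids:
        block, _offset = tid.strip("() ").split(",")
        counts[int(block)] += 1
    return dict(sorted(counts.items()))


# Only the rows shown in the psql output; the rest were elided there.
sample = ["(6839,1)", "(6839,2)", "(7296,15)"]
print(bad_tuples_per_page(sample))
```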
&lt;p&gt;Now use &lt;code&gt;pg_truncate_visibility_map()&lt;/code&gt; to clear the VM:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_truncate_visibility_map(&lt;span style="color:#e6db74"&gt;&amp;#39;lzltab1&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_truncate_visibility_map 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;On disk, you can see the VM file was truncated to 0 bytes:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ ll 17362*
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;86761472&lt;/span&gt; Jun &lt;span style="color:#ae81ff"&gt;27&lt;/span&gt; 10:39 &lt;span style="color:#ae81ff"&gt;17362&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;40960&lt;/span&gt; Jun &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; 21:09 17362_fsm
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; Jun &lt;span style="color:#ae81ff"&gt;27&lt;/span&gt; 18:18 17362_vm&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Now vacuum the table to regenerate the VM file and confirm it is back on disk:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;vacuum&lt;/span&gt; lzltab1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;VACUUM&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;3692&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;402&lt;/span&gt; ms (&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;03&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;692&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;q
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; ll &lt;span style="color:#ae81ff"&gt;17362&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 86761472 Jun 28 03:37 17362
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 40960 Jun 9 21:09 17362_fsm
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 8192 Jun 28 10:21 17362_vm&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After manual vacuum, the VM was regenerated correctly:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_check_visible(&lt;span style="color:#e6db74"&gt;&amp;#39;lzltab1&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_check_visible 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_check_frozen(&lt;span style="color:#e6db74"&gt;&amp;#39;lzltab1&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_check_frozen 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Both checks return no rows, so the VM file is healthy and the repair is complete.&lt;/p&gt;
&lt;p&gt;Finally, re-run the DELETE that originally failed:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;##&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzltab1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;DELETE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;229766&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;DELETE executes normally. Problem resolved.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Checking the Entire Database for VM Corruption
 &lt;div id="checking-the-entire-database-for-vm-corruption" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#checking-the-entire-database-for-vm-corruption" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Although we fixed one corrupted VM file, we should check the entire database for other VM corruption (this requires the &lt;code&gt;pg_visibility&lt;/code&gt; extension to be installed):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; oid::regclass &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; relname
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; pg_class
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; relkind &lt;span style="color:#66d9ef"&gt;IN&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;r&amp;#39;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;m&amp;#39;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;t&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;EXISTS&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; pg_check_visible(oid))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;OR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXISTS&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; pg_check_frozen(oid)));&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If results are returned, there&amp;rsquo;s VM corruption. Use &lt;code&gt;pg_truncate_visibility_map()&lt;/code&gt; to clear the VM, then vacuum to regenerate it, as shown above.&lt;/p&gt;
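&lt;p&gt;If the check returns many relations, the repair statements can be generated rather than typed by hand. The Python sketch below is one hypothetical way to do that. It assumes the names come straight from the &lt;code&gt;oid::regclass&lt;/code&gt; column of the query above, so they are already quoted and schema-qualified where needed, and it only builds SQL text without connecting to a database.&lt;/p&gt;

```python
def repair_statements(relnames):
    """Return the SQL to truncate and then rebuild the VM of each relation."""
    statements = []
    for name in relnames:
        literal = name.replace("'", "''")  # escape quotes for the SQL literal
        statements.append(
            f"SELECT pg_truncate_visibility_map('{literal}'::regclass);"
        )
        # Assumption: name is regclass output, so it is safe as an identifier.
        statements.append(f"VACUUM {name};")
    return statements


for stmt in repair_statements(["lzltab1"]):
    print(stmt)
```

&lt;p&gt;Review the generated statements before running them, since truncating the VM forces the next vacuum of each table to scan every page.&lt;/p&gt;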
&lt;p&gt;For versions before 9.6 (which lack the pg_visibility extension), you&amp;rsquo;d need to stop the database, manually delete the VM files, restart, then vacuum to regenerate them.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Why Does VM Corruption Happen?
 &lt;div id="why-does-vm-corruption-happen" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-does-vm-corruption-happen" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;We traced the issue step by step to VM file corruption, but why did it corrupt?&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;PostgreSQL bugs. PG has had some bugs causing VM corruption (see the Visibility Map Problems wiki page), but the ones described there affect versions before PG 9.6.1.&lt;/li&gt;
&lt;li&gt;Operating system or hardware issues.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Our version was PG13, so those known bugs do not apply, and the cause can only be broadly attributed to OS or hardware problems.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Why Did SELECT Succeed But DELETE Fail?
 &lt;div id="why-did-select-succeed-but-delete-fail" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-did-select-succeed-but-delete-fail" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;A full-table SELECT working while a full-table DELETE errors out seems bizarre. The root cause is VM file corruption.&lt;/p&gt;
&lt;p&gt;As mentioned, the VM file exists to speed up vacuum. Even though we weren&amp;rsquo;t running vacuum, the VM still has to be kept up to date: DML that modifies a page marked all-visible must clear that bit, so DML operations check (and possibly update) the VM, while a sequential-scan SELECT never changes VM state. So in this case, SELECT executed normally, but DELETE errored during VM processing.&lt;/p&gt;
&lt;p&gt;In our case, DELETE scanned the VM and found pages marked all-visible, but the VM was wrong: those pages still contained invisible tuples. This is what surfaced as the &lt;code&gt;attempted to delete invisible tuple&lt;/code&gt; error. An invisible tuple may already have been deleted, and deleting it again would violate the transaction visibility rules, so PostgreSQL raises an error instead.&lt;/p&gt;
&lt;p&gt;Additionally, index-only scans also use the VM file, so they would also be affected. However, this case involved a sequential scan, so SELECT was unaffected.&lt;/p&gt;

&lt;h2 class="relative group"&gt;VM Corruption Causing Incorrect Index-Only Scan Results
 &lt;div id="vm-corruption-causing-incorrect-index-only-scan-results" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#vm-corruption-causing-incorrect-index-only-scan-results" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;As mentioned earlier, besides vacuum, index-only scans also use the VM file. Even though our case didn&amp;rsquo;t involve index-only scans, let&amp;rsquo;s dig deeper for completeness.&lt;/p&gt;

&lt;h3 class="relative group"&gt;What Is an Index-Only Scan?
 &lt;div id="what-is-an-index-only-scan" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-an-index-only-scan" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;As the name suggests, an index-only scan accesses only the index structure to get results, without touching the table. Almost all relational databases support index-only scans because B+tree index structures store key values — if the query only needs key values, an index-only scan is possible.&lt;/p&gt;
&lt;p&gt;However, PostgreSQL&amp;rsquo;s transaction implementation differs significantly from other databases (Oracle, MySQL), giving its index-only scans some unique characteristics.&lt;/p&gt;
&lt;p&gt;PostgreSQL checks tuple visibility via xmin, xmax, and other information in tuple headers, but indexes don&amp;rsquo;t contain this information. On its own, this would force PG&amp;rsquo;s index-only scans to visit data blocks to check visibility. This is where the VM comes in: for a page marked all-visible, the scan can skip the heap visibility check, because the VM has already confirmed that every tuple on the page is visible.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/63ed5a39f52d.png" alt="Fig. 7.7. How Index-Only Scans performs" /&gt;&lt;/p&gt;
&lt;p&gt;Another interdb diagram (interdb ftw!). When a query looks up tuples with keys 18 and 19: the page containing key=18 is marked all-visible in the VM, so accessing this tuple only requires the index page and VM file. The page containing key=19 is not marked all-visible, so the index-only scan still needs to visit the data page to check visibility.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Index-Only Scan Returning Incorrect Results
 &lt;div id="index-only-scan-returning-incorrect-results" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#index-only-scan-returning-incorrect-results" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Index-only scans consult the VM, so a corrupted VM feeds them wrong information. Suppose some tuples on a page are no longer visible (for example, they have been deleted) but the page is still marked all-visible. The index-only scan then skips the data page visibility check and directly returns index key values that should be invisible.&lt;/p&gt;
&lt;p&gt;As a workaround, you can set &lt;code&gt;enable_indexonlyscan=off&lt;/code&gt; to disable index-only scans so that query results no longer depend on the VM. Repairing the VM file as shown above is probably the better choice.&lt;/p&gt;
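&lt;p&gt;The failure mode can be reproduced with a toy model. The Python sketch below is not PostgreSQL code: the heap, index, and VM are plain dictionaries and lists invented for illustration, and a tuple&amp;rsquo;s visibility is reduced to a single boolean. It shows how a page wrongly marked all-visible lets a deleted key leak into the result.&lt;/p&gt;

```python
# Toy model: each heap page holds (key, visible) pairs, and the VM holds one
# all-visible flag per page. All structures here are hypothetical.
heap = {
    0: [(18, True)],
    1: [(19, True), (20, False)],   # key 20 has been deleted
}
index = [(18, 0, 0), (19, 1, 0), (20, 1, 1)]  # (key, heap page, slot)


def index_only_scan(index, heap, vm):
    """Return the keys an index-only scan would emit for the given VM."""
    result = []
    for key, page, slot in index:
        if vm[page]:
            # Page marked all-visible: trust the VM and skip the heap.
            result.append(key)
        elif heap[page][slot][1]:
            # Otherwise visit the heap tuple and check its visibility.
            result.append(key)
    return result


healthy_vm = {0: True, 1: False}
corrupt_vm = {0: True, 1: True}     # page 1 wrongly marked all-visible

print(index_only_scan(index, heap, healthy_vm))  # [18, 19]
print(index_only_scan(index, heap, corrupt_vm))  # [18, 19, 20]
```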

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The journey had some twists: at first glance the error seemed like a transaction visibility rule problem, which would have been serious — but it was actually much simpler.&lt;/p&gt;
&lt;p&gt;We traced the &lt;code&gt;attempted to delete invisible tuple&lt;/code&gt; error to the source code, identified it as a VM issue, used the &lt;code&gt;pg_visibility&lt;/code&gt; extension to detect and fix the VM corruption, resolved the DELETE error, and finally explored the relationship between index-only scans and the VM.&lt;/p&gt;
&lt;p&gt;Key takeaways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;pg_visibility&lt;/code&gt; extension can read, check, and clear VM files&lt;/li&gt;
&lt;li&gt;If the VM is missing or has been truncated, the next vacuum rebuilds it&lt;/li&gt;
&lt;li&gt;DML reads and updates the VM; SELECT does not, unless it uses an index-only scan&lt;/li&gt;
&lt;li&gt;The VM file exists to improve vacuum efficiency, and sometimes index-only scan efficiency&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;attempted to delete invisible tuple&lt;/code&gt; error warrants checking the VM file for corruption&lt;/li&gt;
&lt;li&gt;VM file corruption can cause DML failures and incorrect index-only scan results&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/13/pgvisibility.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/13/pgvisibility.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://wiki.postgresql.org/wiki/Visibility_Map_Problems" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/Visibility_Map_Problems&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql06.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql06.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql07.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql07.html&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>The Table I Wanted to Query Was Not in the Execution Plan</title><link>https://lastdba.com/en/2024/08/12/the-table-i-wanted-to-query-was-not-in-the-execution-plan/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/the-table-i-wanted-to-query-was-not-in-the-execution-plan/</guid><description>&lt;h2 class="relative group"&gt;Problem: The Queried Table Did Not Appear in the Execution Plan
 &lt;div id="problem-the-queried-table-did-not-appear-in-the-execution-plan" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-the-queried-table-did-not-appear-in-the-execution-plan" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;SQL:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; A.column1 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;column1&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;-- many A columns omitted in between
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; A.column99 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;column99&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; table_a A
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; table_a AA
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;inner&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; table_b BB &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; AA.lzl_key &lt;span style="color:#f92672"&gt;=&lt;/span&gt; BB.lzl_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; AA.column_code &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;GROUP&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ) B &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; B.lzl_id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; A.lzl_key
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; A.flagflagflag &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; A.typetypetype &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ) TEMP
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;offset&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Execution plan:&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Problem: The Queried Table Did Not Appear in the Execution Plan
 &lt;div id="problem-the-queried-table-did-not-appear-in-the-execution-plan" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-the-queried-table-did-not-appear-in-the-execution-plan" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;SQL:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; A.column1 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;column1&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;-- many A columns omitted in between
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; A.column99 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;column99&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; table_a A
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; table_a AA
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;inner&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; table_b BB &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; AA.lzl_key &lt;span style="color:#f92672"&gt;=&lt;/span&gt; BB.lzl_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; AA.column_code &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;GROUP&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ) B &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; B.lzl_id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; A.lzl_key
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; A.flagflagflag &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; A.typetypetype &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ) TEMP
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;offset&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Execution plan:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;84&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;68&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1105&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;038&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;039&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; table_a a (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;84&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1105&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;036&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;037&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((flagflagflag)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((typetypetype)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Rows&lt;/span&gt; Removed &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; Filter: &lt;span style="color:#ae81ff"&gt;38&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;184&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;066&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The SQL is fairly complex. It contains three table references across two distinct tables: &lt;code&gt;table_a&lt;/code&gt; appears twice and &lt;code&gt;table_b&lt;/code&gt; once. I can understand &lt;code&gt;table_a&lt;/code&gt; appearing in the execution plan, but &lt;code&gt;table_b&lt;/code&gt;, which the query explicitly joins, wasn&amp;rsquo;t in the execution plan at all! The plan was simply a sequential scan of &lt;code&gt;table_a&lt;/code&gt;.&lt;/p&gt;
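&lt;p&gt;The &lt;code&gt;actual time&lt;/code&gt; and &lt;code&gt;Buffers&lt;/code&gt; lines in the plan suggest it was captured with &lt;code&gt;EXPLAIN&lt;/code&gt; options along the lines of &lt;code&gt;ANALYZE&lt;/code&gt; and &lt;code&gt;BUFFERS&lt;/code&gt;. A minimal sketch of such an invocation follows, assuming those options; the shortened query is only a placeholder, not the original statement. Keep in mind that &lt;code&gt;ANALYZE&lt;/code&gt; actually executes the statement.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Assumed options: ANALYZE runs the query, BUFFERS adds the buffer counters
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXPLAIN (ANALYZE, BUFFERS)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  A.column1 -- placeholder select list
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  table_a A
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WHERE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  A.flagflagflag = &amp;#39;1&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  AND A.typetypetype = &amp;#39;2&amp;#39;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;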

&lt;h2 class="relative group"&gt;The Analytical Journey
 &lt;div id="the-analytical-journey" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-analytical-journey" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;During the analysis I considered many possibilities, but the most likely one was logical optimization: the PostgreSQL optimizer had determined that &lt;code&gt;table_b&lt;/code&gt; didn&amp;rsquo;t need to be queried.&lt;/p&gt;
&lt;p&gt;Looking at the SQL, I noticed that the outer query selects only columns from &lt;code&gt;table_a&lt;/code&gt; and none from &lt;code&gt;table_b&lt;/code&gt;. Adding any column from the derived table B to the select list made the execution plan look &amp;ldquo;normal&amp;rdquo; again, that is, it accessed &lt;code&gt;table_b&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; A.column1 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;column1&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;-- many A columns omitted in between
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; A.column99 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;column99&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; B.lzl_id &lt;span style="color:#75715e"&gt;-- added a column from intermediate table B
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; table_a A
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; table_a AA
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;inner&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; table_b BB &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; AA.lzl_key &lt;span style="color:#f92672"&gt;=&lt;/span&gt; BB.lzl_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; AA.column_code &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;GROUP&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ) B &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; B.lzl_id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; A.lzl_key
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; A.flagflagflag &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; A.typetypetype &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ) TEMP
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;offset&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;69&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;67&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1113&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Nested Loop &lt;span style="color:#66d9ef"&gt;Left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;72&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;69&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1113&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; Filter: (bb.lzl_id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; a.lzl_key)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; table_a a (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;84&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1113&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((flagflagflag)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((typetypetype)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Group&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;72&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;74&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: bb.lzl_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Sort (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;72&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;73&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: bb.lzl_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Nested Loop (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; table_a aa (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;70&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((column_code)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_table_b_lzl_id &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; table_b bb (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;83&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (lzl_id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; aa.lzl_key)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This looks related to the LEFT JOIN, but on reflection that alone can&amp;rsquo;t be the explanation: the right table can still affect the final result (for example, several matching rows on the right would duplicate rows from the left), so it shouldn&amp;rsquo;t simply be skipped. Let&amp;rsquo;s try a simple LEFT JOIN:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lzlleft.a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlleft &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; lzlright &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlleft.a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;lzlright.a;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash &lt;span style="color:#66d9ef"&gt;Left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;47&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;320&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Cond: (lzlleft.a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; lzlright.a)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlleft (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;320&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlright (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Here the right table is scanned. The derived table B in the original query, however, contains a &lt;code&gt;GROUP BY&lt;/code&gt;. If we remove that &lt;code&gt;GROUP BY&lt;/code&gt;, &lt;code&gt;table_b&lt;/code&gt; is accessed regardless of whether we select any columns from B.&lt;/p&gt;
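&lt;p&gt;As a sketch of that experiment, the original query with the &lt;code&gt;GROUP BY&lt;/code&gt; taken out of the derived table might look like this. The select list is shortened to a single column and &lt;code&gt;LIMIT&lt;/code&gt;/&lt;code&gt;OFFSET&lt;/code&gt; are dropped for brevity; those simplifications are mine, not part of the original statement.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXPLAIN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  A.column1 -- select list shortened for illustration; no column of B is selected
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FROM
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  table_a A
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  LEFT JOIN (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    SELECT
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      lzl_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    FROM
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      table_a AA
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      INNER JOIN table_b BB ON AA.lzl_key = BB.lzl_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    WHERE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;      AA.column_code = &amp;#39;1&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;    -- GROUP BY lzl_id removed here
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  ) B ON B.lzl_id = A.lzl_key
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WHERE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  A.flagflagflag = &amp;#39;1&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;  AND A.typetypetype = &amp;#39;2&amp;#39;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In line with the observation above, the plan for this form should include &lt;code&gt;table_b&lt;/code&gt; even though no column of B is selected.&lt;/p&gt;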
&lt;p&gt;Let&amp;rsquo;s add a &lt;code&gt;GROUP BY&lt;/code&gt; to our test case and see the result:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlleft;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; b 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---+-----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zzz
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;259&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlright;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; b 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; qwer
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; poiuy 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lzlright.b &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlleft &lt;span style="color:#66d9ef"&gt;full&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; lzlright &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlleft.b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;lzlright.b &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; lzlright.b;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; b 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; poiuy
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; qwer
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
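&lt;p&gt;This is where I realized that the result set of a &lt;code&gt;GROUP BY&lt;/code&gt; has one guaranteed property: &lt;strong&gt;uniqueness&lt;/strong&gt;. Each combination of grouping keys appears at most once.&lt;/p&gt;
&lt;p&gt;Now let&amp;rsquo;s put the &lt;code&gt;GROUP BY&lt;/code&gt; inside the right-hand subquery of the left join:&lt;/p&gt;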
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lzlleft.a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlleft &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlright &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; a) &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlleft.a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.a;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlleft (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;320&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The right table is not queried!&lt;/p&gt;
&lt;p&gt;Since all that matters is that the right side is unique on the join key, other ways of guaranteeing uniqueness work too:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- distinct ensures right-table uniqueness
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lzlleft.a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlleft &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;distinct&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlright) &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlleft.a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.a;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlleft (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;320&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- unique index ensures right-table uniqueness, even with just select a from lzlright
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; explain select lzlleft.a from lzlleft left join (select a from lzlright) c on lzlleft.a=c.a;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-----------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Left Join (cost=17.20..49.12 rows=512 width=4)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Cond: (lzlleft.a = lzlright.a)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Seq Scan on lzlleft (cost=0.00..13.20 rows=320 width=4)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Hash (cost=13.20..13.20 rows=320 width=4)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Seq Scan on lzlright (cost=0.00..13.20 rows=320 width=4)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(5 rows)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: 0.510 ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; create unique index idx_right on lzlright(a);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CREATE INDEX
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: 3.576 ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; explain select lzlleft.a from lzlleft left join (select a from lzlright) c on lzlleft.a=c.a;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;----------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Seq Scan on lzlleft (cost=0.00..13.20 rows=320 width=4)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(1 row)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;To summarize: in a left join, when the right side is guaranteed to be unique on the join key and the query only uses columns from the left table, every left row produces exactly one output row whether or not it finds a match. The right table cannot affect the result, so there&amp;rsquo;s no need to actually access it. This is not a bug but a feature of the PostgreSQL optimizer, known as join removal, and it makes logical sense.&lt;/p&gt;
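&lt;p&gt;As a rough sketch of where this optimization stops applying, assuming the same &lt;code&gt;lzlleft&lt;/code&gt; and &lt;code&gt;lzlright&lt;/code&gt; test tables, both of the following variations should keep &lt;code&gt;lzlright&lt;/code&gt; in the plan. Plan output is omitted here because costs and node types depend on the version and on table statistics.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- 1. A right-side column is selected, so the right side must be read.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;explain select lzlleft.a, c.a from lzlleft left join (select a from lzlright group by a) c on lzlleft.a=c.a;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- 2. An inner join drops left rows that have no match, so the right side
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--    decides which rows survive even though it is unique.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;explain select lzlleft.a from lzlleft join (select a from lzlright group by a) c on lzlleft.a=c.a;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In both cases the right side can change the result, which is why the join cannot be dropped.&lt;/p&gt;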

&lt;h2 class="relative group"&gt;Source Code Analysis
 &lt;div id="source-code-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#source-code-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;No source code analysis this time~&lt;/p&gt;
&lt;p&gt;The optimizer source code is just too difficult, so I only read some of its comments. Searching for the keyword &lt;code&gt;unique-ify&lt;/code&gt; turns up this:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; Also, this routine and others in this module accept the special JoinTypes
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER to indicate that we should
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; unique&lt;span style="color:#f92672"&gt;-&lt;/span&gt;ify the outer or inner relation and then apply a regular inner
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; join. These values are not allowed to propagate outside this module,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; however. Path cost estimation code may need to recognize that it&lt;span style="color:#960050;background-color:#1e0010"&gt;&amp;#39;&lt;/span&gt;s
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; dealing with such a &lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; &lt;span style="color:#f92672"&gt;---&lt;/span&gt; the combination of nominal jointype INNER
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; with sjinfo&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;jointype &lt;span style="color:#f92672"&gt;==&lt;/span&gt; JOIN_SEMI indicates that. 
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
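&lt;p&gt;The special JoinTypes &lt;code&gt;JOIN_UNIQUE_OUTER&lt;/code&gt; and &lt;code&gt;JOIN_UNIQUE_INNER&lt;/code&gt; tell the planner to unique-ify the outer or inner relation and then apply a regular inner join, and path cost estimation has to recognize this case. Note that this comment describes how semi-joins are planned. The removal of a left join whose right side is unique, as seen above, is handled separately by &lt;code&gt;remove_useless_joins()&lt;/code&gt; in &lt;code&gt;src/backend/optimizer/plan/analyzejoins.c&lt;/code&gt;.&lt;/p&gt;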
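&lt;p&gt;To illustrate what unique-ifying a relation means, here is a small sketch assuming the same test tables. An &lt;code&gt;IN&lt;/code&gt; subquery is a semi-join. One strategy the planner may consider, depending on its cost estimates, is to de-duplicate the subquery output and then run an ordinary inner join, which is what the second query spells out by hand:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Semi-join form: each lzlleft row is returned at most once.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;explain select * from lzlleft where a in (select a from lzlright);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Hand-written equivalent: de-duplicate the inner side first,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- then use a regular inner join.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;explain select lzlleft.* from lzlleft join (select distinct a from lzlright) c on lzlleft.a=c.a;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Both queries return the same rows. Comparing their plans on your own version shows whether the planner chose a semi-join node or the unique-ify approach.&lt;/p&gt;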

&lt;h2 class="relative group"&gt;Comparison with Oracle and MySQL Optimizers
 &lt;div id="comparison-with-oracle-and-mysql-optimizers" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#comparison-with-oracle-and-mysql-optimizers" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s check whether the Oracle and MySQL optimizers perform a similar logical optimization.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Oracle
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlleft(a number);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlright(a number);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lzlleft.a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlleft &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;distinct&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlright) &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlleft.a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.a;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- GROUP BY uniqueness
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SQL&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lzlleft.a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlleft &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlright &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; a) &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlleft.a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.a; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;no&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt; selected
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Execution Plan
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Plan hash value: &lt;span style="color:#ae81ff"&gt;3533354041&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; Id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Operation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Rows&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Bytes &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Cost (&lt;span style="color:#f92672"&gt;%&lt;/span&gt;CPU)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; Time &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;STATEMENT&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;26&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; HASH &lt;span style="color:#66d9ef"&gt;JOIN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;OUTER&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;26&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ACCESS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LZLLEFT &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VIEW&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; HASH &lt;span style="color:#66d9ef"&gt;GROUP&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ACCESS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FULL&lt;/span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; LZLRIGHT &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Predicate Information (identified &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;operation&lt;/span&gt; id):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;access&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;LZLLEFT&amp;#34;&lt;/span&gt;.&lt;span style="color:#e6db74"&gt;&amp;#34;A&amp;#34;&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;.&lt;span style="color:#e6db74"&gt;&amp;#34;A&amp;#34;&lt;/span&gt;(&lt;span style="color:#f92672"&gt;+&lt;/span&gt;))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- DISTINCT uniqueness
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SQL&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lzlleft.a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlleft &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;distinct&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlright) &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlleft.a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.a;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;no&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt; selected
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Execution Plan
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Plan hash value: &lt;span style="color:#ae81ff"&gt;3859658234&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; Id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Operation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Rows&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Bytes &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Cost (&lt;span style="color:#f92672"&gt;%&lt;/span&gt;CPU)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; Time &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;STATEMENT&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;26&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; HASH &lt;span style="color:#66d9ef"&gt;JOIN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;OUTER&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;26&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ACCESS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LZLLEFT &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VIEW&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; HASH &lt;span style="color:#66d9ef"&gt;UNIQUE&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ACCESS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FULL&lt;/span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; LZLRIGHT &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Predicate Information (identified &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;operation&lt;/span&gt; id):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;access&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;LZLLEFT&amp;#34;&lt;/span&gt;.&lt;span style="color:#e6db74"&gt;&amp;#34;A&amp;#34;&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;.&lt;span style="color:#e6db74"&gt;&amp;#34;A&amp;#34;&lt;/span&gt;(&lt;span style="color:#f92672"&gt;+&lt;/span&gt;))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- MySQL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlleft(a int &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlright(a int &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- GROUP BY uniqueness
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lzlleft.a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlleft &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlright &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; a) &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlleft.a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.a; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#75715e"&gt;----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------+------+----------+-------------+
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; select_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; partitions &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; possible_keys &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; key_len &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ref&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; filtered &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Extra &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#75715e"&gt;----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------+------+----------+-------------+
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlleft &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;derived2&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ref&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;auto_key0&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;auto_key0&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb.lzlleft.a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; DERIVED &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlright &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#75715e"&gt;----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------+------+----------+-------------+&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- DISTINCT uniqueness
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lzlleft.a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlleft &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;distinct&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlright) &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlleft.a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.a;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#75715e"&gt;----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------+------+----------+-------------+
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; select_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; partitions &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; possible_keys &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; key_len &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ref&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; filtered &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Extra &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#75715e"&gt;----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------+------+----------+-------------+
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlleft &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;derived2&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ref&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;auto_key0&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;auto_key0&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb.lzlleft.a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; DERIVED &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlright &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#75715e"&gt;----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------+------+----------+-------------+&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In summary, neither Oracle nor MySQL eliminates the right side of a LEFT JOIN when only left-table columns are selected and the right side is provably unique on the join key; both still access the right table.&lt;/p&gt;
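&lt;p&gt;For comparison, the same DISTINCT query can be tried on PostgreSQL. This is only a minimal sketch: it reuses the table names from the examples above and assumes a scratch database where they do not yet exist. If join removal applies, the plan should contain a scan of &lt;code&gt;lzlleft&lt;/code&gt; only, with no node for &lt;code&gt;lzlright&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- PostgreSQL (sketch; run in a scratch database)
create table lzlleft(a int primary key);
create table lzlright(a int primary key);
-- DISTINCT makes the subquery unique on a, and only lzlleft columns are selected
explain select lzlleft.a from lzlleft left join (select distinct a from lzlright) c on lzlleft.a=c.a;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;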
&lt;p&gt;The PostgreSQL optimizer really has some impressive tricks.&lt;/p&gt;</content:encoded></item><item><title>Too Many Range Table Entries Even with Not-That-Many Partitions</title><link>https://lastdba.com/en/2024/08/12/too-many-range-table-entries-even-with-not-that-many-partitions/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/too-many-range-table-entries-even-with-not-that-many-partitions/</guid><description>&lt;h2 class="relative group"&gt;Problem Description
 &lt;div id="problem-description" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-description" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;A PostgreSQL UPDATE statement fails with the error &lt;code&gt;too many range table entries&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Original SQL:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; t &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; LZLTAB &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt;	LZLTAB &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	STATUS &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;00&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	FILE_ID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	DATE_UPDATED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;localtimestamp&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt;	id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If we rewrite UPDATE as SELECT, it succeeds:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; t &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt;	id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt;	LZLTAB &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt;	id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; 	LZLTAB &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt;	id &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id	&lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; date_created 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+----------------------------+...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;06&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;21&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;161687&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)	&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Primary key and partitions (400 partitions in total):&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Problem Description
 &lt;div id="problem-description" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-description" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;A PostgreSQL UPDATE statement fails with the error &lt;code&gt;too many range table entries&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Original SQL:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; t &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; LZLTAB &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt;	LZLTAB &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	STATUS &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;00&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	FILE_ID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	DATE_UPDATED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;localtimestamp&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt;	id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If we rewrite UPDATE as SELECT, it succeeds:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; t &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt;	id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt;	LZLTAB &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt;	id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; 	LZLTAB &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt;	id &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id	&lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; date_created 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+----------------------------+...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;06&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;21&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;161687&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)	&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Primary key and partitions (400 partitions in total):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;: RANGE (partition_key)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;pk_lzl&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;, btree (id, partition_key)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partitions: lzl_p20230601 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;20230601&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;20230602&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_p20230602 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;20230602&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;20230603&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_p20230603 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;20230603&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;20230604&amp;#39;&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The SQL logic has many optimization opportunities, but we won&amp;rsquo;t discuss those here. The focus is on why UPDATE fails and why SELECT and UPDATE behave differently.&lt;/p&gt;
&lt;p&gt;EXPLAIN UPDATE throws this error:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; t &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; (selec tid &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; LZLTAB &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; LZLTAB &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;STATUS &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;00&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FILE_ID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DATE_UPDATED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;localtimestamp&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;54000&lt;/span&gt;: too many range &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; entries
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: add_rte_to_flat_rtable, setrefs.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;451&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;18341&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;171&lt;/span&gt; ms (&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;341&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
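&lt;p&gt;EXPLAIN alone ran for about 18 seconds before throwing the error. Since a plain EXPLAIN does not execute the statement, all of that time was spent in planning.&lt;/p&gt;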

&lt;h2 class="relative group"&gt;Source Code Analysis
 &lt;div id="source-code-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#source-code-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The error directly points to the source location: &lt;code&gt;LOCATION: add_rte_to_flat_rtable, setrefs.c:451&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Find the source at &lt;code&gt;src/backend/optimizer/plan/setrefs.c&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The comment explains that setrefs.c handles post-processing of a completed plan tree:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *Post-processing of a completed plan tree: fix references to subplan
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *	 vars, compute regproc values for operators, etc
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Find the function at line 451:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Add (a copy of) the given RTE to the final rangetable
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * In the flat rangetable, we zero out substructure pointers that are not
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * needed by the executor; this reduces the storage space and copying cost
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * for cached plans. We keep only the ctename, alias and eref Alias fields,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * which are needed by EXPLAIN, and the selectedCols, insertedCols,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * updatedCols, and extraUpdatedCols bitmaps, which are needed for
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * executor-startup permissions checking and for trigger event checking.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;add_rte_to_flat_rtable&lt;/span&gt;(PlannerGlobal &lt;span style="color:#f92672"&gt;*&lt;/span&gt;glob, RangeTblEntry &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rte)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Check for RT index overflow; it&amp;#39;s very unlikely, but if it did happen,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * the executor would get confused by varnos that match the special varno
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * values.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;IS_SPECIAL_VARNO&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;list_length&lt;/span&gt;(glob&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;finalrtable)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(ERROR,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				(&lt;span style="color:#a6e22e"&gt;errcode&lt;/span&gt;(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 &lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;too many range table entries&amp;#34;&lt;/span&gt;)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
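&lt;p&gt;The &lt;code&gt;errmsg()&lt;/code&gt; call is at line 451. According to its header comment, &lt;code&gt;add_rte_to_flat_rtable()&lt;/code&gt; adds a copy of an RTE to the final range table. What an RTE is will be covered further down.&lt;/p&gt;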
&lt;p&gt;The error check uses &lt;code&gt;IS_SPECIAL_VARNO()&lt;/code&gt;. Searching for this macro in &lt;code&gt;src/include/nodes/primnodes.h&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Var - expression node representing a variable (ie, a table column)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * In the parser and planner, varno and varattno identify the semantic
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * referent, which is a base-relation column unless the reference is to a join
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * USING column that isn&amp;#39;t semantically equivalent to either join input column
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * (because it is a FULL join or the input column requires a type coercion).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * In those cases varno and varattno refer to the JOIN RTE. (Early in the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * planner, we replace such join references by the implied expression; but up
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * till then we want join reference Vars to keep their original identity for
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * query-printing purposes.)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define INNER_VAR		65000	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* reference to inner subplan */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define OUTER_VAR		65001	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* reference to outer subplan */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define INDEX_VAR		65002	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* reference to index column */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define IS_SPECIAL_VARNO(varno)		((varno) &amp;gt;= INNER_VAR)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The comment above is a bit dense, but one phrase is key: &lt;em&gt;In those cases varno and varattno refer to the JOIN RTE&lt;/em&gt;. In other words, a varno identifies an RTE: it is the index of that entry in the range table.&lt;/p&gt;
&lt;p&gt;The check passes &lt;code&gt;list_length(glob-&amp;gt;finalrtable)&lt;/code&gt; to the macro, so the error is thrown once the length of the final range table reaches 65000 (&lt;code&gt;INNER_VAR&lt;/code&gt;). (We won&amp;rsquo;t go into the differences between &lt;code&gt;INNER_VAR&lt;/code&gt;, &lt;code&gt;OUTER_VAR&lt;/code&gt;, and &lt;code&gt;INDEX_VAR&lt;/code&gt; here since their values are close and don&amp;rsquo;t affect the analysis.)&lt;/p&gt;
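&lt;p&gt;To make the threshold concrete, here is a minimal sketch of the check. The two macros are copied from the &lt;code&gt;primnodes.h&lt;/code&gt; excerpt above; &lt;code&gt;flat_rtable_overflows()&lt;/code&gt; is a hypothetical helper written only for this illustration and is not part of PostgreSQL. Its argument stands for the length of the final range table as seen by the check.&lt;/p&gt;

```c
/* Values copied from the primnodes.h excerpt quoted above. */
#define INNER_VAR 65000
#define IS_SPECIAL_VARNO(varno) ((varno) >= INNER_VAR)

/*
 * Hypothetical helper, not PostgreSQL code: returns 1 when a final
 * range table of the given length would trigger
 * "too many range table entries", and 0 otherwise.
 */
int
flat_rtable_overflows(int finalrtable_length)
{
    return IS_SPECIAL_VARNO(finalrtable_length) ? 1 : 0;
}
```

&lt;p&gt;With this helper, a length of 64999 returns 0 and a length of 65000 returns 1, which is the point where &lt;code&gt;ereport(ERROR, ...)&lt;/code&gt; fires.&lt;/p&gt;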
&lt;p&gt;&lt;strong&gt;What is RTE?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;RTE stands for range table entry (&lt;code&gt;RangeTblEntry&lt;/code&gt;). The term appears throughout the planner source code, and the error message names it directly: &lt;code&gt;ERROR: 54000: too many range table entries&lt;/code&gt;. So what is an RTE?&lt;/p&gt;
&lt;p&gt;In &lt;code&gt;src/include/nodes/parsenodes.h&lt;/code&gt;, there&amp;rsquo;s a description of RTE:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*--------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * RangeTblEntry -
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *	 A range table is a List of RangeTblEntry nodes.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *	 A range table entry may represent a plain relation, a sub-select in
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *	 FROM, or the result of a JOIN clause. (Only explicit JOIN syntax
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *	 produces an RTE, not the implicit join resulting from multiple FROM
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *	 items. This is because we only need the RTE to deal with SQL features
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *	 like outer joins and join-output-column aliasing.) Other special
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *	 RTE types also exist, as indicated by RTEKind.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *	 Note that we consider RTE_RELATION to cover anything that has a pg_class
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *	 entry. relkind distinguishes the sub-cases.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Simply put, an RTE is a &amp;ldquo;table&amp;rdquo; in the execution plan: either a concrete table or a generated &amp;ldquo;table&amp;rdquo; such as a subquery or a join result. Hitting the limit of 65000 means the planner generated too many RTEs for this statement.&lt;/p&gt;
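&lt;p&gt;The link between a varno and an RTE can be pictured with a toy model. Everything below (&lt;code&gt;ToyRTEKind&lt;/code&gt;, &lt;code&gt;ToyRTE&lt;/code&gt;, &lt;code&gt;toy_rt_fetch()&lt;/code&gt;) is a hypothetical stand-in invented for this sketch and far simpler than the real &lt;code&gt;RangeTblEntry&lt;/code&gt;. The only property carried over from PostgreSQL is that range table indexes are 1-based.&lt;/p&gt;

```c
/* Toy stand-ins for illustration only; the real structs live in parsenodes.h. */
typedef enum ToyRTEKind
{
    TOY_RTE_RELATION,   /* a plain table or partition */
    TOY_RTE_SUBQUERY,   /* a sub-select in FROM */
    TOY_RTE_JOIN        /* the result of an explicit JOIN */
} ToyRTEKind;

typedef struct ToyRTE
{
    ToyRTEKind  kind;
    const char *name;
} ToyRTE;

/*
 * Look up a range table entry by varno. Indexes are 1-based, so varno 1
 * is the first entry. Returns a null pointer when varno is out of range.
 */
const ToyRTE *
toy_rt_fetch(const ToyRTE *rtable, int length, int varno)
{
    if (varno > length || 1 > varno)
        return 0;
    return rtable + (varno - 1);
}
```

&lt;p&gt;In this model, a two-entry range table holding the relation &lt;code&gt;lzl&lt;/code&gt; and the subquery &lt;code&gt;t&lt;/code&gt; maps varno 1 to &lt;code&gt;lzl&lt;/code&gt; and varno 2 to &lt;code&gt;t&lt;/code&gt;. Every additional partition or subquery copy in a plan adds another entry, and therefore another index.&lt;/p&gt;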

&lt;h2 class="relative group"&gt;Viewing the UPDATE Execution Plan
 &lt;div id="viewing-the-update-execution-plan" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#viewing-the-update-execution-plan" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Now that we know what an RTE is, the execution plan should show where they come from. The original statement (400 partitions) cannot produce a plan at all, so let&amp;rsquo;s create a 30-partition table and run EXPLAIN against that instead.&lt;/p&gt;
&lt;p&gt;30-partition table with the same UPDATE statement:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; t &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;STATUS &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;00&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FILE_ID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DATE_UPDATED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;localtimestamp&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; ( &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Generated execution plan:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Update&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;155&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;48&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;4980&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;600&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3042&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Update&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230601 lzl_1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Update&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230602 lzl_2
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Update&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230630 lzl_30
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash Semi &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;155&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;48&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;166&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3042&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Cond: (lzl_1.id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; t.id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230601 lzl_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2912&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;155&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;155&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Subquery Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; t (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;155&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;154&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;154&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_p20230601_pkey &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230601 lzl_32 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_p20230602_pkey &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230602 lzl_33 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_p20230630_pkey &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230630 lzl_61 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash Semi &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;155&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;48&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;166&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3042&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Cond: (lzl_30.id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; t_29.id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230630 lzl_30 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2912&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;155&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;155&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Subquery Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; t_29 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;155&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;154&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;154&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_p20230601_pkey &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230601 lzl_931 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_p20230602_pkey &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230602 lzl_932 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_p20230630_pkey &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230630 lzl_960 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;2041&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The execution plan is extremely long: 2041 rows in total. It is also very inefficient. For every partition being updated, the subquery predicate is evaluated against the whole partitioned table again, and because the SQL has no partition key condition, each evaluation scans all partitions. For a 30-partition table, each partition is scanned 30 times, for a total of 900 partition scans.&lt;/p&gt;
&lt;p&gt;From the execution plan, we can see that the first 30 RTEs, up to lzl_30, are allocated for the UPDATE targets. The hash side under each target partition then gets its own 30 partition scans; for example, the hash under lzl_1 scans lzl_32 through lzl_61. Why 32 instead of 31? Because the entire partition scan is a subquery, which is also an RTE; these are named t and t_1 through t_29, 30 in total. So the total number of RTEs generated in the plan is 30+30+30×30=960.&lt;/p&gt;
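&lt;p&gt;The counting above can be sketched in a few lines of Python. This is only a model of the arithmetic in this article, not PostgreSQL&amp;rsquo;s own accounting; the function name &lt;code&gt;update_rte_breakdown&lt;/code&gt; and the dictionary keys are made up for illustration.&lt;/p&gt;

```python
def update_rte_breakdown(partitions):
    """Model the three RTE groups counted for the UPDATE plan (illustrative only)."""
    return {
        "update_targets": partitions,                # lzl_1 .. lzl_30 in the 30-partition plan
        "subquery_rtes": partitions,                 # t, t_1 .. t_29
        "partition_scans": partitions * partitions,  # every target rescans every partition
    }


breakdown = update_rte_breakdown(30)
total = sum(breakdown.values())
print(breakdown)
print("total RTEs:", total)  # 30 + 30 + 30*30 = 960
```

&lt;p&gt;The quadratic term dominates quickly, which is why the partition count matters so much for this statement.&lt;/p&gt;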
&lt;p&gt;Looking at the SELECT execution plan, it&amp;rsquo;s very different from UPDATE:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; t &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; STATUS ,FILE_ID ,DATE_UPDATED &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; ( &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Semi &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;155&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;48&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;467&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;90&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;98&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Cond: (lzl.id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; lzl_31.id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;309&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;600&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;106&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230601 lzl_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;106&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230602 lzl_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;106&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230630 lzl_30 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;106&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;155&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;155&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;154&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;154&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_p20230601_pkey &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230601 lzl_32 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_p20230602_pkey &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230602 lzl_33 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_p20230630_pkey &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230630 lzl_61 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;96&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;There is no repeated (Cartesian-product-style) table access here, and the RTE aliases only go up to lzl_61. This is also why SELECT succeeds on 400 partitions while UPDATE does not: 400×400 accesses is simply too many.&lt;/p&gt;
&lt;p&gt;So regarding the original SQL where UPDATE fails and SELECT succeeds, we can conclude:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For 400 partitions with SELECT, the execution plan has 801 RTEs, which doesn&amp;rsquo;t exceed &lt;code&gt;INNER_VAR&lt;/code&gt; (65000), so it can generate a plan and execute.&lt;/li&gt;
&lt;li&gt;For 400 partitions with UPDATE, the execution plan has 400+400+400×400=160,800 RTEs, far exceeding &lt;code&gt;INNER_VAR&lt;/code&gt; (65000), so the plan cannot be generated and the RTE overflow error is thrown.&lt;/li&gt;
&lt;/ul&gt;
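&lt;p&gt;As a rough check, the same counting pattern can be applied to any partition count and compared with &lt;code&gt;INNER_VAR&lt;/code&gt;. The sketch below is a model built from the figures in this article (61 and 960 RTEs for 30 partitions), not PostgreSQL code; the function names and the exact boundary condition are assumptions.&lt;/p&gt;

```python
INNER_VAR = 65000  # limit quoted in the text


def select_rtes(n):
    # Fits the figures in the text: 61 for 30 partitions, 801 for 400.
    return 2 * n + 1


def update_rtes(n):
    # Same pattern as 30 + 30 + 30*30 = 960.
    return n + n + n * n


def fits(rte_count):
    # Assumption: a plan is fine while its RTE count stays below INNER_VAR.
    return rte_count in range(INNER_VAR)


for n in (30, 400):
    print(n, select_rtes(n), fits(select_rtes(n)), update_rtes(n), fits(update_rtes(n)))

# Largest partition count whose UPDATE still fits under this model.
max_update_partitions = max(n for n in range(1, 1000) if fits(update_rtes(n)))
print("UPDATE fits up to", max_update_partitions, "partitions")
```

&lt;p&gt;Under these assumptions the UPDATE stops fitting after 253 partitions (254 would need 65,024 RTEs), while the SELECT has plenty of headroom at 400.&lt;/p&gt;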
&lt;p&gt;The cause is mostly explained, but the large difference between the SELECT and UPDATE plans is still puzzling. Let&amp;rsquo;s compare against the Oracle and MySQL execution plans for the same statements.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Oracle Behavior
 &lt;div id="oracle-behavior" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#oracle-behavior" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Oracle partitioned table with local index:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; lzl (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id number &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; partition_key number &lt;span style="color:#66d9ef"&gt;DEFAULT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PARTITION &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; RANGE (partition_key)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PARTITION lzl_p20230601 &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LESS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;THAN&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;20230602&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PARTITION lzl_p20230602 &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LESS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;THAN&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;20230603&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PARTITION lzl_p20230630 &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LESS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;THAN&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;20230631&amp;#39;&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; PKLZL &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl(id, partition_key) &lt;span style="color:#66d9ef"&gt;local&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;add&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt; pklzl &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt; (id, partition_key) &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; pklzl;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Execution plan:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; t &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; rownum&lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; STATUS ,FILE_ID ,DATE_UPDATED &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; ( &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/e6b4077b9290.png" alt="image.png" /&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;STATUS &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;00&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FILE_ID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DATE_UPDATED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; sysdate
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; rownum&lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/35e2fc036d9f.png" alt="image.png" /&gt;&lt;/p&gt;
&lt;p&gt;In Oracle, both SELECT and UPDATE use NESTED LOOP and access all partitions (PARTITION RANGE ALL). In both cases t is the driving table, and because of the IN, its results are sorted and deduplicated first. So Oracle&amp;rsquo;s plan does not perform 30×30 accesses; the cost depends on the size of the driving result set, where n rows means n×30 partition accesses. Since driving table t holds very little data, this plan is fine.&lt;/p&gt;
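&lt;p&gt;The difference between the two access patterns can be put into numbers with a small sketch. It only models the n×30 and 30×30 arithmetic described here; both function names are hypothetical and no real optimizer cost is computed.&lt;/p&gt;

```python
def oracle_style_accesses(driving_rows, partitions):
    # Nested loop: each row from the driving table t probes every partition.
    return driving_rows * partitions


def per_target_rescan_accesses(partitions):
    # Pattern seen in the PostgreSQL UPDATE plan: every target partition rescans all partitions.
    return partitions * partitions


partitions = 30
for driving_rows in (1, 5, 100):
    print(driving_rows, "driving rows:", oracle_style_accesses(driving_rows, partitions))
print("per-target rescans:", per_target_rescan_accesses(partitions))
```

&lt;p&gt;The nested-loop figure grows with the driving result set rather than with the square of the partition count, so it stays small as long as t returns only a few rows.&lt;/p&gt;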

&lt;h3 class="relative group"&gt;MySQL Behavior
 &lt;div id="mysql-behavior" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#mysql-behavior" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Since MySQL partitioned tables only support local indexes, we simply create the primary key directly:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; lzl (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id bigint &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; date_created &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; ,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PARTITION &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; RANGE (partition_key) 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PARTITION lzl_p20230601 &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LESS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;THAN&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;20230602&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PARTITION lzl_p20230602 &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LESS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;THAN&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;20230603&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PARTITION lzl_p20230630 &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LESS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;THAN&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;20230631&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;add&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt; pklzl(id,partition_key);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Starting with 5.7, MySQL&amp;rsquo;s EXPLAIN shows which partitions are scanned (version 8.0 is used here).&lt;/p&gt;
&lt;p&gt;SELECT plan:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; t &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; STATUS ,FILE_ID ,DATE_UPDATED &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; ( &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#75715e"&gt;----+-------------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+---------+---------+-------+------+----------+-----------------+
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; select_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; partitions &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; possible_keys &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; key_len &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ref&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; filtered &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Extra &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#75715e"&gt;----+-------------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+---------+---------+-------+------+----------+-----------------+
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;derived3&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ALL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Start&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;temporary&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl_p20230601,lzl_p20230602,lzl_p20230603,lzl_p20230604,lzl_p20230605,lzl_p20230606,lzl_p20230607,lzl_p20230608,lzl_p20230609,lzl_p20230610,lzl_p20230611,lzl_p20230612,lzl_p20230613,lzl_p20230614,lzl_p20230615,lzl_p20230616,lzl_p20230617,lzl_p20230618,lzl_p20230619,lzl_p20230620,lzl_p20230621,lzl_p20230622,lzl_p20230623,lzl_p20230624,lzl_p20230625,lzl_p20230626,lzl_p20230627,lzl_p20230628,lzl_p20230629,lzl_p20230630 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ref&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t.id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;End&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;temporary&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; DERIVED &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl_p20230601,lzl_p20230602,lzl_p20230603,lzl_p20230604,lzl_p20230605,lzl_p20230606,lzl_p20230607,lzl_p20230608,lzl_p20230609,lzl_p20230610,lzl_p20230611,lzl_p20230612,lzl_p20230613,lzl_p20230614,lzl_p20230615,lzl_p20230616,lzl_p20230617,lzl_p20230618,lzl_p20230619,lzl_p20230620,lzl_p20230621,lzl_p20230622,lzl_p20230623,lzl_p20230624,lzl_p20230625,lzl_p20230626,lzl_p20230627,lzl_p20230628,lzl_p20230629,lzl_p20230630 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ref&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; const &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;UPDATE plan:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; t &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; STATUS &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;00&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; FILE_ID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; DATE_UPDATED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;localtimestamp&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; ( &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#75715e"&gt;----+-------------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+---------+---------+-------+------+----------+-----------------+
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; select_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; partitions &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; possible_keys &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; key_len &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ref&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; filtered &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Extra &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#75715e"&gt;----+-------------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+---------+---------+-------+------+----------+-----------------+
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;derived3&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ALL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Start&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;temporary&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl_p20230601,lzl_p20230602,lzl_p20230603,lzl_p20230604,lzl_p20230605,lzl_p20230606,lzl_p20230607,lzl_p20230608,lzl_p20230609,lzl_p20230610,lzl_p20230611,lzl_p20230612,lzl_p20230613,lzl_p20230614,lzl_p20230615,lzl_p20230616,lzl_p20230617,lzl_p20230618,lzl_p20230619,lzl_p20230620,lzl_p20230621,lzl_p20230622,lzl_p20230623,lzl_p20230624,lzl_p20230625,lzl_p20230626,lzl_p20230627,lzl_p20230628,lzl_p20230629,lzl_p20230630 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ref&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t.id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;End&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;temporary&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; DERIVED &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl_p20230601,lzl_p20230602,lzl_p20230603,lzl_p20230604,lzl_p20230605,lzl_p20230606,lzl_p20230607,lzl_p20230608,lzl_p20230609,lzl_p20230610,lzl_p20230611,lzl_p20230612,lzl_p20230613,lzl_p20230614,lzl_p20230615,lzl_p20230616,lzl_p20230617,lzl_p20230618,lzl_p20230619,lzl_p20230620,lzl_p20230621,lzl_p20230622,lzl_p20230623,lzl_p20230624,lzl_p20230625,lzl_p20230626,lzl_p20230627,lzl_p20230628,lzl_p20230629,lzl_p20230630 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ref&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; const &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;MySQL produces the same execution plan for both statements. The choice of driving table could still be better: the const lookup should be the driving table, which would reduce the number of scans.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Bug?
 &lt;div id="bug" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#bug" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Bug Description
 &lt;div id="bug-description" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#bug-description" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://postgrespro.com/list/thread-id/2482006" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/list/thread-id/2482006&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This bug is easy to find by searching for the error message. It was reported by digoal (德哥) back in 2020, followed by a discussion between two source code experts. The discussion is lengthy, but in short: PG does not support an unlimited number of partitions, which is reasonable in practice, since too many partitions degrade performance quickly. The community nevertheless felt the limit needed adjusting and discussed the values of &lt;code&gt;INNER_VAR&lt;/code&gt; and &lt;code&gt;Var.varno&lt;/code&gt; in the source code.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Misleading Nature
 &lt;div id="misleading-nature" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#misleading-nature" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The bug title is somewhat misleading: &lt;em&gt;BUG #16302: too many range table entries - when count partition table(65538 childs)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The title seems to say that a partitioned table can&amp;rsquo;t have more than 65,538 child tables. The discussion also states that &lt;em&gt;PG can handle up to 64K relations in a query&lt;/em&gt;, meaning a single query cannot reference more than 64K relations.&lt;/p&gt;
&lt;p&gt;This is odd, because our table has only 400 partitions and still throws the error. In fact, neither statement is entirely accurate. The 64K limit applies to the range table entries (the &amp;ldquo;tables&amp;rdquo;) in the execution plan, which are not the same thing as real tables. If the number of tables or partitions exceeds this limit, the error is of course guaranteed. But it can also appear well below 64K partitions, as in our case with only 400.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Fix
 &lt;div id="fix" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#fix" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The bug was submitted for version 12.2; our environment is 13.2.&lt;/p&gt;
&lt;p&gt;This bug is fixed in PG15. The relevant definitions in &lt;code&gt;src/include/nodes/primnodes.h&lt;/code&gt; now read:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define INNER_VAR		(-1)	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* reference to inner subplan */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define OUTER_VAR		(-2)	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* reference to outer subplan */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define INDEX_VAR		(-3)	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* reference to index column */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define ROWID_VAR		(-4)	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* row identity column during planning */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define IS_SPECIAL_VARNO(varno)		((int) (varno) &amp;lt; 0)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;As discussed in the community, PG15 changed the special VAR values to negative numbers and made varno a signed 32-bit integer (about 2 billion values). In earlier versions the special values started at 65000, which is what capped the range table at roughly 64K entries.&lt;/p&gt;
&lt;p&gt;And in the function that previously threw the error, &lt;code&gt;add_rte_to_flat_rtable()&lt;/code&gt; in &lt;code&gt;src/backend/optimizer/plan/setrefs.c&lt;/code&gt;, the error code has been completely removed! The entire PG15 source code no longer contains &lt;code&gt;too many range table entries&lt;/code&gt;!&lt;/p&gt;

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;PG still has room for improvement in partitioned table optimization. PG treats child partitions as regular tables, unlike Oracle and MySQL. Oracle treats child partitions as segments distinct from tables. This causes PG to output the access method for every partition in the execution plan (when pruning doesn&amp;rsquo;t occur), making plans extremely long when there are many partitions. Oracle just writes &lt;code&gt;PARTITION RANGE ALL&lt;/code&gt;. MySQL also prints all partitions but doesn&amp;rsquo;t treat each partition&amp;rsquo;s access as a subquery, reducing plan complexity.&lt;/li&gt;
&lt;li&gt;Even with far fewer than 64K partitions, you can still get &lt;code&gt;too many range table entries&lt;/code&gt;. The limit is on the number of RTEs in the execution plan, not on the number of partitions. Once the partition count reaches the limit, though, the RTE count does too, because PG plans an access for every partition.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;too many range table entries&lt;/code&gt; error is resolved in PG15.&lt;/li&gt;
&lt;li&gt;For versions below 15, don&amp;rsquo;t create too many partitions! You can also rely on partition pruning to reduce the number of partitions accessed. In this case, simply adding a condition on the partition key to the WHERE clause would be enough.&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>Why Is 'partition of' Slow When There's No Blocking?</title><link>https://lastdba.com/en/2024/08/12/why-is-partition-of-slow-when-theres-no-blocking/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/why-is-partition-of-slow-when-theres-no-blocking/</guid><description>&lt;h4 class="relative group"&gt;Analyzing Slow &lt;code&gt;CREATE TABLE.. PARTITION OF&lt;/code&gt; Statements
 &lt;div id="analyzing-slow-create-table-partition-of-statements" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#analyzing-slow-create-table-partition-of-statements" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;063&lt;/span&gt; CST,&lt;span style="color:#e6db74"&gt;&amp;#34;user1&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;dblzl&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;125889&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.79.3:37423&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;66461213&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;ebc1,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;authentication&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt; CST,&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41364668&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,LOG,&lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;connection authorized: 
user=user1 database=dblzl&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;client backend&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;079&lt;/span&gt; CST,&lt;span style="color:#e6db74"&gt;&amp;#34;user1&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;dblzl&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;125889&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.79.3:37423&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;66461213&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;ebc1,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt; CST,&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41364669&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,LOG,&lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;statement: -- a86fae372f73414bbe1af18213a47beb
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;/*a86fae372f73414bbe1af18213a47beb */
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;create table if not exists table1_partition_p2406 partition of table1 for values from (&amp;#39;2024-06-01 00:00:00&amp;#39;) to (&amp;#39;2024-07-01 00:00:00&amp;#39;); &amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;client backend&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;38&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;555&lt;/span&gt; CST,&lt;span style="color:#e6db74"&gt;&amp;#34;user1&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;dblzl&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;125889&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.79.3:37423&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;66461213&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;ebc1,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;CREATE TABLE&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt; CST,&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,LOG,&lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;duration: 2129483.549 ms&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;client 
backend&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The user &amp;lsquo;user1&amp;rsquo; connected to the database at 22:02:59 and immediately executed a &lt;code&gt;create table.. partition of..&lt;/code&gt; statement, which didn&amp;rsquo;t complete until 22:38:28. The logs in between are omitted — there was a lot of session blocking information, with session 125889 as the blocking source.&lt;/p&gt;</description><content:encoded>
&lt;h4 class="relative group"&gt;Analyzing Slow &lt;code&gt;CREATE TABLE.. PARTITION OF&lt;/code&gt; Statements
 &lt;div id="analyzing-slow-create-table-partition-of-statements" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#analyzing-slow-create-table-partition-of-statements" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;063&lt;/span&gt; CST,&lt;span style="color:#e6db74"&gt;&amp;#34;user1&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;dblzl&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;125889&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.79.3:37423&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;66461213&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;ebc1,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;authentication&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt; CST,&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41364668&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,LOG,&lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;connection authorized: 
user=user1 database=dblzl&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;client backend&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;079&lt;/span&gt; CST,&lt;span style="color:#e6db74"&gt;&amp;#34;user1&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;dblzl&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;125889&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.79.3:37423&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;66461213&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;ebc1,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt; CST,&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41364669&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,LOG,&lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;statement: -- a86fae372f73414bbe1af18213a47beb
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;/*a86fae372f73414bbe1af18213a47beb */
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;create table if not exists table1_partition_p2406 partition of table1 for values from (&amp;#39;2024-06-01 00:00:00&amp;#39;) to (&amp;#39;2024-07-01 00:00:00&amp;#39;); &amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;client backend&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;38&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;555&lt;/span&gt; CST,&lt;span style="color:#e6db74"&gt;&amp;#34;user1&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;dblzl&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;125889&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.79.3:37423&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;66461213&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;ebc1,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;CREATE TABLE&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt; CST,&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,LOG,&lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;duration: 2129483.549 ms&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;client 
backend&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The user &amp;lsquo;user1&amp;rsquo; connected to the database at 22:02:59 and immediately executed a &lt;code&gt;create table.. partition of..&lt;/code&gt; statement, which didn&amp;rsquo;t complete until 22:38:28. The logs in between are omitted — there was a lot of session blocking information, with session 125889 as the blocking source.&lt;/p&gt;
&lt;p&gt;Blocked sessions looked like:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;process &lt;span style="color:#ae81ff"&gt;33569&lt;/span&gt; still waiting &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; RowExclusiveLock &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; relation &lt;span style="color:#ae81ff"&gt;53733&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;17073&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;after&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;048&lt;/span&gt; ms&lt;span style="color:#e6db74"&gt;&amp;#34;,&amp;#34;&lt;/span&gt;Process holding the &lt;span style="color:#66d9ef"&gt;lock&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;125889&lt;/span&gt;. Wait queue: &lt;span style="color:#ae81ff"&gt;33569&lt;/span&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When &lt;code&gt;PARTITION OF&lt;/code&gt; adds a partition, it acquires an AccessExclusiveLock (level 8) on the parent table, which blocks all operations on the partitioned table, including plain reads. Normally, adding a partition via &lt;code&gt;PARTITION OF&lt;/code&gt; is very fast, and the lock is released immediately. However, if there&amp;rsquo;s a long-running transaction on the partitioned table, the level 8 lock request has to queue behind it, and every later statement on the table then queues behind the pending level 8 request, which causes the cascading blocking.&lt;/p&gt;
&lt;p&gt;(Stolen from &lt;a href="https://blog.csdn.net/qq_40687433/article/details/132525655" target="_blank" rel="noreferrer"&gt;my own diagram&lt;/a&gt;):



&lt;img src="https://lastdba.com/img/csdn/6c7f70fc3b60.png" alt="diagram" /&gt;&lt;/p&gt;
&lt;p&gt;However, in this case there was no long transaction on the table, yet &lt;code&gt;PARTITION OF&lt;/code&gt; took 35 minutes.&lt;/p&gt;
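&lt;p&gt;For reference, a check of this kind might look like the following. It is a minimal sketch that relies only on the standard &lt;code&gt;pg_stat_activity&lt;/code&gt; view and the &lt;code&gt;pg_blocking_pids()&lt;/code&gt; function; the column aliases and the 60-character query truncation are arbitrary choices and not part of the original investigation:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Open transactions, oldest first. A long-lived one that touches the
-- partitioned table would make the level 8 lock request of PARTITION OF wait.
SELECT pid, state, xact_start, now() - xact_start AS xact_age,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;

-- Sessions that are currently blocked, together with the PIDs blocking them.
SELECT pid, pg_blocking_pids(pid) AS blocked_by,
       wait_event_type, wait_event, left(query, 60) AS query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) &amp;gt; 0;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If the first query shows no old transaction on the table while the second still lists the DDL session as the blocker, the DDL statement itself is the slow part, which is the situation described here.&lt;/p&gt;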
&lt;p&gt;From historical process information, this process was in D state (uninterruptible sleep), which was suspicious. Initially, I suspected memory or disk issues, but after investigation, everything was normal.&lt;/p&gt;
&lt;p&gt;The problem turned out to be easy to reproduce: running &lt;code&gt;create table partition of&lt;/code&gt; directly in a simulation environment was also very slow. pg_stat_activity showed the statement waiting on I/O:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wait_event_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IO
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; DataFileRead
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;query &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; xxx partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; xx &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2025-05-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2025-06-01 00:00:00&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;strace tracing revealed the process was heavily reading one file:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pread64(&lt;span style="color:#ae81ff"&gt;53&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;\22\2\0\0\220w\321&amp;gt;\0\0\5\0\24\0018\1\0 \4 \0\0\0\0\200\237\0\1\310\236p\1&amp;#34;&lt;/span&gt;..., &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;863485952&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Using file descriptor 53, we identified the file:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#f92672"&gt;/&lt;/span&gt;proc&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;356174&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;fd] ll &lt;span style="color:#f92672"&gt;|&lt;/span&gt;grep &lt;span style="color:#ae81ff"&gt;53&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lrwx&lt;span style="color:#75715e"&gt;------ 1 postgres postgres 64 May 17 15:34 53 -&amp;gt; /lzl/pglzl/data/base/17076/25883&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;oid2name &lt;span style="color:#f92672"&gt;-&lt;/span&gt;d lzldb &lt;span style="color:#f92672"&gt;-&lt;/span&gt;f &lt;span style="color:#ae81ff"&gt;25883&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;From&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filenode &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; Name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;25883&lt;/span&gt; table_partition_default&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
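&lt;p&gt;The same mapping can also be obtained from SQL. This is a small sketch, assuming the session is connected to the database that owns the file and that the relation lives in that database&amp;rsquo;s default tablespace (hence the &lt;code&gt;0&lt;/code&gt; passed as the tablespace OID):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Map a filenode back to its relation; 0 means the database default tablespace.
SELECT pg_filenode_relation(0, 25883) AS relation;

-- Same lookup through pg_class, with the total size for context.
SELECT relname, relkind, relispartition,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relfilenode = 25883;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;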
&lt;p&gt;The file belongs to the table &lt;code&gt;table_partition_default&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; table_partition_default
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt;: table_partition &lt;span style="color:#66d9ef"&gt;DEFAULT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt;: (&lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; ((date_created &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2022-05-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;OR&lt;/span&gt; ((date_created &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2022-05-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (da
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;dt&lt;span style="color:#f92672"&gt;+&lt;/span&gt; table_partition_default
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; List &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; relations
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Schema&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Owner&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Persistence &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Size&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+------------------------------------+-------+------------+-------------+-------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; table_partition_default &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; user1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; permanent &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; GB &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;It was the default partition, holding about 50 GB of data. Oracle DBAs might find this unfamiliar: PG&amp;rsquo;s default partition receives rows that don&amp;rsquo;t fall into any defined partition range, so data is still accepted even when no matching partition exists.&lt;/p&gt;
&lt;p&gt;What happens if the default partition already holds rows that fall inside the range of a new partition? The statement fails with an error:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;exists&lt;/span&gt; table_partition_pxxxx partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; table_partition &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-12 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-13 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;23514&lt;/span&gt;: updated partition &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;default&lt;/span&gt; partition &lt;span style="color:#e6db74"&gt;&amp;#34;table_partition_default&amp;#34;&lt;/span&gt; would be violated &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;some&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SCHEMA&lt;/span&gt; NAME: &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; NAME: table_partition_default
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: check_default_partition_contents, partbounds.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3227&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;As the error shows, adding a child partition automatically updates the default partition&amp;rsquo;s partition constraint to exclude the new range. Checking the updated constraint means validating the rows already in the default partition against the new partition&amp;rsquo;s range.&lt;/p&gt;
&lt;p&gt;At this point, the cause is clear:&lt;/p&gt;
&lt;p&gt;When adding a new child partition to a partitioned table, the partition creation statement needs to validate the data in the default partition to ensure that none of it falls inside the new partition&amp;rsquo;s range. This caused &lt;code&gt;CREATE TABLE PARTITION OF&lt;/code&gt; to scan the default partition (about 50 GB) while holding the level 8 lock on the parent table, so the statement took 35 minutes to finish. The blocking then cascaded, making business data unqueryable and unwritable.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Summary and Recommendations
 &lt;div id="summary-and-recommendations" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary-and-recommendations" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;PostgreSQL partitioned tables are becoming increasingly common. Maintaining partitions requires attention to many details. I recommend reading &lt;a href="https://blog.csdn.net/qq_40687433/article/details/132525655" target="_blank" rel="noreferrer"&gt;PostgreSQL Partitioned Tables&lt;/a&gt;, which covers almost everything.&lt;/p&gt;
&lt;p&gt;In this case, the key to resolution is the data in the default partition. Before refactoring the default partition, do not use &lt;code&gt;PARTITION OF&lt;/code&gt; to create child partitions.&lt;/p&gt;
&lt;p&gt;Default partition refactoring plan:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Detach the default child partition, then properly create child partitions, and reinsert the default table data back into the partitioned table.&lt;/li&gt;
&lt;li&gt;If necessary, after detaching and creating proper child partitions, create an empty default partition to maintain business data continuity.&lt;/li&gt;
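&lt;li&gt;Note that detach differs from attach: detach requires a level 8 lock on the parent table. PG14 supports &lt;code&gt;DETACH CONCURRENTLY&lt;/code&gt;, but according to the PostgreSQL documentation it is not allowed when the partitioned table has a default partition, so it does not help while the default partition is still attached.&lt;/li&gt;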
&lt;/ol&gt;
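&lt;p&gt;The plan above can be sketched roughly as follows. This is an illustration under assumptions, not the exact procedure used in this case: &lt;code&gt;table_partition&lt;/code&gt; and &lt;code&gt;table_partition_default&lt;/code&gt; are the names from the example above, while &lt;code&gt;table_partition_p2406&lt;/code&gt;, &lt;code&gt;table_partition_default_new&lt;/code&gt;, and the date range are hypothetical. It also assumes the detached table has the same column order as the parent, so that &lt;code&gt;SELECT *&lt;/code&gt; is safe:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- 1. Detach the oversized default partition (level 8 lock on the parent, normally brief).
ALTER TABLE table_partition DETACH PARTITION table_partition_default;

-- 2. Create the missing range partitions. With no default partition attached,
--    there is no default data to scan. Repeat for every range found in the old data.
CREATE TABLE table_partition_p2406
    (LIKE table_partition INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
ALTER TABLE table_partition ATTACH PARTITION table_partition_p2406
    FOR VALUES FROM (&amp;#39;2024-06-01 00:00:00&amp;#39;) TO (&amp;#39;2024-07-01 00:00:00&amp;#39;);

-- 3. Optional: attach a new, empty default partition.
CREATE TABLE table_partition_default_new
    (LIKE table_partition INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
ALTER TABLE table_partition ATTACH PARTITION table_partition_default_new DEFAULT;

-- 4. Move the old rows back through the parent so they are routed
--    to the matching partitions.
INSERT INTO table_partition SELECT * FROM table_partition_default;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Between steps 1 and 3, inserts that match no existing partition are rejected, so the window should be kept short. For a table of this size, step 4 is probably better run in batches by date range than as a single statement.&lt;/p&gt;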
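&lt;p&gt;If you don&amp;rsquo;t refactor the default partition, check the current data range in the default partition before adding partitions. Using &lt;code&gt;ATTACH&lt;/code&gt; to add child partitions will still be slow because the default partition is scanned, but on PostgreSQL 12 and later it takes only a ShareUpdateExclusiveLock on the parent table, so reads and writes on the other partitions are not blocked. Per the PostgreSQL documentation, &lt;code&gt;ATTACH&lt;/code&gt; does take a level 8 lock on the default partition itself, so statements that touch the default partition wait until the scan finishes.&lt;/p&gt;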
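&lt;p&gt;Checking the default partition could look like the sketch below. The table and column names come from the example above; the date range and the constraint name &lt;code&gt;chk_default_not_202406&lt;/code&gt; are hypothetical. The last two statements follow the recommendation in the PostgreSQL documentation for &lt;code&gt;ATTACH PARTITION&lt;/code&gt; to add a CHECK constraint on the default partition that excludes the new range. Whether the scan is actually skipped should be verified in a test environment first:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Which range does the default partition currently hold?
-- These are ordinary reads, but they do scan the table.
SELECT min(date_created), max(date_created), count(*)
FROM table_partition_default;

-- Would any existing row fall into the planned partition range?
SELECT count(*)
FROM table_partition_default
WHERE date_created &amp;gt;= &amp;#39;2024-06-01 00:00:00&amp;#39;
  AND date_created &amp;lt; &amp;#39;2024-07-01 00:00:00&amp;#39;;

-- If the count is 0: add a CHECK constraint that excludes the new range,
-- then validate it in a separate step that does not block reads and writes.
ALTER TABLE table_partition_default
    ADD CONSTRAINT chk_default_not_202406   -- hypothetical name
    CHECK (NOT (date_created IS NOT NULL
                AND date_created &amp;gt;= &amp;#39;2024-06-01 00:00:00&amp;#39;
                AND date_created &amp;lt; &amp;#39;2024-07-01 00:00:00&amp;#39;)) NOT VALID;
ALTER TABLE table_partition_default
    VALIDATE CONSTRAINT chk_default_not_202406;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The validation step still reads the whole default partition, but it moves the expensive scan out of the statement that holds the heavy locks.&lt;/p&gt;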
&lt;p&gt;Finally, a review of best practices for adding partitions:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;PARTITION OF&lt;/code&gt; requires a level 8 lock on the parent table, which carries risk. The recommended approach is to use &lt;code&gt;ATTACH&lt;/code&gt; to add new child partitions (partition indexes can be handled similarly). This does not block reads and writes, has no business impact, and can be done online.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The correct approach for adding new partitions&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; lzlpartition1_202303
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#66d9ef"&gt;LIKE&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;INCLUDING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DEFAULTS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INCLUDING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CONSTRAINTS&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION1 attach partition LZLPARTITION1_202303 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-04-01 00:00:00&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
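&lt;p&gt;If the table being attached already has data, &lt;code&gt;ATTACH&lt;/code&gt; may still be slow, because it scans the table to verify that every row satisfies the partition bound. You can avoid that scan by creating a matching CHECK constraint beforehand:&lt;/p&gt;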
&lt;p&gt;&lt;strong&gt;The correct approach for adding a partition that already has data&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Reduce verbose DDL by using LIKE
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; lzlpartition1_202303
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#66d9ef"&gt;LIKE&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;INCLUDING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DEFAULTS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INCLUDING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CONSTRAINTS&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Skip this step if no data exists. Add a CHECK constraint that matches the new partition bound (modeled on the Partition constraint of an existing partition) so that ATTACH can skip the validation scan.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1_202303 &lt;span style="color:#66d9ef"&gt;add&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt; chk_202303 &lt;span style="color:#66d9ef"&gt;CHECK&lt;/span&gt; ((date_created &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-04-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Add partition via ATTACH
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION1 attach partition LZLPARTITION1_202303 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-04-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Optional. Before transactions occur on the new partition, drop the extra CHECK constraint
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1_202303 &lt;span style="color:#66d9ef"&gt;drop&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt; chk_202303;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;</content:encoded></item></channel></rss>