<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>PostgreSQL面试题 on Last DBA</title><link>https://lastdba.com/en/categories/postgresql%E9%9D%A2%E8%AF%95%E9%A2%98/</link><description>Recent content in PostgreSQL面试题 on Last DBA</description><generator>Hugo -- gohugo.io</generator><language>en-US</language><copyright>© 2026 liuzhilong62</copyright><lastBuildDate>Mon, 12 Aug 2024 00:00:00 +0000</lastBuildDate><atom:link href="https://lastdba.com/en/categories/postgresql%E9%9D%A2%E8%AF%95%E9%A2%98/index.xml" rel="self" type="application/rss+xml"/><item><title>PostgreSQL Interview Questions - Comprehensive Collection</title><link>https://lastdba.com/en/2024/08/12/postgresql-interview-questions-comprehensive-collection/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/postgresql-interview-questions-comprehensive-collection/</guid><description>&lt;p&gt;Interview questions source: PostgreSQL Apprentice &lt;a href="https://mp.weixin.qq.com/s/DCmO1E31JAbec1M05y2_UQ" target="_blank" rel="noreferrer"&gt;PostgreSQL Interview Questions Collection&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Existing answers: Hehuyi_In &lt;a href="https://blog.csdn.net/Hehuyi_In/article/details/128885660" target="_blank" rel="noreferrer"&gt;Learning and Answering PostgreSQL Interview Questions&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;1. MVCC Implementation and Differences from Oracle
 &lt;div id="1-mvcc-implementation-and-differences-from-oracle" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#1-mvcc-implementation-and-differences-from-oracle" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Oracle and MySQL (InnoDB) both implement multi-version concurrency control with undo. Old row versions are recorded in &lt;strong&gt;separate&lt;/strong&gt; undo tablespaces. In Oracle, if the undo a query needs has already been overwritten, for example because undo space is insufficient, an ORA-01555 (snapshot too old) error occurs.



&lt;img src="https://lastdba.com/img/csdn/fec3e1c0263f.png" alt="Insert image description here" /&gt;
&lt;a href="https://www.slideshare.net/AmitBhalla2/less10-undo-15946188" target="_blank" rel="noreferrer"&gt;https://www.slideshare.net/AmitBhalla2/less10-undo-15946188&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;PostgreSQL has no undo log. To support rollback and consistent reads, old tuple versions remain in the table itself: an update inserts a new row version and leaves the old one in place. The tuple header, together with the commit log (clog), determines which version is visible to a given transaction. The visibility fields in the tuple header are xmin, xmax, cmin, cmax, infomask, and infomask2.&lt;/p&gt;</description><content:encoded>&lt;p&gt;Interview questions source: PostgreSQL Apprentice &lt;a href="https://mp.weixin.qq.com/s/DCmO1E31JAbec1M05y2_UQ" target="_blank" rel="noreferrer"&gt;PostgreSQL Interview Questions Collection&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Existing answers: Hehuyi_In &lt;a href="https://blog.csdn.net/Hehuyi_In/article/details/128885660" target="_blank" rel="noreferrer"&gt;Learning and Answering PostgreSQL Interview Questions&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;1. MVCC Implementation and Differences from Oracle
 &lt;div id="1-mvcc-implementation-and-differences-from-oracle" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#1-mvcc-implementation-and-differences-from-oracle" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Oracle and MySQL (InnoDB) both implement multi-version concurrency control with undo. Old row versions are recorded in &lt;strong&gt;separate&lt;/strong&gt; undo tablespaces. In Oracle, if the undo a query needs has already been overwritten, for example because undo space is insufficient, an ORA-01555 (snapshot too old) error occurs.



&lt;img src="https://lastdba.com/img/csdn/fec3e1c0263f.png" alt="Insert image description here" /&gt;
&lt;a href="https://www.slideshare.net/AmitBhalla2/less10-undo-15946188" target="_blank" rel="noreferrer"&gt;https://www.slideshare.net/AmitBhalla2/less10-undo-15946188&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;PostgreSQL has no undo log. To support rollback and consistent reads, old tuple versions remain in the table itself: an update inserts a new row version and leaves the old one in place. The tuple header, together with the commit log (clog), determines which version is visible to a given transaction. The visibility fields in the tuple header are xmin, xmax, cmin, cmax, infomask, and infomask2.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/f34dabdc091c.png" alt="Insert image description here" /&gt;
&lt;a href="https://www.interdb.jp/pg/pgsql05/03.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql05/03.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Pros/cons: The undo approach needs extra undo space but keeps table space management simple, because old versions live outside the table. Its drawback is that rolling back a large transaction is slow, since every change must be reversed from the undo records. PostgreSQL&amp;rsquo;s new-tuple approach makes even a large rollback very fast, but it leaves dead tuples behind, so a vacuum mechanism is needed to clean them up. Freezing is also done by vacuum, but it isn&amp;rsquo;t directly related to dead tuple cleanup; its purpose is to prevent transaction ID wraparound.&lt;/p&gt;

&lt;h3 class="relative group"&gt;2. Why Table Bloat Occurs and Its Hazards
 &lt;div id="2-why-table-bloat-occurs-and-its-hazards" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#2-why-table-bloat-occurs-and-its-hazards" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Why table bloat?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As above, because of PostgreSQL&amp;rsquo;s MVCC design, a delete doesn&amp;rsquo;t physically remove the tuple, and an update is equivalent to delete+insert. DML statements never remove old tuples, so the table only grows and is never cleaned up by DML; this is table bloat. A plain vacuum removes dead tuples and marks their space as reusable, while vacuum full rewrites the table to compact it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hazards of table bloat:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Excessive table space usage&lt;/li&gt;
&lt;li&gt;SQL performance degradation&lt;/li&gt;
&lt;li&gt;Large tables cause longer vacuum cleanup times; vacuum full blocking time also increases, though pg_repack can replace vacuum full to reduce blocking&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Handling table bloat:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Manual vacuum&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Does not block queries or DML operations&lt;/li&gt;
&lt;li&gt;Does not immediately reclaim space, only marks it as available&lt;/li&gt;
&lt;li&gt;If the last page of a table has no tuples, that page gets truncated&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/4bcffb429099.png" alt="Insert image description here" /&gt;
(&lt;a href="https://www.interdb.jp/pg/pgsql06.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql06.html&lt;/a&gt;)&lt;/p&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Autovacuum&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Autovacuum automatically invokes vacuum for concurrent cleanup as needed&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="3"&gt;
&lt;li&gt;Manual vacuum full&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Takes an ACCESS EXCLUSIVE lock (the strongest, level-8 table lock), which blocks all reads and writes on the table&lt;/li&gt;
&lt;li&gt;Table is completely rewritten; corresponding OS files are cleaned and rebuilt&lt;/li&gt;
&lt;li&gt;Rebuilds indexes, FSM (free space map), VM (visibility map)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/5c9458f68c2e.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;ol start="4"&gt;
&lt;li&gt;pg_repack and other manual table rebuilds&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;pg_repack only has a brief lock during the final table switch&lt;/li&gt;
&lt;li&gt;Other tools with data sync and switch capabilities&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Avoiding table bloat:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Generally, autovacuum handles table bloat, but cleanup may not proceed smoothly in some scenarios:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Autovacuum worker isn&amp;rsquo;t running&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Both &lt;code&gt;autovacuum&lt;/code&gt; and &lt;code&gt;track_counts&lt;/code&gt; must be enabled for autovacuum to work&lt;/li&gt;
&lt;li&gt;&lt;code&gt;autovacuum_max_workers&lt;/code&gt; must be set high enough; multiple workers may be needed simultaneously&lt;/li&gt;
&lt;li&gt;The table hasn&amp;rsquo;t reached the vacuum threshold. Autovacuum triggers once the number of dead tuples (from deletes and updates) exceeds &lt;code&gt;autovacuum_vacuum_threshold&lt;/code&gt; + &lt;code&gt;autovacuum_vacuum_scale_factor&lt;/code&gt; * number of tuples&lt;/li&gt;
&lt;li&gt;&lt;code&gt;autovacuum_vacuum_insert_threshold&lt;/code&gt; and &lt;code&gt;autovacuum_vacuum_insert_scale_factor&lt;/code&gt; are the corresponding insert thresholds (same formula). In theory, insert-triggered vacuum has little to do with bloat cleanup, since inserts don&amp;rsquo;t generate dead tuples. PG13 added these parameters so that insert-only tables are still vacuumed and frozen regularly, rather than leaving wraparound handling until it is urgent (reference: &lt;a href="https://www.cybertec-postgresql.com/en/postgresql-autovacuum-insert-only-tables/" target="_blank" rel="noreferrer"&gt;postgresql-autovacuum-insert-only-tables&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;autovacuum_naptime&lt;/code&gt; is the autovacuum launcher&amp;rsquo;s wake-up interval. If it is set too large, a table may exceed its thresholds and workers may be available, yet no worker is started until the launcher&amp;rsquo;s next cycle&lt;/li&gt;
&lt;li&gt;&lt;code&gt;vacuum_defer_cleanup_age&lt;/code&gt; delays vacuum cleanup by N transactions. It was originally designed to reduce standby query conflicts; because &lt;code&gt;hot_standby_feedback&lt;/code&gt; and replication slots now cover that need, PG16 removed the parameter&lt;/li&gt;
&lt;/ul&gt;
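&lt;p&gt;To make the threshold formula above concrete, here is a minimal Python sketch. The function name is hypothetical, and the defaults of 50 and 0.2 are the stock values of &lt;code&gt;autovacuum_vacuum_threshold&lt;/code&gt; and &lt;code&gt;autovacuum_vacuum_scale_factor&lt;/code&gt; as I understand them; the row count stands in for the planner&amp;rsquo;s tuple estimate of the table.&lt;/p&gt;

```python
def autovacuum_vacuum_trigger(reltuples, base_threshold=50, scale_factor=0.2):
    """Dead-tuple count above which autovacuum considers vacuuming a table.

    reltuples is the estimated number of rows in the table (an assumption
    of this sketch; PostgreSQL uses its own stored estimate).
    """
    return base_threshold + scale_factor * reltuples


# How the trigger point grows with table size under the assumed defaults.
for rows in (1_000, 1_000_000, 100_000_000):
    print(rows, autovacuum_vacuum_trigger(rows))

# A per-table override (hypothetical value) triggers far earlier on a large table.
print(autovacuum_vacuum_trigger(100_000_000, scale_factor=0.01))
```

&lt;p&gt;The point of the sketch is that with a fixed scale factor the trigger point grows linearly with table size, which is why very large tables often need a smaller per-table scale factor.&lt;/p&gt;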
&lt;ol start="2"&gt;
&lt;li&gt;Disable or adjust cost-based vacuuming to make autovacuum faster&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Cost-based vacuuming can be enabled to limit vacuum&amp;rsquo;s IO impact. When vacuum or autovacuum reaches the cost limit, it sleeps for &lt;code&gt;vacuum_cost_delay&lt;/code&gt; (or &lt;code&gt;autovacuum_vacuum_cost_delay&lt;/code&gt;) milliseconds. &lt;code&gt;vacuum_cost_delay&lt;/code&gt; defaults to 0, which disables cost-based delay for manual vacuum; setting &lt;code&gt;autovacuum_vacuum_cost_delay&lt;/code&gt; to -1 makes autovacuum use the &lt;code&gt;vacuum_cost_delay&lt;/code&gt; value. To speed up cleanup, disable the delay or reduce it&lt;/li&gt;
&lt;li&gt;If cost-based vacuuming stays enabled, raise &lt;code&gt;vacuum_cost_limit&lt;/code&gt; (the accumulated cost that triggers a sleep) to a reasonable value, and lower the per-page costs that count toward it: &lt;code&gt;vacuum_cost_page_dirty&lt;/code&gt;, &lt;code&gt;vacuum_cost_page_miss&lt;/code&gt;, and &lt;code&gt;vacuum_cost_page_hit&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
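&lt;p&gt;The effect of the cost limit and delay can be estimated with a small Python sketch. This is a simplified model, not PostgreSQL&amp;rsquo;s actual accounting: it assumes vacuum sleeps once each time the accumulated cost reaches the limit. The function name is hypothetical, and the per-page cost defaults (1, 2, 20) are the values documented for recent releases as far as I know; check them against your version.&lt;/p&gt;

```python
def throttle_sleep_ms(pages_hit, pages_miss, pages_dirty,
                      cost_limit, cost_delay_ms,
                      cost_page_hit=1, cost_page_miss=2, cost_page_dirty=20):
    """Rough total sleep time injected by cost-based vacuum delay.

    Simplification: each time the accumulated cost reaches cost_limit,
    vacuum sleeps for cost_delay_ms and the counter starts over.
    """
    total_cost = (pages_hit * cost_page_hit
                  + pages_miss * cost_page_miss
                  + pages_dirty * cost_page_dirty)
    naps = total_cost // cost_limit
    return naps * cost_delay_ms


# One million dirtied pages under two different (illustrative) cost limits.
for limit in (200, 2000):
    print(limit, throttle_sleep_ms(0, 0, 1_000_000, cost_limit=limit, cost_delay_ms=2))
```

&lt;p&gt;In this model, raising the limit tenfold cuts the injected sleep time tenfold, and a delay of 0 removes it entirely, which matches the tuning advice above.&lt;/p&gt;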
&lt;ol start="3"&gt;
&lt;li&gt;Active transactions preventing vacuum&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Long-running business transactions that have not finished. Applications shouldn&amp;rsquo;t keep transactions open for long; on the database side, sessions can be terminated: 1) manually 2) by setting &lt;code&gt;idle_in_transaction_session_timeout&lt;/code&gt; to limit idle-in-transaction time 3) by setting &lt;code&gt;old_snapshot_threshold&lt;/code&gt; to limit how long a snapshot stays usable (not recommended before PG14; removed in PG17)&lt;/li&gt;
&lt;li&gt;Unclosed cursors&lt;/li&gt;
&lt;li&gt;&lt;code&gt;hot_standby_feedback&lt;/code&gt; enabled: the standby reports its xmin and catalog_xmin to the primary, so long-running queries on the standby hold back cleanup on the primary&lt;/li&gt;
&lt;li&gt;Unused replication slots, which hold back the cleanup horizon; remove them&lt;/li&gt;
&lt;li&gt;Orphaned prepared transactions. Prepared transactions are PostgreSQL&amp;rsquo;s explicit two-phase commit (2PC) transactions. They are not tied to a session, so one that is prepared but never committed or rolled back blocks cleanup indefinitely&lt;/li&gt;
&lt;li&gt;A running pg_dump logical backup, which holds a repeatable read transaction open until the dump finishes&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="4"&gt;
&lt;li&gt;Performance aspects&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;maintenance_work_mem&lt;/code&gt; is memory for maintenance operations like vacuum; default 64MB can be increased. Or use &lt;code&gt;autovacuum_work_mem&lt;/code&gt; separately for autovacuum workers; default -1 means using &lt;code&gt;maintenance_work_mem&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Vacuuming a large table is especially slow. The heap of a single table is scanned by one process (PG13+ can parallelize only the index phase), so convert large tables to partitioned tables and let vacuum run on several partitions in parallel&lt;/li&gt;
&lt;li&gt;Good IO system&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="5"&gt;
&lt;li&gt;Adjust per-table autovacuum parameters&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Global autovacuum settings may not suit certain business tables; adjust per-table autovacuum parameters to increase vacuum trigger probability&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="6"&gt;
&lt;li&gt;Manual vacuum&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Autovacuum timing is hard to predict; for critical business tables, schedule manual vacuum instead&lt;/li&gt;
&lt;li&gt;Run manual vacuum during low-traffic periods, optionally with freeze and analyze&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The measures above resolve the vast majority of table bloat problems. One type of bloat is harder to address: &lt;strong&gt;even with cost-based vacuuming disabled, autovacuum cannot clean dead tuples as fast as they are generated&lt;/strong&gt;. In essence, highly concurrent update (or insert+delete) transactions extend the table and create new dead tuples before the current vacuum pass has finished making space reusable, so the table keeps bloating. Solutions:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Convert to partitioned tables for vacuum parallelism (only meaningful if updates are distributed across partitions)&lt;/li&gt;
&lt;li&gt;Run vacuum full or pg_repack during off-peak hours to thoroughly clean table holes&lt;/li&gt;
&lt;li&gt;Tables under 24/7 high concurrency are rare; if one exists, restructure it to spread writes across multiple tables, or move the workload to a caching system such as Redis&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s?__biz=MzUyOTAyMzMyNg==&amp;amp;mid=2247485791&amp;amp;idx=1&amp;amp;sn=24ef88bd19d923d60fdf1a8969577fb0&amp;amp;chksm=fa66216ecd11a8789ee0a9a4b7e850d98086bf3ae542aad814788bd8f262dd675be92fa7c5db&amp;amp;mpshare=1&amp;amp;scene=1&amp;amp;srcid=0204hKVlhPOQa19uoxv7u3Ch&amp;amp;sharer_sharetime=1675514289096&amp;amp;sharer_shareid=1a32625a0cee9a1f3987aa62eea3fa03&amp;amp;exportkey=n_ChQIAhIQQ4H0Z5qjGf21zcqa8OvAKxKZAgIE97dBBAEAAAAAAMSqIP5cR5AAAAAOpnltbLcz9gKNyK89dVj0uMOj41SOhYI%2BA5Y3sbSQytf8OotyHqqED8OFC4Tealz7gt91%2FbaCaExVHDNExUGj%2FFrrrwQo6a3qGtJdUptL6vyG2pb9G0NKzNyuv1JbQq%2FLbX9LgTeCARhtml2oCiD%2FLpZJmHpbgRccjrjZCVmQ6oCACKTTSh1P2mfSJbPk7MwCYzdshC3CxYaXemFbwoL9u9tM2H36%2FYBpOLW4wJiSI54CgHscZ%2FeSZfNwaHsn99iojWcG11b204NEjkMmpFgKOq%2F%2FJDMJu0ZwZaQRaLfoLZ5H%2FOgmJOeUQMrp%2Bc7A7UROn7%2BWTGJct6i3l9jJd44OTjyu&amp;amp;acctmode=0&amp;amp;pass_ticket=UiziakVvQcg3ztgfB%2Bovewae4j0ijakENPH%2BRT8lyhXyARWs5hjeT%2FDsPN2ithp8%2B5Wqbk2ySDdewyfjSC2BMg%3D%3D&amp;amp;wx_header=0#rd" target="_blank" rel="noreferrer"&gt;Unveiling the Mystery of Table Bloat&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql06.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql06.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/16/routine-vacuuming.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/16/routine-vacuuming.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/16/runtime-config-autovacuum.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/16/runtime-config-autovacuum.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/16/runtime-config-resource.html#GUC-VACUUM-COST-DELAY" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/16/runtime-config-resource.html#GUC-VACUUM-COST-DELAY&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;3. Long Transaction Hazards and How to Trace Them
 &lt;div id="3-long-transaction-hazards-and-how-to-trace-them" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#3-long-transaction-hazards-and-how-to-trace-them" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Regular queries don&amp;rsquo;t consume a transaction ID (XID); they only get a virtual transaction ID (vxid). A vxid consists of the backend ID and a backend-local counter and is unrelated to the XID. However, even without an XID, a query still holds a snapshot for visibility checks. A snapshot records xmin, xmax, and the list of transactions in progress when it was taken.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/9b24ddaad8e7.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;a href="https://www.interdb.jp/pg/pgsql05/05.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql05/05.html&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;So long transaction issues involve both DML and query statements, though their lock types differ.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Long transaction hazards:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Blocks vacuum cleanup, causing table bloat, excessive space usage, and SQL performance degradation&lt;/li&gt;
&lt;li&gt;Blocks other lock requests. DDL should check for long transactions before it runs; otherwise it waits a long time for its high-level lock, and later statements on the same table queue up behind it&lt;/li&gt;
&lt;li&gt;CREATE INDEX CONCURRENTLY has to wait for older transactions to finish; if it is cancelled or times out while waiting, it fails and leaves an invalid index behind&lt;/li&gt;
&lt;li&gt;Occupies connection pool (though mainly a long-connection issue)&lt;/li&gt;
&lt;li&gt;Logical decoding may spill the transaction&amp;rsquo;s changes to disk, causing replication lag (this also applies to large transactions)&lt;/li&gt;
&lt;li&gt;A long transaction with a savepoint subtransaction can cause query performance cliffs (reference: &lt;a href="https://about.gitlab.com/blog/2021/09/29/why-we-spent-the-last-month-eliminating-postgresql-subtransactions/" target="_blank" rel="noreferrer"&gt;Why we spent the last month eliminating PostgreSQL subtransactions&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How to trace long transactions:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;pg_stat_activity: check xact_start for the transaction start time, and state together with state_change to see whether the session is active or idle in transaction, and since when&lt;/li&gt;
&lt;/ul&gt;
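&lt;p&gt;As a rough illustration of the pg_stat_activity check, the Python sketch below only builds the SQL text; running it against a server is left to whatever client you use. The helper name and the default limit are hypothetical. The columns referenced are standard pg_stat_activity columns.&lt;/p&gt;

```python
LONG_XACT_SQL = """
SELECT pid, usename, state, xact_start, state_change,
       now() - xact_start AS xact_age,
       backend_xid, backend_xmin,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND pid != pg_backend_pid()
ORDER BY xact_start
LIMIT {limit:d}
"""


def long_transaction_query(limit=10):
    """Return SQL that lists open transactions, oldest first."""
    return LONG_XACT_SQL.format(limit=int(limit))


print(long_transaction_query(5))
```

&lt;p&gt;Sorting by xact_start puts the oldest open transaction first; a session whose state is idle in transaction with an old xact_start is the usual suspect.&lt;/p&gt;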

&lt;h3 class="relative group"&gt;4. Subtransaction Hazards and Considerations
 &lt;div id="4-subtransaction-hazards-and-considerations" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#4-subtransaction-hazards-and-considerations" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Subtransaction hazards:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Excessive transaction ID consumption, which brings wraparound handling forward. Each subtransaction that writes data consumes its own XID&lt;/li&gt;
&lt;li&gt;PGPROC_MAX_CACHED_SUBXIDS overflow causing performance degradation. Each backend has a subtransaction cache of &lt;code&gt;PGPROC_MAX_CACHED_SUBXIDS&lt;/code&gt; entries, hardcoded to 64. Beyond 64 subtransactions the cache overflows, and visibility checks must then look up the &lt;code&gt;pg_subtrans&lt;/code&gt; directory (reference: &lt;a href="https://postgres.ai/blog/20210831-postgresql-subtransactions-considered-harmful" target="_blank" rel="noreferrer"&gt;PostgreSQL Subtransactions Considered Harmful&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Using subtransactions with FOR UPDATE explicit row locks causes dramatic database performance degradation (reference: &lt;a href="https://buttondown.email/nelhage/archive/notes-on-some-postgresql-implementation-details/" target="_blank" rel="noreferrer"&gt;Notes on some PostgreSQL implementation details&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;A long transaction with a savepoint subtransaction can also cause query performance cliffs (reference: &lt;a href="https://about.gitlab.com/blog/2021/09/29/why-we-spent-the-last-month-eliminating-postgresql-subtransactions/" target="_blank" rel="noreferrer"&gt;Why we spent the last month eliminating PostgreSQL subtransactions&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Usage recommendations:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Subtransaction usage is discouraged given the above hazards&lt;/li&gt;
&lt;li&gt;If standby query workloads exist, prohibit subtransactions&lt;/li&gt;
&lt;li&gt;If subtransactions are still needed, keep them under 64 (preferably much lower)&lt;/li&gt;
&lt;li&gt;Besides explicit savepoints, subtransactions can also arise from exceptions, frameworks, and tools&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://liuzhilong.blog.csdn.net/article/details/130783474" target="_blank" rel="noreferrer"&gt;pg事务：子事务&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;5. Which Schema Changes Are Non-Online
 &lt;div id="5-which-schema-changes-are-non-online" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#5-which-schema-changes-are-non-online" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Strictly speaking, most schema changes are non-online, because most ALTER TABLE forms require an ACCESS EXCLUSIVE lock (the strongest, level-8 lock). In practice, what matters is that some schema changes take a long time themselves or cause slow queries afterward. So this question can be reframed as three sub-questions:&lt;/p&gt;
&lt;p&gt;Impact on indexes? Impact on statistics? Does it require rewriting the table, causing long-held 8-level locks?&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/7b272ed64104.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/cg1tXiifC83p0hWMs92Cxw" target="_blank" rel="noreferrer"&gt;Schema Change Summary Chart&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Summary:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Dropping a column completes immediately, but watch for composite index and multi-column statistics invalidation to avoid SQL performance avalanches&lt;/li&gt;
&lt;li&gt;Adding a column with a default value: 1) PG10 and earlier require a table rewrite 2) PG11+: only volatile defaults (such as a volatile function call) require a table rewrite. Also, statistics won&amp;rsquo;t be immediately available for the new column&lt;/li&gt;
&lt;li&gt;Changing column length: enlarging (except int to bigint) doesn&amp;rsquo;t rewrite the table; shrinking requires table rewrite; column statistics invalidated&lt;/li&gt;
&lt;li&gt;Changing column type: &lt;em&gt;table rewrite&lt;/em&gt;; statistics invalidated&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Adding constraints to existing columns scans the table, watch for scan duration&lt;/em&gt; (e.g., &lt;code&gt;ADD CONSTRAINT&lt;/code&gt;, &lt;code&gt;SET NOT NULL&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Adding defaults to existing columns completes immediately&lt;/em&gt; (e.g., &lt;code&gt;SET/DROP DEFAULT&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;SET { LOGGED | UNLOGGED } rewrites the table&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Storage parameter changes depend on the parameter. For example, fillfactor and the autovacuum parameters can be changed online: they take only a SHARE UPDATE EXCLUSIVE lock rather than the level-8 ACCESS EXCLUSIVE lock, and complete immediately (reference: &lt;a href="https://www.postgresql.org/docs/16/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS" target="_blank" rel="noreferrer"&gt;Storage Parameters&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;6. Physical Backup Considerations (pg_start_backup)
 &lt;div id="6-physical-backup-considerations-pg_start_backup" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#6-physical-backup-considerations-pg_start_backup" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/a226f3f1899f.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;a href="https://postgrespro.com/media/2022/03/24/pgpro-backup-methods%20%281%29.pdf" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/media/2022/03/24/pgpro-backup-methods%20(1).pdf&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PG physical backup:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Block-level backup, generally doesn&amp;rsquo;t support per-database backup (except pg_probackup)&lt;/li&gt;
&lt;li&gt;Exclusive mode is unnecessary because: 1) it only works on the primary 2) it doesn&amp;rsquo;t allow parallel backups 3) the backup_label file it creates may prevent the primary instance from recovering after a crash 4) it is functionally identical to non-exclusive backup. PG9.6 added non-exclusive mode; PG15 removed exclusive mode&lt;/li&gt;
&lt;li&gt;If you call pg_start_backup() explicitly, you must also call pg_stop_backup() explicitly to end backup mode (in PG15+ the functions are named pg_backup_start() and pg_backup_stop())&lt;/li&gt;
&lt;li&gt;FPI (full page image) is force-enabled during backup, even if full_page_writes is off&lt;/li&gt;
&lt;li&gt;Backup tools generally call pg_start_backup() (or its equivalent) before the backup starts, which triggers a checkpoint to flush dirty data. They then back up all WAL from start to end, including WAL generated during the backup, to ensure data consistency and allow PITR&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;pg_basebackup:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Native, built-in&lt;/li&gt;
&lt;li&gt;Performs the equivalent of pg_start_backup and pg_stop_backup internally, so no explicit calls are needed&lt;/li&gt;
&lt;li&gt;PG17+ supports incremental backup and backup set merging&lt;/li&gt;
&lt;li&gt;Consumes one walsender process&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;pg_probackup:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Very powerful: supports incremental backup, incremental restore, parallelism, backup set merging, backup verification, remote backup, per-database restore, etc.&lt;/li&gt;
&lt;li&gt;BUG: address space cannot exceed 4GB, fixable by modifying source code&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;pgBackRest:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Also very powerful&lt;/li&gt;
&lt;li&gt;Prerequisite: SSH must be configured from backup server to database host&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://developer.aliyun.com/article/59359" target="_blank" rel="noreferrer"&gt;https://developer.aliyun.com/article/59359&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/app-pgbasebackup.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/app-pgbasebackup.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.enterprisedb.com/blog/exclusive-backup-mode-finally-removed-postgres-15" target="_blank" rel="noreferrer"&gt;https://www.enterprisedb.com/blog/exclusive-backup-mode-finally-removed-postgres-15&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/MasaoFujii/pg_exclusive_backup" target="_blank" rel="noreferrer"&gt;https://github.com/MasaoFujii/pg_exclusive_backup&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/postgrespro/pg_probackup" target="_blank" rel="noreferrer"&gt;https://github.com/postgrespro/pg_probackup&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://pgbackrest.org/user-guide.html" target="_blank" rel="noreferrer"&gt;https://pgbackrest.org/user-guide.html&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;7. How Logical Backup Ensures Consistency
 &lt;div id="7-how-logical-backup-ensures-consistency" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#7-how-logical-backup-ensures-consistency" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
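&lt;li&gt;pg_dump takes the whole backup inside a single transaction, at the repeatable read isolation level by default, or at serializable (read only, deferrable) when &lt;code&gt;--serializable-deferrable&lt;/code&gt; is given&lt;/li&gt;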
&lt;li&gt;Before backing up data, pg_dump acquires ACCESS SHARE locks on target objects to prevent table drops&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Additional logical backup considerations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Watch for lock conflicts during export&lt;/li&gt;
&lt;li&gt;If DDL operations are needed, avoid full-database or long-duration backups; split the backup into multiple tasks, e.g., one table per pg_dump invocation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://developer.aliyun.com/article/14582" target="_blank" rel="noreferrer"&gt;https://developer.aliyun.com/article/14582&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;8. Causes of WAL Accumulation
 &lt;div id="8-causes-of-wal-accumulation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#8-causes-of-wal-accumulation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Inactive or abandoned replication slots, which retain the WAL they have not yet consumed&lt;/li&gt;
&lt;li&gt;Logical replication with long transactions&lt;/li&gt;
&lt;li&gt;Excessively large wal_keep_size&lt;/li&gt;
&lt;li&gt;Excessively small archive_timeout, which forces frequent WAL switches and archiving (equivalent to calling pg_switch_wal(), named pg_switch_xlog() before PG10, followed by archiving)&lt;/li&gt;
&lt;li&gt;Archive failures: segments marked with .ready files cannot be removed until they are archived successfully&lt;/li&gt;
&lt;li&gt;Single-process archiving can&amp;rsquo;t keep up with WAL generation&lt;/li&gt;
&lt;li&gt;High volume of full page images (FPI); check for overly frequent checkpoints and UUID-like scattered write patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;9. Hazards of Long Connections
 &lt;div id="9-hazards-of-long-connections" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#9-hazards-of-long-connections" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;When PG acquires snapshot data, it must scan all backend process transaction states. Too many connections degrade performance (recommended max ~1000; pg14 optimized but still not recommended to exceed)&lt;/li&gt;
&lt;li&gt;relcache/syscache doesn&amp;rsquo;t release cached metadata, and each process caches independently, causing high memory consumption&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;10. Role of Infomask Flags
 &lt;div id="10-role-of-infomask-flags" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#10-role-of-infomask-flags" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Infomask records transaction, lock, and tuple status information, such as whether the inserting or deleting transaction committed or aborted, row lock info, HOT info, and the column count&lt;/li&gt;
&lt;li&gt;The tuple header has two infomask fields: &lt;code&gt;t_infomask&lt;/code&gt; and &lt;code&gt;t_infomask2&lt;/code&gt;. They store different information, and each bit has its own meaning&lt;/li&gt;
&lt;li&gt;Hint bits are also stored in infomask: once a transaction&amp;rsquo;s outcome has been written there, visibility can be determined from the tuple header alone, without consulting clog&lt;/li&gt;
&lt;/ul&gt;
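&lt;p&gt;The bits can be inspected with the pageinspect extension. A minimal sketch, assuming superuser access and a hypothetical table named infomask_demo; the bit values are the constants from htup_details.h:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Hypothetical demo table
CREATE TABLE infomask_demo (id int, note text);
INSERT INTO infomask_demo VALUES (1, 'a'), (2, 'b');

-- A visibility check after the insert has committed gives PG the chance to set hint bits
SELECT * FROM infomask_demo;

-- Decode a few bits from the raw tuple headers of block 0
SELECT lp,
       t_xmin,
       t_infomask,
       (t_infomask &amp;amp; 256)  != 0    AS xmin_committed,  -- HEAP_XMIN_COMMITTED 0x0100
       (t_infomask &amp;amp; 2048) != 0    AS xmax_invalid,    -- HEAP_XMAX_INVALID   0x0800
       t_infomask2 &amp;amp; 2047          AS natts,           -- HEAP_NATTS_MASK     0x07FF
       (t_infomask2 &amp;amp; 32768) != 0  AS heap_only_tuple  -- HEAP_ONLY_TUPLE     0x8000
FROM heap_page_items(get_raw_page('infomask_demo', 0));&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;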
&lt;p&gt;&lt;a href="https://liuzhilong.blog.csdn.net/article/details/130782857?spm=1001.2014.3001.5502" target="_blank" rel="noreferrer"&gt;PG Transactions: Transaction-Related Tuple Structure&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;11. How NULL Values Are Stored and Whether Indexes Store NULLs
 &lt;div id="11-how-null-values-are-stored-and-whether-indexes-store-nulls" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#11-how-null-values-are-stored-and-whether-indexes-store-nulls" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;How NULL values are stored:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/f3fc29d1f5cd.png" alt="How NULL values are stored in a tuple" /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;NULL values take no space in the tuple data area; they are recorded in the tuple header&lt;/li&gt;
&lt;li&gt;One bit in infomask (HEAP_HASNULL) marks whether the tuple contains any NULLs&lt;/li&gt;
&lt;li&gt;When that bit is set, the header carries a t_bits bitmap with one bit per column, rounded up to whole bytes (n*8 bits, n integer; e.g., a 10-column table has a 16-bit t_bits), marking which columns are NULL&lt;/li&gt;
&lt;/ul&gt;
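&lt;p&gt;The bitmap can be seen directly with pageinspect. A minimal sketch, assuming superuser access and a hypothetical 10-column table named null_demo:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Hypothetical 10-column table
CREATE TABLE null_demo (c1 int, c2 int, c3 int, c4 int, c5 int,
                        c6 int, c7 int, c8 int, c9 int, c10 int);

INSERT INTO null_demo (c1, c3) VALUES (1, 3);                  -- every other column is NULL
INSERT INTO null_demo VALUES (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);  -- no NULLs at all

SELECT lp,
       (t_infomask &amp;amp; 1) != 0 AS has_null,  -- HEAP_HASNULL 0x0001
       t_hoff,                                  -- header size, including the bitmap if present
       t_bits                                   -- NULL bitmap; SQL NULL here when the tuple has no NULLs
FROM heap_page_items(get_raw_page('null_demo', 0));&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In t_bits a 1 means the column holds a value and a 0 means the column is NULL.&lt;/p&gt;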
&lt;p&gt;&lt;strong&gt;Whether indexes store NULL values:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
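&lt;li&gt;PostgreSQL B-tree indexes store NULL values; Oracle B-tree indexes don&amp;rsquo;t store entries whose key columns are all NULL&lt;/li&gt;
&lt;li&gt;Where the NULL entries sit in the index depends on NULLS FIRST or NULLS LAST (NULLS LAST is the default for ascending order)&lt;/li&gt;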
&lt;/ul&gt;
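&lt;p&gt;Because NULLs are in the index, an IS NULL predicate is an indexable condition. A minimal sketch with a hypothetical table idx_null_demo; whether the planner actually picks the index depends on statistics and table size:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical table: about 0.1% of val is NULL
CREATE TABLE idx_null_demo (id int, val int);
INSERT INTO idx_null_demo
SELECT g, CASE WHEN g % 1000 = 0 THEN NULL ELSE g END
FROM generate_series(1, 100000) AS g;

-- NULLS FIRST puts the NULL entries at the start of the B-tree
CREATE INDEX idx_null_demo_val ON idx_null_demo (val NULLS FIRST);
ANALYZE idx_null_demo;

-- Check whether the plan uses idx_null_demo_val
EXPLAIN SELECT * FROM idx_null_demo WHERE val IS NULL;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;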
&lt;p&gt;&lt;a href="https://www.highgo.ca/2020/10/20/the-way-to-store-null-value-in-pg-record/" target="_blank" rel="noreferrer"&gt;https://www.highgo.ca/2020/10/20/the-way-to-store-null-value-in-pg-record/&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;12. Why Full Page Writes Are Needed
 &lt;div id="12-why-full-page-writes-are-needed" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#12-why-full-page-writes-are-needed" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The official documentation&amp;rsquo;s introduction to full page writes is fairly general:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;This is needed because a page write that is in process during an operating system crash might be only partially completed, leading to an on-disk page that contains a mix of old and new data. The row-level change data normally stored in WAL will not be enough to completely restore such a page during post-crash recovery. Storing the full page image guarantees that the page can be correctly restored, but at the price of increasing the amount of data that must be written to WAL. (Because WAL replay always starts from a checkpoint, it is sufficient to do this during the first change of each page after a checkpoint)&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;OS file pages are typically 4KB, while PG pages are typically 8KB, so writing one PG page is not atomic. If a crash interrupts the write, the on-disk page can contain a mix of old and new data (a partial write), and replaying row-level WAL records against such a page cannot reliably restore it, which means data loss during recovery. Hence the need for full page writes.&lt;/p&gt;
&lt;p&gt;Partial writes are closely tied to disk characteristics, and a complete answer is hard to give; see &lt;a href="http://www.killdb.com/2020/04/05/double_write_partial_write_oracle_mysql_postgresql/" target="_blank" rel="noreferrer"&gt;roger&amp;rsquo;s article&lt;/a&gt;. In summary:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Partial writes relate to whether the disk supports atomic writes&lt;/li&gt;
&lt;li&gt;Partial writes relate to whether the OS block size matches the database block size. Oracle and PG blocks default to 8KB, MySQL pages to 16KB, and OS blocks to 4KB, so one database page write is split into multiple OS-level writes&lt;/li&gt;
&lt;li&gt;For PG, if a &lt;strong&gt;data page&lt;/strong&gt; suffers a partial write, it can be recovered from the full page image in WAL&lt;/li&gt;
&lt;li&gt;For MySQL, there&amp;rsquo;s a double write mechanism. The double write buffer is on-disk space that is written sequentially before the data pages themselves, so an intact copy exists if a data page write is torn&lt;/li&gt;
&lt;li&gt;For Oracle, much effort has gone into this, but there is no single obvious mechanism. Oracle does, however, support block-level recovery to replace corrupted data blocks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Different databases take different approaches to handling partial writes. PG writes the entire data page to WAL, at the cost of WAL write amplification. The amplification can be mitigated, for example by spacing checkpoints further apart or enabling wal_compression.&lt;/p&gt;
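&lt;p&gt;How much of the WAL volume comes from full page images can be checked from SQL. A minimal sketch; the pg_stat_wal view assumes PG 14 or later:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Settings that control full page images and how often they are generated
SHOW full_page_writes;
SHOW wal_compression;
SHOW checkpoint_timeout;
SHOW max_wal_size;

-- Cumulative WAL statistics: wal_fpi counts full page images (PG 14+)
SELECT wal_records, wal_fpi, wal_bytes, stats_reset
FROM pg_stat_wal;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A wal_fpi count that is high relative to wal_records suggests checkpoints are too frequent or writes are scattered across many pages.&lt;/p&gt;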
&lt;p&gt;&lt;strong&gt;How to perfectly solve the partial write problem?&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Atomic write-capable devices&lt;/li&gt;
&lt;li&gt;OS minimum IO matching database minimum IO&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="http://www.killdb.com/2020/04/05/double_write_partial_write_oracle_mysql_postgresql/" target="_blank" rel="noreferrer"&gt;http://www.killdb.com/2020/04/05/double_write_partial_write_oracle_mysql_postgresql/&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;13. Various Causes of Index Invalidation
 &lt;div id="13-various-causes-of-index-invalidation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#13-various-causes-of-index-invalidation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Index invalidation:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CREATE INDEX CONCURRENTLY can leave an invalid index due to deadlock or unique index check failure; invalid indexes still get updated&lt;/li&gt;
&lt;li&gt;Invalid indexes on partitioned parent tables indicate some partitions have the index while others don&amp;rsquo;t&lt;/li&gt;
&lt;/ul&gt;
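&lt;p&gt;Invalid indexes can be listed from the pg_index catalog. A minimal sketch; my_invalid_idx is a hypothetical index name, and REINDEX ... CONCURRENTLY assumes PG 12 or later:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Indexes the planner will not use because they are marked invalid
SELECT i.indexrelid::regclass AS index_name,
       i.indrelid::regclass   AS table_name,
       i.indisvalid,
       i.indisready
FROM pg_index AS i
WHERE NOT i.indisvalid;

-- Typical cleanup for a failed CREATE INDEX CONCURRENTLY (hypothetical index name):
-- REINDEX INDEX CONCURRENTLY my_invalid_idx;
-- or
-- DROP INDEX CONCURRENTLY my_invalid_idx;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;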
&lt;p&gt;&lt;strong&gt;Index not being used:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Inaccurate statistics&lt;/li&gt;
&lt;li&gt;Selectivity&lt;/li&gt;
&lt;li&gt;Data skew&lt;/li&gt;
&lt;li&gt;Soft parsing (prepared statements): the first 5 executions use custom plans, after which a cached generic plan may be used that no longer picks the index&lt;/li&gt;
&lt;li&gt;Leftmost prefix principle: the query doesn&amp;rsquo;t constrain the leading column(s) of a composite index&lt;/li&gt;
&lt;li&gt;Too little data (a hash or full scan is no slower than an index scan)&lt;/li&gt;
&lt;li&gt;Functions on the indexed column (unless a matching expression index on an immutable function exists), implicit conversions, arithmetic on the column, LIKE with a leading &amp;lsquo;%&amp;rsquo;&amp;hellip;&lt;/li&gt;
&lt;li&gt;Data type mismatch&lt;/li&gt;
&lt;li&gt;Collation mismatch (less of an issue in PG, since a database&amp;rsquo;s default collation is fixed at creation and columns share it unless an explicit COLLATE is given; cross-database access is normally impossible)&lt;/li&gt;
&lt;li&gt;The collation used by the SQL&amp;rsquo;s sort differing from the index&amp;rsquo;s collation&lt;/li&gt;
&lt;li&gt;LIKE prefix matching can use a B-tree index only under the C collation or with a pattern operator class such as text_pattern_ops&lt;/li&gt;
&lt;li&gt;Low correlation between index logical order and data physical order: rows reached through the index are scattered across the table, making index access expensive&lt;/li&gt;
&lt;li&gt;ORDER BY column1 LIMIT n and MIN/MAX (top-N) queries, where the optimizer chooses a different index than expected&lt;/li&gt;
&lt;/ul&gt;
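&lt;p&gt;Two of the cases above (a function on the column, and LIKE under a non-C collation) can be reproduced with a hypothetical table demo_users. This is only a sketch: on a table this small the planner may still prefer a sequential scan, so compare the EXPLAIN output before and after creating each index on realistic data:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical table with a plain B-tree index on email
CREATE TABLE demo_users (id int PRIMARY KEY, email text);
INSERT INTO demo_users
SELECT g, 'user' || g || '@example.com'
FROM generate_series(1, 100000) AS g;
CREATE INDEX demo_users_email_idx ON demo_users (email);
ANALYZE demo_users;

-- The plain index cannot serve a predicate on lower(email)
EXPLAIN SELECT * FROM demo_users WHERE lower(email) = 'user42@example.com';

-- An expression index on the same immutable function can
CREATE INDEX demo_users_email_lower_idx ON demo_users (lower(email));
EXPLAIN SELECT * FROM demo_users WHERE lower(email) = 'user42@example.com';

-- Prefix LIKE under a non-C collation needs a pattern operator class
CREATE INDEX demo_users_email_pattern_idx ON demo_users (email text_pattern_ops);
EXPLAIN SELECT * FROM demo_users WHERE email LIKE 'user42%';&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;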

&lt;h3 class="relative group"&gt;14. Role of Commit Log
 &lt;div id="14-role-of-commit-log" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#14-role-of-commit-log" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The commit log (clog) records the status of each transaction. The next time a visibility check touches a tuple, the status found in clog is written into the tuple header as hint bits.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why not write transaction status to the tuple header immediately?&lt;/strong&gt; Updating hint bits at commit time would mean revisiting every tuple the transaction touched, which performs very poorly. Transaction status is therefore recorded once in clog and copied to tuple headers lazily, reducing PGXACT contention and improving performance.&lt;/p&gt;
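&lt;p&gt;The status kept in the commit log can be queried by transaction ID. A minimal sketch for psql (the \gset meta-command is psql-specific; pg_current_xact_id() and pg_xact_status() assume PG 13 or later, older releases have txid_current() and txid_status()):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;BEGIN;
-- Assign an XID to this transaction and store it in the psql variable xid
SELECT pg_current_xact_id() AS xid \gset
-- Status while the transaction is still open
SELECT pg_xact_status(:'xid'::xid8);
COMMIT;
-- Status after commit, as recorded in the commit log
SELECT pg_xact_status(:'xid'::xid8);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;pg_xact_status() returns in progress, committed, or aborted, or NULL when the transaction is too old for its status to be retained.&lt;/p&gt;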
&lt;p&gt;&lt;a href="https://blog.csdn.net/qq_40687433/article/details/130782857" target="_blank" rel="noreferrer"&gt;PG Transactions: Transaction-Related Tuple Structure&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;15. Database Join Methods and Their Applicable Scenarios
 &lt;div id="15-database-join-methods-and-their-applicable-scenarios" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#15-database-join-methods-and-their-applicable-scenarios" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;1.1 Nested Loop Join&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/20abc423c1e9.png" alt="Nested loop join diagram" /&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1,t3 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; lzl1.col1&lt;span style="color:#f92672"&gt;=&lt;/span&gt;t3.a::text;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Nested Loop (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; Filter: ((lzl1.col1)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t3.a)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; t3 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The driving table (outer in the diagram, first table in the plan) matches each row against every row of the driven table (inner, second table in the plan). The driving table is scanned once; the driven table is scanned N times (N = driving table rows).&lt;/p&gt;
&lt;p&gt;NL applies to almost any join condition; it&amp;rsquo;s the simplest brute-force join. Generally the smaller table serves as the driving table (in practice neither table should be too large, unless other join types don&amp;rsquo;t apply).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1.2 Materialized Nested Loop Join&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2b45752abb3b.png" alt="Materialized nested loop join diagram" /&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;testdb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXPLAIN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; tbl_a &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; a, tbl_b &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; b &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; a.id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; b.id;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Nested Loop (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;750230&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; Filter: (a.id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; b.id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tbl_a a (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;145&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Materialize (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;98&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tbl_b b (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;73&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If the driven table (inner) has to be scanned many times, rereading it with physical IO each time is slow and wasteful. Materialize reads the driven table once and keeps the result in memory (work_mem, spilling to temporary files if it doesn&amp;rsquo;t fit), so the table is physically scanned only once and the repeated scans are served from that buffer.&lt;/p&gt;
&lt;p&gt;This scenario is very common in real-world workloads.&lt;/p&gt;
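&lt;p&gt;The effect of Materialize on the estimated cost can be compared by toggling the planner settings. A minimal sketch with hypothetical tables mat_a and mat_b; note that setting an enable_* parameter to off only penalizes that plan type, it does not forbid it:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical tables without indexes
CREATE TABLE mat_a (id int, data int);
CREATE TABLE mat_b (id int, data int);
INSERT INTO mat_a SELECT g, g FROM generate_series(1, 10000) AS g;
INSERT INTO mat_b SELECT g, g FROM generate_series(1, 5000) AS g;
ANALYZE mat_a;
ANALYZE mat_b;

-- Steer the planner toward a nested loop
SET enable_hashjoin = off;
SET enable_mergejoin = off;
EXPLAIN SELECT * FROM mat_a AS a, mat_b AS b WHERE a.id = b.id;

-- Discourage the Materialize node and compare the estimated total cost
SET enable_material = off;
EXPLAIN SELECT * FROM mat_a AS a, mat_b AS b WHERE a.id = b.id;

RESET enable_material;
RESET enable_hashjoin;
RESET enable_mergejoin;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;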
&lt;p&gt;&lt;strong&gt;1.3 Indexed Nested Loop Join (inner indexed)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/661dba35e09a.png" alt="Indexed nested loop join diagram" /&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;testdb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXPLAIN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; tbl_c &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;, tbl_b &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; b &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; b.id;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Nested Loop (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1935&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tbl_b b (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;73&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; tbl_c_pkey &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tbl_c &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;36&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; b.id)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;1.4 NL Variants&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d21a425177c0.png" alt="Nested loop join variants" /&gt;&lt;/p&gt;
&lt;p&gt;All are essentially NL; the main variations are whether indexes are used on either table and whether Materialize is applied.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2.1 Merge Join&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/9914756afd16.png" alt="Merge join diagram" /&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;testdb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXPLAIN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; tbl_a &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; a, tbl_b &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; b &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; a.id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; b.id &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; b.id &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Merge &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;944&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;71&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;984&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;71&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Merge Cond: (a.id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; b.id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Sort (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;809&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;39&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;834&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;39&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: a.id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tbl_a a (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;145&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Sort (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;135&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;33&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;137&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;83&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: b.id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tbl_b b (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;85&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (id &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In merge join, both the driving and driven tables must be sorted on the join key before matching (both tables have a Sort node in the plan). Advantage: fewer table scans and comparisons than NL. Disadvantage: sorting is required.&lt;/p&gt;
&lt;p&gt;Since indexes are already sorted, and SQL often includes DISTINCT, GROUP BY, ORDER BY, MAX/MIN and similar operations that require ordering anyway, merge join is also common.&lt;/p&gt;
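&lt;p&gt;A merge join can be provoked for inspection by discouraging the other join methods. A minimal sketch with hypothetical tables mj_a (primary key on id) and mj_b (no index); the exact plan depends on version and statistics:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical tables: mj_a has an index on the join key, mj_b does not
CREATE TABLE mj_a (id int PRIMARY KEY, data int);
CREATE TABLE mj_b (id int, data int);
INSERT INTO mj_a SELECT g, g FROM generate_series(1, 10000) AS g;
INSERT INTO mj_b SELECT g, g FROM generate_series(1, 5000) AS g;
ANALYZE mj_a;
ANALYZE mj_b;

-- Steer the planner toward a merge join
SET enable_hashjoin = off;
SET enable_nestloop = off;

-- mj_a can be read in id order through its primary key index,
-- while mj_b has no ordered access path and needs an explicit Sort
EXPLAIN SELECT * FROM mj_a AS a JOIN mj_b AS b ON a.id = b.id;

RESET enable_hashjoin;
RESET enable_nestloop;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;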
&lt;p&gt;&lt;strong&gt;2.2 Materialized Merge Join&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/fb637c9d769e.png" alt="Materialized merge join diagram" /&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;testdb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXPLAIN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; tbl_a &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; a, tbl_b &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; b &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; a.id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; b.id;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Merge &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10466&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;10578&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;58&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2064&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Merge Cond: (a.id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; b.id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Sort (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6708&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;39&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;6733&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;39&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1032&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: a.id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tbl_a a (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1529&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1032&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Materialize (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3757&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;69&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;3782&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;69&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1032&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Sort (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3757&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;69&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;3770&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1032&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: b.id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tbl_b b (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1193&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1032&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Materialize does not reduce the number of table scans (each table is still scanned once). The sort itself runs in the backend&amp;rsquo;s work_mem when the data fits, which is more efficient; if the data exceeds work_mem, an on-disk sort is used.&lt;/p&gt;
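&lt;p&gt;One rough way to see whether a sort stayed in work_mem is to look at the sort method reported by EXPLAIN (ANALYZE). A minimal sketch, assuming the tbl_b table from the plan above; the two work_mem values are arbitrary, and if the planner chooses an index scan instead there will be no Sort node at all:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Assumption: tbl_b has an id column, as in the plan above.
SET work_mem = &amp;#39;64kB&amp;#39;;   -- the minimum setting, so the sort is likely to spill
EXPLAIN (ANALYZE) SELECT * FROM tbl_b ORDER BY id;
-- look for a line such as &amp;#34;Sort Method: external merge  Disk: ...&amp;#34;

SET work_mem = &amp;#39;256MB&amp;#39;;  -- large enough for the sort to stay in memory
EXPLAIN (ANALYZE) SELECT * FROM tbl_b ORDER BY id;
-- look for a line such as &amp;#34;Sort Method: quicksort  Memory: ...&amp;#34;

RESET work_mem;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;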
&lt;p&gt;&lt;strong&gt;2.3 Merge Join Variants&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/7cfae9b6cfff.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;Similar to NL variants, mainly Materialize and index usage. When using indexes, since the index is inherently ordered, no extra sorting is needed:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Merge &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;135&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;61&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;322&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Merge Cond: (&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; b.id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; tbl_c_pkey &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tbl_c &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;318&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Sort (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;135&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;33&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;137&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;83&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: b.id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tbl_b b (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;85&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (id &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;So indexes and Materialize are very common in merge joins.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3.1 Hash Join&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/53c4660e122d.png" alt="Insert image description here" /&gt;


&lt;img src="https://lastdba.com/img/csdn/fb09a2a553f8.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;Hash join consists of build and probe phases.&lt;/p&gt;
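&lt;p&gt;The build phase reads the inner table (inner in the diagram; in the plan it is the second child, under the Hash node) and builds a hash table from it in work_mem. The probe phase then scans the outer table, hashes each row&amp;rsquo;s join key, and looks up matching rows in that hash table.&lt;/p&gt;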
&lt;ul&gt;
&lt;li&gt;Hash join is only possible for equality (&lt;code&gt;=&lt;/code&gt;) join conditions&lt;/li&gt;
&lt;li&gt;Hash join consumes memory, so it works best when the tables, and the build side in particular, are not very large&lt;/li&gt;
&lt;li&gt;Note: the hash build table (called the driving table here) is the second row in the plan, the opposite of NL, where the driving table is the first row&lt;/li&gt;
&lt;/ul&gt;
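&lt;p&gt;To check which table ends up on the build side, the other join methods can be discouraged for one session so that the planner picks a hash join. This is only a sketch, assuming tbl_a and tbl_b both have an id column as in the earlier plans:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Assumption: tbl_a(id, ...) and tbl_b(id, ...) exist as in the plans above.
SET enable_nestloop = off;
SET enable_mergejoin = off;   -- leaves hash join as the preferred join method
EXPLAIN SELECT * FROM tbl_a a JOIN tbl_b b ON a.id = b.id;
-- In the output, the table under the &amp;#34;Hash&amp;#34; node is the build side;
-- the other child of &amp;#34;Hash Join&amp;#34; is the probe side.
RESET enable_nestloop;
RESET enable_mergejoin;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;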
&lt;p&gt;&lt;strong&gt;3.2 Hybrid Hash Join with Skew&lt;/strong&gt;&lt;/p&gt;
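&lt;p&gt;When the inner table does not fit in work_mem, PostgreSQL switches to a hybrid hash join: the build side is split into batches, only one batch is kept in memory at a time, and the remaining batches are spilled to temporary files on disk. The skew optimization additionally keeps inner rows that match the outer table&amp;rsquo;s most common values in memory so that they are processed in the first pass. The details are covered in the reference below.&lt;/p&gt;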
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql03/05/01.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql03/05/01.html&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;16. Applicable Scenarios for Various Index Types (HASH/GIN/BTREE/GIST/BLOOM/BRIN)
 &lt;div id="16-applicable-scenarios-for-various-index-types-hashginbtreegistbloombrin" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#16-applicable-scenarios-for-various-index-types-hashginbtreegistbloombrin" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;(1) BTREE&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/f490c66d7714.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://en.wikibooks.org/wiki/PostgreSQL/Index_Btree" target="_blank" rel="noreferrer"&gt;https://en.wikibooks.org/wiki/PostgreSQL/Index_Btree&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Possible usage patterns:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;		 &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LIKE&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;foo%&amp;#39;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;A meta node points to the root node&lt;/li&gt;
&lt;li&gt;Leaf node access complexity O(logN), N being row count&lt;/li&gt;
&lt;li&gt;Inherently sorted, easily used by ORDER BY, MIN/MAX, GROUP BY, merge joins, etc.&lt;/li&gt;
&lt;li&gt;Default index type, most common. Structure is similar across databases with leaf node structure differences (MySQL secondary index leaf nodes store index key + primary key, then access clustered index via primary key; Oracle index leaf nodes store index key + rowid; PG index leaf nodes store index key + tid)&lt;/li&gt;
&lt;/ul&gt;
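&lt;p&gt;A short sketch of these usage patterns follows. The table and index names are hypothetical. One detail worth knowing: in a database that does not use the C collation, a prefix search such as &lt;code&gt;LIKE &amp;#39;foo%&amp;#39;&lt;/code&gt; can only use a BTREE index built with a pattern operator class such as &lt;code&gt;text_pattern_ops&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical table, for illustration only.
CREATE TABLE app_users (id bigint PRIMARY KEY, name text);

-- Pattern operator class so that LIKE &amp;#39;foo%&amp;#39; can use the index in non-C collations.
CREATE INDEX app_users_name_like_idx ON app_users (name text_pattern_ops);
EXPLAIN SELECT * FROM app_users WHERE name LIKE &amp;#39;foo%&amp;#39;;

-- Because the index is sorted, MIN/MAX and ORDER BY can read it directly.
EXPLAIN SELECT max(id) FROM app_users;
EXPLAIN SELECT id FROM app_users ORDER BY id LIMIT 10;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;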
&lt;p&gt;&lt;strong&gt;(2) HASH&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/a7e8e0b28860.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;(https://leopard.in.ua/2015/04/13/postgresql-indexes)&lt;/p&gt;
&lt;p&gt;Each indexed value is converted to a 32-bit hash code and stored in the corresponding hash bucket; each entry points to the data row it came from.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lookup complexity is O(1) on average&lt;/li&gt;
&lt;li&gt;Hash indexes can &lt;strong&gt;only&lt;/strong&gt; be used for &lt;code&gt;=&lt;/code&gt; conditions&lt;/li&gt;
&lt;li&gt;When key values are large, a hash index is generally smaller than the equivalent BTREE index, and a lookup compares hash codes instead of comparing keys character by character as BTREE does, which is more efficient. Hash indexes therefore suit equality lookups on large key values&lt;/li&gt;
&lt;/ul&gt;
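&lt;p&gt;As an illustration of the &amp;ldquo;large key, equality only&amp;rdquo; case, the sketch below indexes a long text column with a hash index. The table, column, and index names are made up for the example:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical table with a long text key.
CREATE TABLE pages (url text, body text);
CREATE INDEX pages_url_hash_idx ON pages USING hash (url);

-- Equality: the hash index is a candidate.
EXPLAIN SELECT * FROM pages WHERE url = &amp;#39;https://example.com/a/very/long/path&amp;#39;;

-- Range condition: the hash index cannot be used.
EXPLAIN SELECT * FROM pages WHERE url &amp;gt; &amp;#39;https://example.com&amp;#39;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;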
&lt;p&gt;&lt;strong&gt;(3) GIST&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;GIST (Generalized Search Tree) is similar to BTREE, also a balanced tree. GIST isn&amp;rsquo;t actually one index type but a framework containing many index strategies: R-TREE, RD-TREE. Unlike BTREE using &lt;code&gt;=&lt;/code&gt;, &lt;code&gt;&amp;gt;&lt;/code&gt; etc. for numeric/character data, GIST excels at geographic, text, image, and similar data. Geographic operators include: &lt;code&gt;&amp;lt;-&amp;gt;&lt;/code&gt; distance calculation, &lt;code&gt;&amp;lt;&amp;lt;&lt;/code&gt; left-of check, &lt;code&gt;@&amp;gt;&lt;/code&gt; contains check, etc.&lt;/p&gt;
&lt;p&gt;GIST excels at:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GIS data processing (similar data processing also possible, e.g., &lt;a href="https://pic.huodongjia.com/ganhuodocs/2017-07-15/1500104265.79.pdf" target="_blank" rel="noreferrer"&gt;digoal-GIST index for IP range query optimization&lt;/a&gt;)&lt;/li&gt;
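&lt;li&gt;Nearest-neighbor (KNN) searches, e.g. ordering by the &lt;code&gt;&amp;lt;-&amp;gt;&lt;/code&gt; distance operator (note that pgvector ships its own index access methods for vector data rather than using GIST)&lt;/li&gt;
&lt;li&gt;Full-text search (GIST has a built-in operator class for tsvector; contrib/intarray is a separate extension that adds GIST support for integer arrays)&lt;/li&gt;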
&lt;/ul&gt;
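&lt;p&gt;A minimal sketch of the geometric and nearest-neighbor cases, using the built-in &lt;code&gt;point&lt;/code&gt; type. The table and index names are hypothetical:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical table of 2-D points.
CREATE TABLE places (id bigint PRIMARY KEY, location point);
CREATE INDEX places_location_gist_idx ON places USING gist (location);

-- Nearest-neighbor: the 5 places closest to the origin.
SELECT id FROM places ORDER BY location &amp;lt;-&amp;gt; point &amp;#39;(0,0)&amp;#39; LIMIT 5;

-- Containment: places that fall inside a box.
SELECT id FROM places WHERE location &amp;lt;@ box &amp;#39;((0,0),(10,10))&amp;#39;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;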
&lt;p&gt;RTREE:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/f63754993caa.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;(https://en.wikipedia.org/wiki/R-tree)&lt;/p&gt;
&lt;p&gt;The most common index for GIS data is RTREE. Two-dimensional spatial data consists of coordinates; scanning coordinates one by one to find locations is slow. BTREE isn&amp;rsquo;t suitable for such data, so RTREE emerged. RTREE&amp;rsquo;s core concept is grouping nearby points using rectangles at different hierarchy levels; finer grouping yields more precise positioning.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://postgrespro.com/blog/pgsql/4175817" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/blog/pgsql/4175817&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;(4) SP-GIST:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Space-Partitioned GIST is similar to GIST, also an index creation framework. SP-GIST suits structures that partition space into non-overlapping regions (unlike RTREE which overlaps), such as quadtrees, k-d trees, and radix trees.&lt;/p&gt;
&lt;p&gt;Quadtrees:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2103e76b673a.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;(https://en.wikipedia.org/wiki/Quadtree)&lt;/p&gt;
&lt;p&gt;Quadtree cells can be squares, rectangles, or other shapes. The most &amp;ldquo;orthodox&amp;rdquo; quadtree, as shown above, generally has these properties:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Each internal node has four children&lt;/li&gt;
&lt;li&gt;A lookup descends the tree level by level, choosing one quadrant at each level, until it reaches the data&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;K-d trees:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/656f08fc9ac3.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/05d79891bd23.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;(https://en.wikipedia.org/wiki/K-d_tree)&lt;/p&gt;
&lt;p&gt;K-dimensional (k-d) trees organize multi-dimensional points by recursively splitting space; each non-leaf node splits its region in two along one dimension. For example, the 3D diagram above is a 3-dimensional k-d tree: the first split (red) divides the entire space in half, the second split (green) divides each subspace in half, and so on until no further division is possible. The second diagram shows the tree structure of a 3-dimensional k-d tree (don&amp;rsquo;t mistake it for BTREE!); its three dimensions are Name, Age, and Salary.&lt;/p&gt;
&lt;p&gt;Radix-tree:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/157f00ff6b48.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;(https://en.wikipedia.org/wiki/Radix_tree)&lt;/p&gt;
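&lt;p&gt;Radix tree: a key is reconstructed by concatenating the labels on the path from the root to a node, so keys with a common prefix share the corresponding part of the path. Key lookup complexity is O(path length), which is bounded by the key length rather than by the number of keys.&lt;/p&gt;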
&lt;p&gt;&lt;a href="https://postgrespro.com/blog/pgsql/4220639" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/blog/pgsql/4220639&lt;/a&gt;&lt;/p&gt;
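&lt;p&gt;The three structures above map onto SP-GIST operator classes that ship with PostgreSQL. A brief sketch, with hypothetical table and index names; the &lt;code&gt;^@&lt;/code&gt; prefix operator assumes PostgreSQL 11 or later:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical tables, for illustration only.
CREATE TABLE pts (p point);
CREATE INDEX pts_quad_idx ON pts USING spgist (p);               -- default for point: quad_point_ops (quadtree)
CREATE INDEX pts_kd_idx   ON pts USING spgist (p kd_point_ops);  -- k-d tree

CREATE TABLE words (w text);
CREATE INDEX words_radix_idx ON words USING spgist (w);          -- text_ops: radix tree

-- Prefix search, a natural fit for a radix tree (PostgreSQL 11+).
SELECT w FROM words WHERE w ^@ &amp;#39;post&amp;#39;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;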
&lt;p&gt;&lt;strong&gt;(5) GIN&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;BTREE and GIST become inefficient when a single column value contains many keys that each need to be searchable. GIN (Generalized Inverted Index) is designed for exactly this case: array, full-text, and JSON retrieval. Both GIST and GIN are generalized frameworks that support indexing multiple data types, and both support full-text indexing. GIN only supports Bitmap scans.&lt;/p&gt;
&lt;p&gt;PostgreSQL natively supports many operators, some of which are GIN-related data type operators:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.postgresql.org/docs/16/functions-array.html" target="_blank" rel="noreferrer"&gt;Array operators&lt;/a&gt;, e.g., &lt;code&gt;@&amp;gt;&lt;/code&gt; tests whether array1 contains array2; the related &lt;code&gt;unnest&lt;/code&gt; function (not an index operator) expands an array into rows&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.postgresql.org/docs/16/functions-textsearch.html" target="_blank" rel="noreferrer"&gt;Full-text search operators&lt;/a&gt;, e.g., &lt;code&gt;@@&lt;/code&gt; whether tsvector matches tsquery&lt;/li&gt;
&lt;li&gt;Also some &lt;a href="https://www.postgresql.org/docs/16/functions-json.html" target="_blank" rel="noreferrer"&gt;JSON operators&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
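&lt;p&gt;For the array and JSON cases, a GIN index is typically paired with the containment operator. A minimal sketch with a hypothetical table; the &lt;code&gt;meta&lt;/code&gt; column uses &lt;code&gt;jsonb&lt;/code&gt;, which is the JSON type that has a default GIN operator class:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical table, for illustration only.
CREATE TABLE articles (id bigint PRIMARY KEY, tags text[], meta jsonb);
CREATE INDEX articles_tags_gin_idx ON articles USING gin (tags);
CREATE INDEX articles_meta_gin_idx ON articles USING gin (meta);

-- Array containment: rows whose tags include both values.
SELECT id FROM articles WHERE tags @&amp;gt; ARRAY[&amp;#39;postgresql&amp;#39;, &amp;#39;index&amp;#39;];

-- jsonb containment: rows whose document includes this key/value pair.
SELECT id FROM articles WHERE meta @&amp;gt; &amp;#39;{&amp;#34;lang&amp;#34;: &amp;#34;en&amp;#34;}&amp;#39;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;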
&lt;p&gt;PG supports &lt;a href="https://www.postgresql.org/docs/16/datatype-textsearch.html" target="_blank" rel="noreferrer"&gt;two data types for full-text search&lt;/a&gt;: tsvector and tsquery&lt;/p&gt;
&lt;p&gt;&lt;em&gt;1. tsvector&lt;/em&gt;:&lt;/p&gt;
&lt;p&gt;tsvector tokenizes text with &lt;strong&gt;deduplication and sorting&lt;/strong&gt;, using tsvector_ops operators. Example tokenization:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;The Fat Rat is a Rat&amp;#39;&lt;/span&gt;::tsvector;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tsvector 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;Fat&amp;#39;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;Rat&amp;#39;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;The&amp;#39;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;is&amp;#39;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
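&lt;p&gt;A plain &lt;code&gt;::tsvector&lt;/code&gt; cast does not normalize the words, so it is generally not the final form. &lt;code&gt;to_tsvector&lt;/code&gt; produces the normalized tokens (the final form) and records each token&amp;rsquo;s position:&lt;/p&gt;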
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; to_tsvector(&lt;span style="color:#e6db74"&gt;&amp;#39;english&amp;#39;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;The Fat Rat is a Rat&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; to_tsvector 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;fat&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;rat&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Note that the stop words &amp;lsquo;the&amp;rsquo;, &amp;lsquo;is&amp;rsquo;, and &amp;lsquo;a&amp;rsquo; are removed and the remaining words are lowercased. This is how to_tsvector normalizes text, and it matches real-world usage, since full-text search typically targets meaningful words.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;2. tsquery&lt;/em&gt;:&lt;/p&gt;
&lt;p&gt;Normally you can search tokenized text by word:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; to_tsvector(&lt;span style="color:#e6db74"&gt;&amp;#39;The Fat Rat is a Rat&amp;#39;&lt;/span&gt;) &lt;span style="color:#f92672"&gt;@@&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;rat&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;?&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;column&lt;/span&gt;&lt;span style="color:#f92672"&gt;?&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;To search for text that &amp;ldquo;contains both fat and rat&amp;rdquo;, a single plain word is not enough; a tsquery expresses a condition over &lt;em&gt;the tokens being searched&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;tsquery can be composed with &lt;code&gt;&amp;amp;&lt;/code&gt; (AND), &lt;code&gt;|&lt;/code&gt; (OR), &lt;code&gt;!&lt;/code&gt; (NOT), &lt;code&gt;&amp;lt;-&amp;gt;&lt;/code&gt; (FOLLOWED BY). Examples:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; to_tsvector(&lt;span style="color:#e6db74"&gt;&amp;#39;The Fat Rat is a Rat&amp;#39;&lt;/span&gt;) &lt;span style="color:#f92672"&gt;@@&lt;/span&gt; to_tsquery( &lt;span style="color:#e6db74"&gt;&amp;#39;fat&amp;amp;rat&amp;#39;&lt;/span&gt; );
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;?&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;column&lt;/span&gt;&lt;span style="color:#f92672"&gt;?&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; to_tsvector(&lt;span style="color:#e6db74"&gt;&amp;#39;The Fat Rat is a Rat&amp;#39;&lt;/span&gt;) &lt;span style="color:#f92672"&gt;@@&lt;/span&gt; to_tsquery( &lt;span style="color:#e6db74"&gt;&amp;#39;fat&amp;amp;rat&amp;amp;cat&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;?&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;column&lt;/span&gt;&lt;span style="color:#f92672"&gt;?&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; to_tsvector(&lt;span style="color:#e6db74"&gt;&amp;#39;The Fat Rat is a Rat&amp;#39;&lt;/span&gt;) &lt;span style="color:#f92672"&gt;@@&lt;/span&gt; to_tsquery( &lt;span style="color:#e6db74"&gt;&amp;#39;rat&amp;lt;-&amp;gt;fat&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;?&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;column&lt;/span&gt;&lt;span style="color:#f92672"&gt;?&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; f&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Fulltext GIN:&lt;/strong&gt;&lt;/p&gt;
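&lt;p&gt;A full-text GIN index is built on the tokenized form of the indexed field (to_tsvector). In the example below, the &lt;code&gt;left&lt;/code&gt; column shows the truncated beginning of each document, and doc_tsv is the tokenized form of the full document:&lt;/p&gt;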
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; doc_tsv 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+----------------------+---------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Can a sheet slitter &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sheet&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;slit&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;slitter&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; How many sheets coul &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;could&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;mani&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sheet&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;slit&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;slitter&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; I slit a sheet, a sh &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sheet&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;slit&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Upon a slitted sheet &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sheet&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sit&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;slit&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;upon&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Whoever slit the she &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;good&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sheet&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;slit&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;slitter&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;whoever&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; I am a sheet slitter &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sheet&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;slitter&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; I slit sheets. &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sheet&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;slit&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; I am the sleekest sh &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;ever&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sheet&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sleekest&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;slit&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;slitter&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; She slits the sheet &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sheet&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sit&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;slit&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Then indexing by tokens and their ctids:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/815fee8ad284.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;a href="https://postgrespro.com/blog/pgsql/4261647" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/blog/pgsql/4261647&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;The index is sorted by token, similar to BTREE; leaf nodes store the ctids that each token points to. Since the same token can come from multiple tuples, a token can point to multiple ctids. A short list of ctids is stored with the token as a posting list; when the list grows too large, it is moved into a posting tree, which is essentially a BTREE of ctids.&lt;/p&gt;
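&lt;p&gt;A setup along these lines would produce a table and index of the kind described. This is only a sketch: the table name, index name, and sample rows are assumptions, chosen to mirror the &lt;code&gt;doc_tsv&lt;/code&gt; column shown above.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical table mirroring the columns shown above.
CREATE TABLE ts (doc text, doc_tsv tsvector);
INSERT INTO ts (doc) VALUES
  (&amp;#39;Can a sheet slitter slit sheets?&amp;#39;),
  (&amp;#39;I slit sheets.&amp;#39;);

-- Tokenize each document, then index the tokens with GIN.
UPDATE ts SET doc_tsv = to_tsvector(&amp;#39;english&amp;#39;, doc);
CREATE INDEX ts_doc_tsv_gin_idx ON ts USING gin (doc_tsv);

-- Find the rows (ctids) whose documents contain both tokens.
SELECT ctid, doc FROM ts WHERE doc_tsv @@ to_tsquery(&amp;#39;english&amp;#39;, &amp;#39;slit &amp;amp; sheet&amp;#39;);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;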
&lt;p&gt;&lt;strong&gt;Fulltext GIN addressing:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A lookup by token returns the matching ctids. For &amp;ldquo;mani&amp;rdquo; the result is (0,2);
for &amp;ldquo;slitter&amp;rdquo; it is (0,1), (0,2), (1,2), (1,3), (2,2).&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/49ae172a7923.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GIN updates:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Updating (insert/update/delete) a text generally requires updating many places in the GIN index because:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;One text can have many tokens scattered across GIN index branches&lt;/li&gt;
&lt;li&gt;One token may contain multiple ctids since many texts share that token&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This makes GIN updates very expensive. Batch updates are typically better than row-by-row updates since some tokens are shared, reducing update work.&lt;/p&gt;
&lt;p&gt;Besides batch updates, GIN provides fast update functionality (fastupdate = true):&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/891a4e0ed575.png" alt="Insert image description here" /&gt;&lt;/p&gt;
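&lt;p&gt;(&lt;a href="https://www.pgcon.org/2016/schedule/attachments/434_Index-internals-PGCon2016.pdf" target="_blank" rel="noreferrer"&gt;https://www.pgcon.org/2016/schedule/attachments/434_Index-internals-PGCon2016.pdf&lt;/a&gt;)&lt;/p&gt;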
&lt;p&gt;GIN fast update:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;New entries are appended to a separate, unsorted pending list instead of being inserted into the main index right away; searches have to scan this list in addition to the main index&lt;/li&gt;
&lt;li&gt;When vacuum runs or the list reaches &lt;code&gt;gin_pending_list_limit&lt;/code&gt;, the pending entries are merged into the main GIN index&lt;/li&gt;
&lt;/ul&gt;
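&lt;p&gt;The fast update behaviour can be tuned per index. The sketch below assumes a GIN index with the hypothetical name &lt;code&gt;docs_body_tsv_gin&lt;/code&gt;; the limit value is only an illustration.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- fastupdate is on by default; gin_pending_list_limit is given in kilobytes.
ALTER INDEX docs_body_tsv_gin SET (fastupdate = on, gin_pending_list_limit = 4096);

-- Merge the pending list into the main index on demand, for example from a
-- nightly job. Returns the number of pages removed from the pending list.
SELECT gin_clean_pending_list(&amp;#39;docs_body_tsv_gin&amp;#39;::regclass);

-- Alternatively, trade slower writes for predictable reads. Turning the option
-- off does not flush entries that are already pending, so clean up afterwards.
ALTER INDEX docs_body_tsv_gin SET (fastupdate = off);
SELECT gin_clean_pending_list(&amp;#39;docs_body_tsv_gin&amp;#39;::regclass);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;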
&lt;p&gt;&lt;strong&gt;GiST or GIN?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Both GiST and GIN are generalized index frameworks that support full-text indexing, but their full-text index structures are completely different. GiST suits geographic and multi-dimensional spatial data; GIN mainly serves scenarios where a key contains multiple values, such as arrays, full text, and JSON.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GIN lookups are generally faster than GiST lookups; for full-text indexing GIN is a safe default (reference: &lt;a href="https://leopard.in.ua/2015/04/13/postgresql-indexes" target="_blank" rel="noreferrer"&gt;GIST vs GIN&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;GiST is worth considering only when updates are very frequent and the fast update strategy (e.g., a nightly write-back of the pending list) can&amp;rsquo;t solve the update problem. When in doubt, benchmark GiST and GIN against the actual full-text workload.&lt;/li&gt;
&lt;/ul&gt;
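&lt;p&gt;One way to run such a comparison, sketched for a test environment only. The table &lt;code&gt;docs&lt;/code&gt; with a tsvector column &lt;code&gt;body_tsv&lt;/code&gt; and both index names are hypothetical.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Build both index types on the same tsvector column.
CREATE INDEX docs_tsv_gin  ON docs USING gin  (body_tsv);
CREATE INDEX docs_tsv_gist ON docs USING gist (body_tsv);

-- Compare on-disk size.
SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relname IN (&amp;#39;docs_tsv_gin&amp;#39;, &amp;#39;docs_tsv_gist&amp;#39;);

-- Compare lookup cost. Dropping one index inside a transaction that is rolled
-- back forces the planner onto the other one. DROP INDEX locks the table
-- exclusively until ROLLBACK, so do not try this on a busy system.
BEGIN;
DROP INDEX docs_tsv_gist;
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM docs WHERE body_tsv @@ to_tsquery(&amp;#39;english&amp;#39;, &amp;#39;sheet&amp;#39;);
ROLLBACK;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Repeating the last block with &lt;code&gt;docs_tsv_gin&lt;/code&gt; dropped instead gives the GiST numbers; timing a representative batch of inserts and updates with each index in place covers the write side.&lt;/p&gt;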
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/16/datatype-textsearch.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/16/datatype-textsearch.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://postgrespro.com/blog/pgsql/4261647" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/blog/pgsql/4261647&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;(6) BRIN&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/0ee340aa3ea8.png" alt="Insert image description here" /&gt;&lt;/p&gt;
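&lt;p&gt;(&lt;a href="https://postgrespro.com/blog/pgsql/5967830" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/blog/pgsql/5967830&lt;/a&gt;)&lt;/p&gt;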
&lt;p&gt;BRIN is not a tree-type index. Data is grouped in multiple pages (or blocks) as one range (similar to range partition but not physically partitioned). The table is divided into ranges, hence the name Block Range Index (BRIN).&lt;/p&gt;
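&lt;p&gt;The key BRIN structure is the revmap (range map), which maps each block range to a summary tuple. The summary holds only the range of key values found in those pages (the minimum and maximum, for the default minmax operator classes), &lt;strong&gt;not the key values themselves&lt;/strong&gt;. This is why BRIN indexes are very small; storing every key value would turn the index into something like a BTREE without branches.&lt;/p&gt;
&lt;p&gt;Because only per-range summaries are stored, a lookup must read every heap page in each block range whose summary matches the condition, then recheck the rows on those pages to find the ones that actually qualify. The plan below shows this as lossy heap blocks and rows removed by the index recheck:&lt;/p&gt;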
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;															QUERY PLAN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; flights_bi (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;75&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;151&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;192&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;210&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;587353&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: (airport_utc_offset &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;08:00:00&amp;#39;&lt;/span&gt;::interval)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Rows&lt;/span&gt; Removed &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;191318&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Heap Blocks: lossy&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;13380&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; flights_bi_airport_utc_offset_idx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;74&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;999&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;74&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;999&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;133800&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (airport_utc_offset &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;08:00:00&amp;#39;&lt;/span&gt;::interval)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
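&lt;p&gt;A plan of this shape can be reproduced on a small scale. The sketch below uses a hypothetical table &lt;code&gt;events&lt;/code&gt; loaded in timestamp order; the names, row count, and &lt;code&gt;pages_per_range&lt;/code&gt; value are illustrative, and whether the planner actually picks the BRIN index depends on table size and statistics.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical append-only table, filled in ascending timestamp order.
CREATE TABLE events (
    id         bigint GENERATED ALWAYS AS IDENTITY,
    created_at timestamptz NOT NULL,
    payload    text
);

INSERT INTO events (created_at, payload)
SELECT now() - (1000000 - g) * interval &amp;#39;1 second&amp;#39;, &amp;#39;row &amp;#39; || g
FROM generate_series(1, 1000000) AS g;

-- pages_per_range defaults to 128. Smaller ranges make the index larger
-- but leave fewer heap pages to recheck per matching range.
CREATE INDEX events_created_brin ON events
    USING brin (created_at) WITH (pages_per_range = 32);

ANALYZE events;

-- Look for &amp;#34;Heap Blocks: lossy=...&amp;#34; and &amp;#34;Rows Removed by Index Recheck&amp;#34;.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM events
WHERE created_at &amp;gt;= now() - interval &amp;#39;1 hour&amp;#39;;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;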
&lt;p&gt;Whether the index key order matches the physical storage order is critical. If rows with nearby key values are scattered across &amp;ldquo;distant&amp;rdquo; pages, every block range covers a wide span of values, many ranges match the condition, and extra IO is needed to read them. In the worst case the query ends up scanning the entire table:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/46ee8f7372ff.png" alt="Insert image description here" /&gt;&lt;/p&gt;
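&lt;p&gt;(&lt;a href="https://www.pgcon.org/2016/schedule/attachments/434_Index-internals-PGCon2016.pdf" target="_blank" rel="noreferrer"&gt;https://www.pgcon.org/2016/schedule/attachments/434_Index-internals-PGCon2016.pdf&lt;/a&gt;)&lt;/p&gt;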
&lt;p&gt;&lt;strong&gt;BRIN suitable scenarios:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;BRIN indexes only suit data where index key order is highly consistent with storage order. Check the column&amp;rsquo;s &lt;code&gt;correlation&lt;/code&gt; in &lt;code&gt;pg_stats&lt;/code&gt;: its absolute value should approach 1 (a value near -1 means the data is stored in descending order, which keeps block ranges just as narrow). Typical candidates are auto-increment primary keys and timestamp columns&lt;/li&gt;
&lt;li&gt;Tables with almost no updates, because updates may reduce correlation&lt;/li&gt;
&lt;li&gt;BRIN indexes generally suit very large data, especially TB-scale and beyond&lt;/li&gt;
&lt;/ul&gt;
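&lt;p&gt;The correlation check can be done with a query along these lines. The schema, table, column, and index names are hypothetical; &lt;code&gt;pg_stats&lt;/code&gt; is only populated after the table has been analyzed.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;ANALYZE events;

-- correlation ranges from -1 to 1; BRIN candidates sit close to either end.
SELECT tablename, attname, correlation
FROM pg_stats
WHERE schemaname = &amp;#39;public&amp;#39;
  AND tablename  = &amp;#39;events&amp;#39;
  AND attname    = &amp;#39;created_at&amp;#39;;

-- Newly appended block ranges are not summarized until vacuum runs;
-- this function summarizes them on demand and returns how many ranges it handled.
SELECT brin_summarize_new_values(&amp;#39;events_created_brin&amp;#39;::regclass);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;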
&lt;p&gt;&lt;a href="https://postgrespro.com/blog/pgsql/5967830" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/blog/pgsql/5967830&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;(7) RUM&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;RUM is an extension, not natively included in PG. RUM and GIN indexes are similar except RUM additionally stores tsvector position information.&lt;/p&gt;
&lt;p&gt;GIN is built over tsvector values (produced by to_tsvector() or stored directly), but it discards the token positions those values contain. Anything that depends on positions, such as the distance between two tokens, therefore can&amp;rsquo;t be answered from the GIN index and has to go back to the raw tsvector data. RUM handles this.&lt;/p&gt;
&lt;p&gt;RUM indexes attach token position information alongside ctids, compared to GIN:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/9c5cdfb1d385.png" alt="Insert image description here" /&gt;&lt;/p&gt;
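&lt;p&gt;(&lt;a href="https://postgrespro.com/blog/pgsql/4262305" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/blog/pgsql/4262305&lt;/a&gt;)&lt;/p&gt;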
&lt;p&gt;RUM, similar to GIN, suits full-text indexing, with additional capabilities:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Distance operators (e.g., &amp;lt;=&amp;gt;) for distance calculation&lt;/li&gt;
&lt;li&gt;Position-based sorting&lt;/li&gt;
&lt;/ul&gt;
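&lt;p&gt;A sketch of a distance-ordered search with RUM, assuming the extension is installed on the server. The table &lt;code&gt;docs&lt;/code&gt;, its tsvector column &lt;code&gt;body_tsv&lt;/code&gt;, and the index name are hypothetical.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;CREATE EXTENSION IF NOT EXISTS rum;

-- rum_tsvector_ops stores token positions next to each ctid.
CREATE INDEX docs_body_tsv_rum ON docs USING rum (body_tsv rum_tsvector_ops);

-- &amp;lt;=&amp;gt; yields a distance between a tsvector and a tsquery. Ordering by it
-- lets the index return the closest matches first, without a separate sort.
SELECT id,
       body,
       body_tsv &amp;lt;=&amp;gt; to_tsquery(&amp;#39;english&amp;#39;, &amp;#39;sheet &amp;amp; slit&amp;#39;) AS distance
FROM docs
WHERE body_tsv @@ to_tsquery(&amp;#39;english&amp;#39;, &amp;#39;sheet &amp;amp; slit&amp;#39;)
ORDER BY body_tsv &amp;lt;=&amp;gt; to_tsquery(&amp;#39;english&amp;#39;, &amp;#39;sheet &amp;amp; slit&amp;#39;)
LIMIT 10;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;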
&lt;p&gt;&lt;a href="https://postgrespro.com/blog/pgsql/4262305" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/blog/pgsql/4262305&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;(8) BLOOM&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A Bloom filter quickly determines whether an element is in a set. Bloom filters can have false positives — &amp;ldquo;in set&amp;rdquo; isn&amp;rsquo;t guaranteed true, but &amp;ldquo;not in set&amp;rdquo; is guaranteed true. BLOOM indexes are also non-tree, flat structures (requiring recheck like BRIN).&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/bf06b10cd015.png" alt="Insert image description here" /&gt;&lt;/p&gt;
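&lt;p&gt;(&lt;a href="https://en.wikipedia.org/wiki/Bloom_filter" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Bloom_filter&lt;/a&gt;)&lt;/p&gt;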
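&lt;p&gt;Bloom indexes can cover many columns. They resemble hash indexes in that each column value is hashed, but unlike a hash index the hashes of several columns are combined into one signature per row. The total signature length is set by the &lt;code&gt;length&lt;/code&gt; parameter, and the &lt;code&gt;col1&lt;/code&gt;, &lt;code&gt;col2&lt;/code&gt;, &amp;hellip; parameters set how many bits are generated for each column. Because the signature is a fixed-size, lossy summary, false positives exist. A shorter length means a higher false positive probability (the maximum length is 4096 bits).&lt;/p&gt;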
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; ... &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; bloom(...) &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;length&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;..., col1&lt;span style="color:#f92672"&gt;=&lt;/span&gt;..., col2&lt;span style="color:#f92672"&gt;=&lt;/span&gt;..., ...);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
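&lt;p&gt;A filled-in version of the template above, modelled on the example in the PostgreSQL bloom documentation. The table, column names, and parameter values are illustrative only.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- bloom ships with PostgreSQL as a contrib extension.
CREATE EXTENSION IF NOT EXISTS bloom;

CREATE TABLE tbloom AS
SELECT (random() * 1000000)::int AS i1,
       (random() * 1000000)::int AS i2,
       (random() * 1000000)::int AS i3
FROM generate_series(1, 1000000);

-- length: signature size in bits (default 80).
-- colN: number of bits generated for the N-th indexed column (default 2).
CREATE INDEX tbloom_idx ON tbloom USING bloom (i1, i2, i3)
    WITH (length = 80, col1 = 2, col2 = 2, col3 = 4);

ANALYZE tbloom;

-- Only equality conditions can use the index, on any subset of its columns.
EXPLAIN (ANALYZE)
SELECT * FROM tbloom WHERE i2 = 898732 AND i3 = 478705;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;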
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/93a3ccefbd2d.png" alt="Insert image description here" /&gt;&lt;/p&gt;
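&lt;p&gt;(&lt;a href="https://postgrespro.com/blog/pgsql/5967832" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/blog/pgsql/5967832&lt;/a&gt;)&lt;/p&gt;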
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/bloom.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/bloom.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://postgrespro.com/blog/pgsql/5967832" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/blog/pgsql/5967832&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Index Type&lt;/th&gt;
 &lt;th&gt;Structure&lt;/th&gt;
 &lt;th&gt;Operators&lt;/th&gt;
 &lt;th&gt;Access Complexity&lt;/th&gt;
 &lt;th&gt;Native?&lt;/th&gt;
 &lt;th&gt;Ordered?&lt;/th&gt;
 &lt;th&gt;Accurate?&lt;/th&gt;
 &lt;th&gt;Applicable Scenarios&lt;/th&gt;
 &lt;th&gt;Advantages&lt;/th&gt;
 &lt;th&gt;Disadvantages&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;btree&lt;/td&gt;
 &lt;td&gt;btree; branch stores key ranges, leaf nodes store keys and ctids, generally ascending&lt;/td&gt;
 &lt;td&gt;&amp;gt;=, =, IS NULL etc. common operators; leftmost prefix rule&lt;/td&gt;
 &lt;td&gt;O(logN)&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;High selectivity scenarios; not suitable for too-large data&lt;/td&gt;
 &lt;td&gt;Fits most scenarios; no extra sorting needed&lt;/td&gt;
 &lt;td&gt;Large key values make index very large; index fragmentation/splitting (HOT mitigates)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;hash&lt;/td&gt;
 &lt;td&gt;Builds hash buckets; different hash values point to different rows&lt;/td&gt;
 &lt;td&gt;Only =&lt;/td&gt;
 &lt;td&gt;O(1)&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Only = condition scenarios; large key values&lt;/td&gt;
 &lt;td&gt;Generally small; fast access&lt;/td&gt;
 &lt;td&gt;Very narrow use case&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;GiST&lt;/td&gt;
 &lt;td&gt;Index framework; R-TREE, RD-TREE; groups addresses at different layers for precision&lt;/td&gt;
 &lt;td&gt;Spatial operators: &lt;code&gt;&amp;lt;-&amp;gt;&lt;/code&gt; distance, &lt;code&gt;&amp;lt;&amp;lt;&lt;/code&gt; left-of, &lt;code&gt;@&amp;gt;&lt;/code&gt; contains etc.&lt;/td&gt;
 &lt;td&gt;Layer height&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Yes (supports KNN)&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;GIS; KNN; frequently updated full-text index&lt;/td&gt;
 &lt;td&gt;GIS, multi-dimensional data&lt;/td&gt;
 &lt;td&gt;Special-case scenarios&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;sp-GiST/Q-tree&lt;/td&gt;
 &lt;td&gt;(SP-GiST is the framework; partitions do not overlap) Q-tree (quadtree): each internal node has 4 child nodes&lt;/td&gt;
 &lt;td&gt;Spatial operators: up/down/left/right, equality, contains&lt;/td&gt;
 &lt;td&gt;Layer height&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;GIS&lt;/td&gt;
 &lt;td&gt;GIS&lt;/td&gt;
 &lt;td&gt;GIS&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;sp-GiST/k-d tree&lt;/td&gt;
 &lt;td&gt;k-d tree: splits multi-dimensional space at nodes until no further split&lt;/td&gt;
 &lt;td&gt;Spatial operators&lt;/td&gt;
 &lt;td&gt;Min O(k), avg O(logN), max O(N/2)&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;GIS; multi-dimensional data&lt;/td&gt;
 &lt;td&gt;GIS, multi-dimensional data&lt;/td&gt;
 &lt;td&gt;Special-case scenarios&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;sp-GiST/radix-tree&lt;/td&gt;
 &lt;td&gt;radix-tree: each child synthesizes its parent&lt;/td&gt;
 &lt;td&gt;Common operators: =, &amp;gt;, ~ etc.&lt;/td&gt;
 &lt;td&gt;Min O(1), max O(N)&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Scenarios without common data&lt;/td&gt;
 &lt;td&gt;Supports common operators beyond GIST&lt;/td&gt;
 &lt;td&gt;Limited scenarios; can be very slow&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;GIN&lt;/td&gt;
 &lt;td&gt;Index framework; similar to btree: branch stores token ranges, leaf stores tokens and ctids; one token pointing to multiple ctids may have sub-posting-tree; fast update enabled adds linked-list space for incremental data&lt;/td&gt;
 &lt;td&gt;Operators vary by data type, e.g. &lt;code&gt;@@&lt;/code&gt; (text-search match), &lt;code&gt;@&amp;gt;&lt;/code&gt; (contains)&lt;/td&gt;
 &lt;td&gt;Related to text length/token repetition; approx O(logN)&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;No (branches ordered but no token position info)&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Key-contains-multiple-values scenarios: array, full text, JSON, many columns&lt;/td&gt;
 &lt;td&gt;Best choice for multi-value key scenarios&lt;/td&gt;
 &lt;td&gt;Updates need proper strategy&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;BRIN&lt;/td&gt;
 &lt;td&gt;Non-tree: groups data pages into block ranges; the revmap points to one summary per range, which stores only the key range (e.g. min/max)&lt;/td&gt;
 &lt;td&gt;Common operators: &amp;lt; &amp;lt;= = &amp;gt;= &amp;gt;&lt;/td&gt;
 &lt;td&gt;Page lookup O(1); data return O(N), N=recheck rows&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Not strictly ordered, only suits ordered data&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;td&gt;Sequential storage (time-series, auto-increment); very large tables; nearly no updates; range queries&lt;/td&gt;
 &lt;td&gt;Very small index&lt;/td&gt;
 &lt;td&gt;Extremely demanding on correlation&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;RUM&lt;/td&gt;
 &lt;td&gt;Similar to GIN, but additionally stores token position info&lt;/td&gt;
 &lt;td&gt;Includes GIN operators plus position operators&lt;/td&gt;
 &lt;td&gt;Related to text length/token repetition; approx O(logN)&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;td&gt;Yes (supports KNN lookup)&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Key-contains-multiple-values scenarios; suitable for KNN&lt;/td&gt;
 &lt;td&gt;Stores position info beyond GIN&lt;/td&gt;
 &lt;td&gt;Requires extension installation&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;BLOOM&lt;/td&gt;
 &lt;td&gt;Each field hashed and truncated; non-tree, bitmap filtering&lt;/td&gt;
 &lt;td&gt;Equality only: =&lt;/td&gt;
 &lt;td&gt;Miss: O(1); hit: O(N), N=recheck rows&lt;/td&gt;
 &lt;td&gt;Yes (contrib extension)&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;td&gt;Suitable for miss scenarios&lt;/td&gt;
 &lt;td&gt;Can be very fast&lt;/td&gt;
 &lt;td&gt;Can be very slow on recheck&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Additional index section references:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://it.badykov.com/blog/2020/03/21/postgresql-indexes/" target="_blank" rel="noreferrer"&gt;Types of PostgreSQL Indexes. Short and clear&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://leopard.in.ua/2015/04/13/postgresql-indexes" target="_blank" rel="noreferrer"&gt;https://leopard.in.ua/2015/04/13/postgresql-indexes&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://pic.huodongjia.com/ganhuodocs/2017-07-15/1500104265.79.pdf" target="_blank" rel="noreferrer"&gt;https://pic.huodongjia.com/ganhuodocs/2017-07-15/1500104265.79.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://developer.aliyun.com/article/698090?spm=a2c6h.12873639.article-detail.43.702e7149IBMYL9" target="_blank" rel="noreferrer"&gt;https://developer.aliyun.com/article/698090?spm=a2c6h.12873639.article-detail.43.702e7149IBMYL9&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://postgresql.us/events/pgopen2019/sessions/session/647/slides/45/look-it-up.pdf" target="_blank" rel="noreferrer"&gt;https://postgresql.us/events/pgopen2019/sessions/session/647/slides/45/look-it-up.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.pgcon.org/2016/schedule/attachments/434_Index-internals-PGCon2016.pdf" target="_blank" rel="noreferrer"&gt;https://www.pgcon.org/2016/schedule/attachments/434_Index-internals-PGCon2016.pdf&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;17. How Row Locks Are Implemented, Whether Stored in Shared Memory
 &lt;div id="17-how-row-locks-are-implemented-whether-stored-in-shared-memory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#17-how-row-locks-are-implemented-whether-stored-in-shared-memory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
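&lt;p&gt;Row locks in PG are recorded in the row (tuple) header, in the &lt;code&gt;xmax&lt;/code&gt; field and its infomask bits, not in the shared-memory lock table. Shared memory is only involved when a session has to wait for a row.&lt;/p&gt;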
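&lt;p&gt;The header field can be observed directly with two sessions. This is a rough sketch; the table &lt;code&gt;accounts&lt;/code&gt; and its columns are hypothetical.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Session A: note the transaction ID, modify a row, and keep the transaction open.
BEGIN;
SELECT txid_current();
UPDATE accounts SET balance = balance - 10 WHERE id = 1;

-- Session B: the old row version is still the visible one, and its xmax now
-- carries session A&amp;#39;s transaction ID. That header field is the row lock;
-- nothing per-row has been added to the shared lock table.
SELECT ctid, xmin, xmax, id FROM accounts WHERE id = 1;

-- Session A: release the row.
ROLLBACK;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;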
&lt;p&gt;(1) After t1 updates a row without committing, it holds a RowExclusiveLock on the relation and an ExclusiveLock on its own transactionid:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/16040258a95a.png" alt="Insert image description here" /&gt;&lt;/p&gt;
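&lt;p&gt;(2) t2 updating the same row gets blocked; the blocking is implemented by t2 requesting a ShareLock on t1&amp;rsquo;s transactionid, which is not granted until t1 finishes. While waiting, t2 holds a RowExclusiveLock on the relation and an ExclusiveLock on the tuple:&lt;/p&gt;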
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2cca36c19235.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d3b2e8a88a88.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;(3) t3 updating this row gets blocked one step earlier: it waits for the tuple ExclusiveLock that t2 already holds:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/9f0527a73a0a.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b5ea7c9fe3f1.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;In summary, &lt;strong&gt;PG row locks are implemented jointly via transactionid locks, relation locks, and tuple locks:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/5160903bb82b.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;(Source: PostgreSQL 14 Internals)&lt;/p&gt;
&lt;p&gt;&lt;a href="https://postgrespro.com/blog/pgsql/5968005" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/blog/pgsql/5968005&lt;/a&gt;&lt;/p&gt;
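&lt;p&gt;The lock states described in this section can be inspected from a separate session while t1, t2, and t3 are still open. The queries below are a sketch using only standard catalog views and make no assumption about the table involved.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Heavyweight locks held or awaited by the other sessions.
-- granted = false marks the lock a session is currently waiting for.
SELECT pid,
       locktype,
       relation::regclass AS relation,
       page,
       tuple,
       transactionid,
       mode,
       granted
FROM pg_locks
WHERE locktype IN (&amp;#39;relation&amp;#39;, &amp;#39;tuple&amp;#39;, &amp;#39;transactionid&amp;#39;)
  AND pid &amp;lt;&amp;gt; pg_backend_pid()
ORDER BY pid, locktype;

-- Who is blocked by whom.
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       wait_event,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) &amp;gt; 0;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;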

&lt;h3 class="relative group"&gt;18. Differences Between Streaming Replication and Logical Replication, and Their Applicable Scenarios
 &lt;div id="18-differences-between-streaming-replication-and-logical-replication-and-their-applicable-scenarios" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#18-differences-between-streaming-replication-and-logical-replication-and-their-applicable-scenarios" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Streaming replication here generally refers to PG physical replication: the complete WAL stream is shipped downstream, and the downstream PG instance replays it at the physical block level:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d8149234af0e.png" alt="Insert image description here" /&gt;&lt;/p&gt;
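&lt;p&gt;The state of a physical replication setup can be checked from both sides. A small sketch, assuming PG10 or later (earlier versions use different function and column names):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- On the primary: one row per connected standby.
SELECT application_name,
       state,
       sync_state,
       sent_lsn,
       replay_lsn,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

-- On the standby: confirm recovery mode and see how far replay has progressed.
SELECT pg_is_in_recovery(),
       pg_last_wal_receive_lsn(),
       pg_last_wal_replay_lsn(),
       now() - pg_last_xact_replay_timestamp() AS approx_replay_delay;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The delay figure is only meaningful while the primary is writing; on an idle primary it keeps growing even though the standby is fully caught up.&lt;/p&gt;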
&lt;p&gt;Logical replication logically decodes the transaction information for the relevant tables from WAL, orders transactions via the reorder buffer, then outputs data in the form determined by the output plugin. The downstream need not be a PG instance. It requires a replication slot, which ties together logical decoding, the output plugin, the reorder buffer, and the replication position, and operating it also calls for knowledge of replica identity, slot/sender status, and more:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2fde90f69b14.png" alt="Insert image description here" /&gt;&lt;/p&gt;
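&lt;p&gt;With built-in logical replication between two PG instances, the pieces above map onto a publication, a subscription, and a slot. A minimal sketch; the table &lt;code&gt;orders&lt;/code&gt;, the object names, and the connection string are hypothetical, and the publisher is assumed to run with &lt;code&gt;wal_level = logical&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Publisher: choose which tables are decoded.
CREATE PUBLICATION orders_pub FOR TABLE orders;

-- Replica identity decides how old rows are identified in UPDATE/DELETE;
-- DEFAULT means the primary key.
ALTER TABLE orders REPLICA IDENTITY DEFAULT;

-- Subscriber: the table must already exist with a matching definition.
-- Creating the subscription also creates a logical slot on the publisher.
CREATE SUBSCRIPTION orders_sub
    CONNECTION &amp;#39;host=publisher.example dbname=app user=repl&amp;#39;
    PUBLICATION orders_pub;

-- Publisher: slot, output plugin, and replication positions.
SELECT slot_name, plugin, slot_type, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;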
&lt;p&gt;Logical replication has many issues but is increasingly widely used and is a key focus area for PG community updates.&lt;/p&gt;
&lt;p&gt;For example (incomplete list):&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Since PG13 the in-memory limit is no longer hardcoded at 4096 changes; it&amp;rsquo;s now the configurable GUC parameter &lt;code&gt;logical_decoding_work_mem&lt;/code&gt;. Decoding spill issues are somewhat mitigated&lt;/li&gt;
&lt;li&gt;PG14+ supports streaming logical replication: uncommitted transactions can transmit data downstream; subsequent commit info determines whether to apply the changes&lt;/li&gt;
&lt;li&gt;PG16+ supports logical decoding on standby servers: replication slots can be created there, so logical replication can be established from a standby&lt;/li&gt;
&lt;li&gt;Failover slots: PG17 adds synchronization of logical slots to standbys so that they survive a failover&lt;/li&gt;
&lt;li&gt;Many more updates&amp;hellip;&lt;/li&gt;
&lt;/ol&gt;
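&lt;p&gt;The first two items can be tried out as follows. This is a sketch: the subscription name &lt;code&gt;orders_sub&lt;/code&gt; is hypothetical and the memory value is an arbitrary example.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Publisher, PG13+: memory a decoding session may use before it spills
-- changes to disk (or streams them, when streaming is enabled).
ALTER SYSTEM SET logical_decoding_work_mem = &amp;#39;256MB&amp;#39;;
SELECT pg_reload_conf();

-- Subscriber, PG14+: receive large in-progress transactions before they commit.
ALTER SUBSCRIPTION orders_sub SET (streaming = on);

-- Publisher, PG14+: how much each slot has spilled or streamed so far.
SELECT slot_name, spill_txns, spill_bytes, stream_txns, stream_bytes
FROM pg_stat_replication_slots;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;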
&lt;p&gt;&lt;a href="https://blog.csdn.net/qq_40687433/article/details/120000817" target="_blank" rel="noreferrer"&gt;PG Streaming Replication Explained in Detail&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/qq_40687433/article/details/129291207" target="_blank" rel="noreferrer"&gt;PG Internals Training: Logical Replication&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;19. What Is Streaming Replication Conflict and Why It Occurs
 &lt;div id="19-what-is-streaming-replication-conflict-and-why-it-occurs" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#19-what-is-streaming-replication-conflict-and-why-it-occurs" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Cause of conflict:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The standby is running a query on a table (from an application or a manual connection). Meanwhile, the primary executes DROP TABLE, which is written to WAL and transmitted to the standby for replay. To stay consistent with the primary, the standby must replay that WAL promptly, so the DROP TABLE and the SELECT conflict. The primary doesn&amp;rsquo;t know the standby&amp;rsquo;s transaction state and cannot wait for it, which is why a &amp;ldquo;query conflict&amp;rdquo; occurs.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Conflict scenarios:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Primary exclusive locks (including explicit LOCK commands and various DDL)&lt;/li&gt;
&lt;li&gt;Primary vacuum cleaning dead tuples — if the standby is using those tuples, conflict arises&lt;/li&gt;
&lt;li&gt;Primary drops a tablespace that the standby query is using&lt;/li&gt;
&lt;li&gt;Primary drops a database that the standby is using&lt;/li&gt;
&lt;/ol&gt;
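&lt;p&gt;How often each kind of conflict has canceled queries can be read from a statistics view on the standby. A short sketch using only the standard view:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Run on the standby; the counters stay at zero on a primary.
-- confl_lock:       exclusive locks replayed from the primary
-- confl_snapshot:   vacuum removed tuples a standby query could still see
-- confl_tablespace: a tablespace in use was dropped
SELECT datname,
       confl_tablespace,
       confl_lock,
       confl_snapshot,
       confl_bufferpin,
       confl_deadlock
FROM pg_stat_database_conflicts
WHERE datname = current_database();&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;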
&lt;p&gt;&lt;strong&gt;Mitigating query conflicts (can&amp;rsquo;t fully resolve):&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;hot_standby_feedback&lt;/code&gt;: standby periodically notifies the primary of the minimum active transaction ID (xmin), preventing the primary vacuum from cleaning tuples older than the xmin value.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;max_standby_streaming_delay&lt;/code&gt;: when WAL received via streaming conflicts with a standby query, the query isn&amp;rsquo;t canceled immediately; replay waits up to this long, and only then is the unfinished query canceled with an error.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;max_standby_archive_delay&lt;/code&gt;: the same waiting time, applied to conflicts that arise while replaying archived WAL.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;vacuum_defer_cleanup_age&lt;/code&gt;: set on the primary; specifies by how many transactions vacuum delays dead tuple cleanup, i.e., vacuum and vacuum full won&amp;rsquo;t immediately clean just-deleted tuples. This parameter was removed in PG16.&lt;/p&gt;
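&lt;p&gt;A sketch of how the standby-side settings might be applied. The delay values are arbitrary examples, not recommendations; all three parameters take effect on reload.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- On the standby.
ALTER SYSTEM SET hot_standby_feedback = on;
ALTER SYSTEM SET max_standby_streaming_delay = &amp;#39;5min&amp;#39;;
ALTER SYSTEM SET max_standby_archive_delay = &amp;#39;5min&amp;#39;;
SELECT pg_reload_conf();

-- On the primary: with feedback enabled, backend_xmin shows the oldest
-- snapshot the standby has reported and that vacuum must respect.
SELECT application_name, backend_xmin
FROM pg_stat_replication;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Both approaches have a cost: feedback lets long standby queries cause bloat on the primary, while long delays let the standby fall behind.&lt;/p&gt;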
&lt;p&gt;&lt;a href="https://blog.csdn.net/qq_40687433/article/details/120000817" target="_blank" rel="noreferrer"&gt;PG Streaming Replication Explained in Detail&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;20. PostgreSQL Permission System Overview
 &lt;div id="20-postgresql-permission-system-overview" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#20-postgresql-permission-system-overview" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b45d38154897.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;Hard to summarize comprehensively; it&amp;rsquo;s somewhat complex. Key points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Permission access requires each layer to be &amp;ldquo;open&amp;rdquo;; none can be missing&lt;/li&gt;
&lt;li&gt;Best to separate read-only/read-write/owner users&lt;/li&gt;
&lt;li&gt;Read-only and read-write permissions can be managed via roles&lt;/li&gt;
&lt;/ul&gt;
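&lt;p&gt;The key points can be sketched with group roles. Every name here (&lt;code&gt;appdb&lt;/code&gt;, schema &lt;code&gt;app&lt;/code&gt;, &lt;code&gt;app_owner&lt;/code&gt;, &lt;code&gt;report_user&lt;/code&gt;) is hypothetical; the database, schema, and owner role are assumed to exist already.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Group roles that carry the privileges; nobody logs in as these.
CREATE ROLE app_readonly  NOLOGIN;
CREATE ROLE app_readwrite NOLOGIN;

-- Every layer has to be opened: database, then schema, then objects.
GRANT CONNECT ON DATABASE appdb TO app_readonly, app_readwrite;
GRANT USAGE ON SCHEMA app TO app_readonly, app_readwrite;
GRANT SELECT ON ALL TABLES IN SCHEMA app TO app_readonly;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA app TO app_readwrite;
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA app TO app_readwrite;

-- GRANT ... ON ALL TABLES covers existing tables only; default privileges
-- cover tables that the owner role creates later.
ALTER DEFAULT PRIVILEGES FOR ROLE app_owner IN SCHEMA app
    GRANT SELECT ON TABLES TO app_readonly;
ALTER DEFAULT PRIVILEGES FOR ROLE app_owner IN SCHEMA app
    GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO app_readwrite;

-- Login users simply become members of a group role.
CREATE ROLE report_user LOGIN PASSWORD &amp;#39;change-me&amp;#39; IN ROLE app_readonly;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;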
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/jQP36rXZb4sgA71AaIJ-Sw" target="_blank" rel="noreferrer"&gt;PostgreSQL Apprentice: Confused by Permissions Again? Here Is How to Get a Grip&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;21. Common High Availability Solutions, Selection Criteria, Pros and Cons
 &lt;div id="21-common-high-availability-solutions-selection-criteria-pros-and-cons" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#21-common-high-availability-solutions-selection-criteria-pros-and-cons" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;HA selection considerations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Sync mode choice, availability zones, cross-region multi-active&lt;/li&gt;
&lt;li&gt;Switchover, failover&lt;/li&gt;
&lt;li&gt;Load balancing, read/write separation&lt;/li&gt;
&lt;li&gt;Host, database, and application-level HA&lt;/li&gt;
&lt;li&gt;VIP switching, connection string HA, connection switching&lt;/li&gt;
&lt;li&gt;Solving single point of failure or split-brain; election mechanisms&lt;/li&gt;
&lt;/ul&gt;
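&lt;p&gt;For the sync mode choice, PG itself offers quorum-based synchronous replication. A sketch, in which the standby names &lt;code&gt;standby_a&lt;/code&gt; and &lt;code&gt;standby_b&lt;/code&gt; are hypothetical and must match each standby&amp;rsquo;s &lt;code&gt;application_name&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- On the primary: a commit waits until any one of the two listed standbys confirms it.
ALTER SYSTEM SET synchronous_standby_names = &amp;#39;ANY 1 (standby_a, standby_b)&amp;#39;;
ALTER SYSTEM SET synchronous_commit = &amp;#39;on&amp;#39;;
SELECT pg_reload_conf();

-- sync_state reports async, potential, sync, or quorum for each standby.
SELECT application_name, sync_state, sync_priority
FROM pg_stat_replication;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If no listed standby is reachable, commits on the primary block, so the HA tooling around the database has to manage this setting as part of failover.&lt;/p&gt;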
&lt;p&gt;Below are some known architectures:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;pgpool-II+watchdog&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/748bd7fc3712.png" alt="Insert image description here" /&gt;&lt;/p&gt;
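&lt;p&gt;(&lt;a href="https://www.pgpool.net/docs/latest/en/html/example-cluster.html" target="_blank" rel="noreferrer"&gt;https://www.pgpool.net/docs/latest/en/html/example-cluster.html&lt;/a&gt;)&lt;/p&gt;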
&lt;p&gt;Pros: automatic failover, read/write separation, load balancing, watchdog election
Cons: complex configuration, pgpool doesn&amp;rsquo;t fully support all PG features, pgpool performance overhead, depends on watchdog election&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;patroni+etcd&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/ed1ce367a7b8.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;Pros: GUI (patroni), automatic failover, majority election
Cons: learning curve, doesn&amp;rsquo;t support other databases (patroni)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;patroni+pgbouncer+haproxy+etcd&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/e7604c9266a6.png" alt="Insert image description here" /&gt;&lt;/p&gt;
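&lt;p&gt;(&lt;a href="https://www.percona.com/sites/default/files/eBook-PostgreSQL-High-Availability.pdf" target="_blank" rel="noreferrer"&gt;https://www.percona.com/sites/default/files/eBook-PostgreSQL-High-Availability.pdf&lt;/a&gt;)&lt;/p&gt;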
&lt;p&gt;Pros: open-source stack: haproxy for load balancing, pgbouncer for connection pooling, patroni for cluster management, etcd for election
Cons: very complex configuration&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Ping An Financial Cloud rasesql architecture&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/78b4331a5822.png" alt="Insert image description here" /&gt;&lt;/p&gt;
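&lt;p&gt;(&lt;a href="https://www.ocftcloud.com/ssr/help/database/RASESQL/intro.Architecture" target="_blank" rel="noreferrer"&gt;https://www.ocftcloud.com/ssr/help/database/RASESQL/intro.Architecture&lt;/a&gt;)&lt;/p&gt;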
&lt;p&gt;Pros: failover support, simple architecture
Cons: same-city remote can&amp;rsquo;t directly read-only access, higher resource usage, no election (?)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Alibaba Cloud PolarDB for PostgreSQL&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b3c6ace6a20f.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/3578c8002447.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;a href="https://ucc-private-download.oss-cn-beijing.aliyuncs.com/ab3f233b4a4c405986b2a8196cb53b47.pdf?Expires=1708410598&amp;amp;OSSAccessKeyId=LTAIvsP3ECkg4Nm9&amp;amp;Signature=O9UIudjtFyMmQW4eZf2BlClhVDk%3D" target="_blank" rel="noreferrer"&gt;PolarDB for PostgreSQL Three-Node Feature Introduction&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Pros: read/write separation, can add non-voting nodes, failover, logger nodes participate in election/data flow/backup
Cons: &amp;hellip;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Google Cloud PG&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;Three architecture options:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b85525616eeb.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;Google Cloud Native Architecture (MIG):&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/0cc376b9b922.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;Pros: three options to choose from, and well documented. Two of them derive from open-source architectures and share their pros and cons; the notes below cover the cloud-native MIG approach.&lt;/p&gt;
&lt;p&gt;MIG advantages: does not depend on PG native HA; uses a regional persistent disk for data HA. If the primary zone is network-isolated, the disk can be attached in zone B of the same region (within about 1 minute).&lt;/p&gt;
&lt;p&gt;MIG disadvantages: no read replicas; failover only within a region (no multi-region deployment).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Aurora for PG&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6c3c996ceeb5.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;Pros: simple architecture; a recovered primary rejoins the cluster automatically; multi-region deployment; readable standbys.&lt;/p&gt;
&lt;p&gt;Cons: (seemingly) no election mechanism; the docs are heavy on text and light on diagrams.&lt;/p&gt;
&lt;p&gt;Cui Jian: PostgreSQL High-Availability Architecture Design and Practice&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.pgpool.net/docs/latest/en/html/example-cluster.html" target="_blank" rel="noreferrer"&gt;https://www.pgpool.net/docs/latest/en/html/example-cluster.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.postgres.cn/downfiles/pgconf_2018/PostgresChina2018_%E6%B1%AA%E6%B4%8B_PG%E4%B9%8B%E9%AB%98%E5%8F%AF%E7%94%A8%E7%89%B9%E6%80%A7%E3%80%81%E5%B7%A5%E5%85%B7%E5%8F%8A%E6%9E%B6%E6%9E%84%E8%AE%BE%E8%AE%A1.pdf" target="_blank" rel="noreferrer"&gt;Wang Yang: PostgreSQL High Availability&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.tencent.com/developer/article/1185379" target="_blank" rel="noreferrer"&gt;Creating a Highly Available PostgreSQL Cluster with Patroni and HAProxy&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.percona.com/sites/default/files/eBook-PostgreSQL-High-Availability.pdf" target="_blank" rel="noreferrer"&gt;https://www.percona.com/sites/default/files/eBook-PostgreSQL-High-Availability.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://ucc-private-download.oss-cn-beijing.aliyuncs.com/ab3f233b4a4c405986b2a8196cb53b47.pdf?Expires=1708410598&amp;amp;OSSAccessKeyId=LTAIvsP3ECkg4Nm9&amp;amp;Signature=O9UIudjtFyMmQW4eZf2BlClhVDk%3D" target="_blank" rel="noreferrer"&gt;PolarDB for PostgreSQL: Three-Node Feature Introduction&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/architecture/architectures-high-availability-postgresql-clusters-compute-engine" target="_blank" rel="noreferrer"&gt;https://cloud.google.com/architecture/architectures-high-availability-postgresql-clusters-compute-engine&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Overview.html" target="_blank" rel="noreferrer"&gt;https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Overview.html&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;22. Five Levels of synchronous_commit; Why Standby Queries Can&amp;rsquo;t Immediately See Primary Inserts
 &lt;div id="22-five-levels-of-synchronous_commit-why-standby-queries-cant-immediately-see-primary-inserts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#22-five-levels-of-synchronous_commit-why-standby-queries-cant-immediately-see-primary-inserts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/f2ec64d6d8a4.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/qq_40687433/article/details/120000817" target="_blank" rel="noreferrer"&gt;PostgreSQL Streaming Replication Explained&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;23. Transaction ID Wraparound Causes and Maintenance Optimization
 &lt;div id="23-transaction-id-wraparound-causes-and-maintenance-optimization" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#23-transaction-id-wraparound-causes-and-maintenance-optimization" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Why transaction ID wraparound exists:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Every transaction that modifies data consumes a transaction ID (XID). Read-only transactions use only a virtual transaction ID (VXID), which is counted locally in each backend. VXIDs can wrap around too, but the counter resets when the session restarts, so this is rarely a problem.&lt;/p&gt;
&lt;p&gt;XIDs, however, have an upper limit. &lt;code&gt;TransactionId&lt;/code&gt; is a 32-bit unsigned integer, so it can represent &lt;code&gt;2^32 = 4294967296&lt;/code&gt; values, about 4.29 billion transactions. Once the counter reaches that limit it wraps around to the beginning, which is why the XID space is drawn as a ring.&lt;/p&gt;
&lt;p&gt;Because of the visibility rules, this space is split in half: relative to any given XID, one half (&lt;code&gt;2^31&lt;/code&gt;, about 2.1 billion XIDs) is the future and the other half is the past. The gap between the oldest unfrozen XID and the newest XID in a PG instance therefore cannot exceed about 2.1 billion, hence the 2.1 billion transaction limit.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/bbce62f757b4.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;(Source: https://www.interdb.jp/pg/pgsql05/01.html)&lt;/p&gt;
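&lt;p&gt;As a rough illustration of the 2.1-billion window, the query below reports how far each database&amp;rsquo;s oldest unfrozen XID lags behind the current XID. It is only a sketch: it relies on the built-in &lt;code&gt;age()&lt;/code&gt; function and &lt;code&gt;pg_database.datfrozenxid&lt;/code&gt;, while the &lt;code&gt;pct_of_limit&lt;/code&gt; column and the &lt;code&gt;2^31&lt;/code&gt; divisor are an illustrative way of expressing headroom, not an official metric.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Age (in transactions) of the oldest unfrozen XID in each database.
SELECT datname,
       age(datfrozenxid) AS xid_age,
       round(100.0 * age(datfrozenxid) / 2147483648, 2) AS pct_of_limit  -- 2147483648 = 2^31 (illustrative)
FROM pg_database
ORDER BY xid_age DESC;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;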
&lt;p&gt;&lt;strong&gt;Transaction ID freezing:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Under these visibility rules, once a visible row (e.g., xmin=100) is more than about 2.1 billion transactions older than the current XID, its xmin appears to lie in the future and the row becomes invisible:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/57a0de81e82c.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;(Image source not recorded.)&lt;/p&gt;
&lt;p&gt;The freezing mechanism was introduced to solve this. Originally, freezing rewrote the xmin of sufficiently old tuples to &lt;code&gt;FrozenTransactionId&lt;/code&gt; (2), which is treated as older than every normal XID; in other words, txid=2 is visible to all normal transactions (txid&amp;gt;=3). Since version 9.4, the tuple keeps its original t_xmin and the frozen state is recorded in the xmin_frozen flag of t_infomask instead.&lt;/p&gt;
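&lt;p&gt;The frozen flag can be observed with the &lt;code&gt;pageinspect&lt;/code&gt; extension. The sketch below assumes superuser access and uses a throwaway table named &lt;code&gt;freeze_demo&lt;/code&gt; (a hypothetical name). Provided no older transaction is still open, &lt;code&gt;combined_flags&lt;/code&gt; should list &lt;code&gt;HEAP_XMIN_FROZEN&lt;/code&gt; while &lt;code&gt;t_xmin&lt;/code&gt; keeps its original value.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Assumption: pageinspect is available; freeze_demo is a scratch table.
CREATE EXTENSION IF NOT EXISTS pageinspect;
CREATE TABLE freeze_demo(a int);
INSERT INTO freeze_demo VALUES (1);
VACUUM (FREEZE) freeze_demo;

-- The frozen state lives in the infomask bits, not in t_xmin.
SELECT lp, t_xmin, raw_flags, combined_flags
FROM heap_page_items(get_raw_page('freeze_demo', 0)),
     LATERAL heap_tuple_infomask_flags(t_infomask, t_infomask2)
WHERE t_infomask IS NOT NULL;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;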
&lt;p&gt;&lt;strong&gt;Lazy mode:&lt;/strong&gt; The visibility map (VM) was originally designed to reduce vacuum overhead by letting vacuum skip pages that contain no dead tuples (all-visible pages). Lazy-mode freezing rides on the same scan, so it skips those all-visible pages as well.&lt;/p&gt;
&lt;p&gt;Lazy-mode trigger: it runs as part of any vacuum and appears to have no independent trigger condition of its own.&lt;/p&gt;
&lt;p&gt;Which tuples lazy mode freezes: in every page that is not skipped as all-visible, it freezes tuples whose xmin is older than the oldest active transaction (strictly, OldestXmin) by more than &lt;code&gt;vacuum_freeze_min_age&lt;/code&gt; (default 50 million), marking them xmin_frozen. In the diagram below, tuple 9 with xmin=3000 is not frozen.&lt;/p&gt;
&lt;p&gt;Lazy mode is best seen as a side effect of vacuum: the pages are already being scanned to remove dead tuples, so eligible tuples on those pages might as well be frozen in the same pass.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/47912e7b0750.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Eager mode:&lt;/strong&gt; Lazy mode has a gap: because it skips all-visible pages, a page that holds only live tuples (all-visible but not all-frozen) with very old xmin values is never frozen by lazy mode alone. Eager mode closes that gap: it skips only pages already marked all-frozen in the VM (a bit added in 9.6) and processes everything else. In practice, eager mode is the one that runs periodically and needs attention: &lt;strong&gt;even if a table has just one page whose tuples were only ever inserted (a single static page), eager mode is needed&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Eager mode freeze triggers:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;vacuum_freeze_table_age&lt;/code&gt; for vacuum operations: when the age of the table&amp;rsquo;s &lt;code&gt;pg_class.relfrozenxid&lt;/code&gt; exceeds &lt;code&gt;vacuum_freeze_table_age&lt;/code&gt; (default 150 million), a &lt;strong&gt;vacuum&lt;/strong&gt; of that table runs in eager mode. (&lt;code&gt;pg_database.datfrozenxid&lt;/code&gt; is the minimum of all &lt;code&gt;pg_class.relfrozenxid&lt;/code&gt; values in the database.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;autovacuum_freeze_max_age&lt;/code&gt; for autovacuum: both lazy mode and the &lt;code&gt;vacuum_freeze_table_age&lt;/code&gt; path of eager mode only run once a vacuum has been triggered. Relying on the ordinary vacuum triggers alone is not safe for freezing, so a freeze-specific deadline is needed: &lt;code&gt;autovacuum_freeze_max_age&lt;/code&gt;. When the age of a table&amp;rsquo;s &lt;code&gt;relfrozenxid&lt;/code&gt; exceeds &lt;code&gt;autovacuum_freeze_max_age&lt;/code&gt; (default 200 million), autovacuum is forced to run on it to freeze tuples. This deadline-driven freeze still fires even when autovacuum is disabled.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
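&lt;p&gt;To see which tables are approaching these triggers, a query along the following lines can help. It is a minimal sketch: it compares &lt;code&gt;age(relfrozenxid)&lt;/code&gt; against the instance-wide settings only and deliberately ignores per-table storage parameter overrides.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Oldest tables first, next to the two eager-mode thresholds.
SELECT c.oid::regclass AS table_name,
       age(c.relfrozenxid) AS xid_age,
       current_setting('vacuum_freeze_table_age')::bigint AS freeze_table_age,
       current_setting('autovacuum_freeze_max_age')::bigint AS freeze_max_age
FROM pg_class c
WHERE c.relkind IN ('r', 'm', 't')   -- ordinary tables, materialized views, TOAST tables
ORDER BY age(c.relfrozenxid) DESC
LIMIT 20;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;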
&lt;p&gt;Which tuples eager mode freezes: the same rule as lazy mode, except that only all-frozen pages are skipped (lazy mode skips all-visible pages). It freezes tuples whose xmin is older than OldestXmin by more than &lt;code&gt;vacuum_freeze_min_age&lt;/code&gt; (default 50 million). In the diagram, tuple 11 is not frozen.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d24f548bb484.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;vacuum freeze command:&lt;/strong&gt; &lt;code&gt;VACUUM FREEZE&lt;/code&gt; is equivalent to running vacuum with &lt;code&gt;vacuum_freeze_min_age&lt;/code&gt; and &lt;code&gt;vacuum_freeze_table_age&lt;/code&gt; set to 0: an eager-mode freeze of every tuple whose xmin is older than all active transactions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;vacuum_failsafe_age:&lt;/strong&gt; Vacuuming a large table is slow, so freezing may not finish before XID wraparound. Freezing is carried out by vacuum, which also does other work and honors many settings. Once a table&amp;rsquo;s &lt;code&gt;relfrozenxid&lt;/code&gt; age exceeds &lt;code&gt;vacuum_failsafe_age&lt;/code&gt;, vacuum speeds up freezing by no longer applying cost-based delay, disabling the buffer access strategy, and skipping index vacuuming. The default is 1.6 billion; at run time the effective value is never lower than 105% of &lt;code&gt;autovacuum_freeze_max_age&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CLOG may also be updated:&lt;/strong&gt; If freezing advances &lt;code&gt;pg_database.datfrozenxid&lt;/code&gt;, CLOG that is no longer needed is removed as well. CLOG records transaction status and is consulted to decide the visibility of tuples written by &amp;ldquo;relatively new&amp;rdquo; transactions. Once a database&amp;rsquo;s frozen XID has advanced, the &amp;ldquo;old&amp;rdquo; tuples are marked frozen and are always visible, so the status of those &amp;ldquo;old&amp;rdquo; transactions in CLOG can be discarded.&lt;/p&gt;
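&lt;p&gt;For reference, the freeze-related parameters discussed so far can be listed together from &lt;code&gt;pg_settings&lt;/code&gt;. This is just a convenience query; note that &lt;code&gt;vacuum_failsafe_age&lt;/code&gt; only appears on PG14 and later.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Current value (setting) and built-in default (boot_val) of each freeze parameter.
SELECT name, setting, boot_val
FROM pg_settings
WHERE name IN ('vacuum_freeze_min_age',
               'vacuum_freeze_table_age',
               'autovacuum_freeze_max_age',
               'vacuum_failsafe_age')
ORDER BY name;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;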
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/1e864c9bc4a1.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Maintenance optimization:&lt;/strong&gt; (based on Can Zong&amp;rsquo;s summary)&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Monitor the age of pg_database.datfrozenxid in production. When it approaches the trigger thresholds, run VACUUM FREEZE proactively during a low-traffic window instead of waiting for the database to trigger it.&lt;/li&gt;
&lt;li&gt;Partition large tables; on an oversized table the prevent-wraparound vacuum runs for a long time&lt;/li&gt;
&lt;li&gt;Set different freeze ages for large tables: &lt;code&gt;ALTER TABLE test SET (autovacuum_freeze_max_age=xxxx);&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Schedule freezes yourself: during low-traffic windows, VACUUM FREEZE large tables with a high XID age&lt;/li&gt;
&lt;li&gt;Watch for anything that blocks freezing: long transactions, replication slots, hot_standby_feedback, pg_dump, cursors, orphaned transactions&lt;/li&gt;
&lt;li&gt;Configure enough autovacuum workers (autovacuum_max_workers) so vacuum jobs do not queue&lt;/li&gt;
&lt;li&gt;If load is a concern, consider enabling cost-based vacuuming (vacuum_cost_delay etc.)&lt;/li&gt;
&lt;li&gt;autovacuum_freeze_max_age should exceed vacuum_freeze_table_age to leave room for manual vacuum. Per the official docs, vacuum silently caps the effective vacuum_freeze_table_age at 0.95 * autovacuum_freeze_max_age, so a higher setting is treated as 0.95 * autovacuum_freeze_max_age.&lt;/li&gt;
&lt;li&gt;vacuum_failsafe_age (PG14+): set it sensibly so that freezing of large tables is accelerated before wraparound; it should exceed 105% of autovacuum_freeze_max_age.&lt;/li&gt;
&lt;/ol&gt;
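&lt;p&gt;The freeze-blocking scenarios in the list above can usually be spotted from the system views. The queries below are a sketch rather than a complete monitoring setup; treating prepared (two-phase) transactions as one kind of &amp;ldquo;orphaned transaction&amp;rdquo; is an assumption about what the list means.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Sessions whose snapshot holds back the xmin horizon (long transactions, pg_dump, open cursors).
SELECT pid, state, xact_start, age(backend_xmin) AS xmin_age
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;

-- Replication slots that pin old row versions.
SELECT slot_name, active, age(xmin) AS xmin_age, age(catalog_xmin) AS catalog_xmin_age
FROM pg_replication_slots;

-- Prepared transactions that were never committed or rolled back.
SELECT gid, prepared, age(transaction) AS xid_age
FROM pg_prepared_xacts;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;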
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/16/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/16/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/p6aFhghpDEGu6lIBD8A5Yw" target="_blank" rel="noreferrer"&gt;Understanding the PostgreSQL Freeze Bomb in Depth&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/qq_40687433/article/details/130782577" target="_blank" rel="noreferrer"&gt;PostgreSQL Transactions: Transaction IDs&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;24. Vacuum / Autovacuum Functions and Tuning
 &lt;div id="24-vacuum--autovacuum-functions-and-tuning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#24-vacuum--autovacuum-functions-and-tuning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Functions:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Clean up &amp;ldquo;dead tuples&amp;rdquo; left by UPDATE or DELETE operations&lt;/li&gt;
&lt;li&gt;Track available space in table blocks, update free space map&lt;/li&gt;
&lt;li&gt;Update visibility map needed for index-only scans&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Freeze&amp;rdquo; rows in tables to prevent transaction ID wraparound&lt;/li&gt;
&lt;li&gt;Periodically ANALYZE to update statistics&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Tuning:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Configure enough autovacuum workers (autovacuum_max_workers) so vacuum jobs do not queue&lt;/li&gt;
&lt;li&gt;Increase maintenance_work_mem (or autovacuum_work_mem)&lt;/li&gt;
&lt;li&gt;Watch for anything that blocks vacuum: long transactions, replication slots, hot_standby_feedback, pg_dump, cursors, orphaned transactions&lt;/li&gt;
&lt;li&gt;For special tables (business-critical or large), set per-table autovacuum triggers (threshold and scale factor; insert threshold and insert scale factor) covering dead-tuple cleanup, statistics updates, and wraparound prevention&lt;/li&gt;
&lt;li&gt;For special tables, disable autovacuum per table and run vacuum during off-peak hours to handle dead-tuple cleanup, statistics, and wraparound&lt;/li&gt;
&lt;li&gt;If business load is a concern, enable cost-based vacuuming so that vacuum sleeps whenever it reaches the cost limit&lt;/li&gt;
&lt;li&gt;Partition tables so that vacuum does not run endlessly or restart immediately after finishing&lt;/li&gt;
&lt;li&gt;Avoid VACUUM FULL (it takes an ACCESS EXCLUSIVE lock, the strongest of the eight table lock levels). To handle table/index bloat, use logical replication plus a rename, or pg_repack, which is more efficient and still reclaims space&lt;/li&gt;
&lt;/ol&gt;
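&lt;p&gt;Two of the tuning points above, finding tables that need attention and giving them their own thresholds, might look like this in practice. The table name &lt;code&gt;big_table&lt;/code&gt; and all numeric values are placeholders chosen for illustration, not recommendations.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Tables with the most dead tuples and when they were last vacuumed.
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;

-- Per-table overrides for a large, busy table (big_table and the values are placeholders).
ALTER TABLE big_table SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold = 10000,
    autovacuum_analyze_scale_factor = 0.02,
    autovacuum_vacuum_cost_delay = 2
);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;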

&lt;h3 class="relative group"&gt;25. Function Volatility Categories and Why Functions Need EXECUTE
 &lt;div id="25-function-volatility-categories-and-why-functions-need-execute" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#25-function-volatility-categories-and-why-functions-need-execute" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;VOLATILE (unstable, default):&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Can do anything, including modifying the database&lt;/li&gt;
&lt;li&gt;May return different results for identical arguments, even within the same transaction&lt;/li&gt;
&lt;li&gt;Takes a fresh snapshot for &lt;strong&gt;each query it executes&lt;/strong&gt;, so the same query repeated inside one function call can return different results as the visible data changes&lt;/li&gt;
&lt;li&gt;Because it must be re-evaluated on every call, the optimizer cannot pre-evaluate it or optimize calls away; performance may be poor&lt;/li&gt;
&lt;li&gt;Cannot be used in function (expression) indexes&lt;/li&gt;
&lt;li&gt;Typical functions: timeofday(), random(), and any function that modifies data&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;STABLE:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cannot modify the database&lt;/li&gt;
&lt;li&gt;Within a single statement, identical arguments return identical results. It uses the snapshot of the calling query instead of taking a new one for each internal query, so repeated queries inside the function see consistent data&lt;/li&gt;
&lt;li&gt;Cannot be used in function (expression) indexes&lt;/li&gt;
&lt;li&gt;Typical functions: the current_timestamp family, which returns a single value no matter how many times it is called within a transaction&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;IMMUTABLE (very stable):&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cannot modify the database&lt;/li&gt;
&lt;li&gt;Given identical arguments, always returns identical results. Snapshot handling is the same as for STABLE&lt;/li&gt;
&lt;li&gt;Key difference from STABLE: the result depends only on the arguments, so the planner can pre-evaluate a call with constant arguments, store the result in the plan, and keep reusing it when the plan is cached&lt;/li&gt;
&lt;li&gt;Can be used in function (expression) indexes&lt;/li&gt;
&lt;li&gt;Functions whose result depends on a configuration parameter should not be marked IMMUTABLE; timezone-dependent functions, for example, should be STABLE&lt;/li&gt;
&lt;li&gt;Typical example: pure arithmetic such as 1 + 2&lt;/li&gt;
&lt;/ul&gt;
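&lt;p&gt;The category of any function is recorded in &lt;code&gt;pg_proc.provolatile&lt;/code&gt;, and the index restriction is easy to try out. In the sketch below, &lt;code&gt;users_demo&lt;/code&gt; and &lt;code&gt;normalize_email&lt;/code&gt; are hypothetical names introduced only for illustration.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- provolatile: 'v' = VOLATILE, 's' = STABLE, 'i' = IMMUTABLE
SELECT proname, provolatile
FROM pg_proc
WHERE proname IN ('random', 'timeofday', 'now', 'lower')
ORDER BY proname;

-- Only IMMUTABLE functions may appear in an index expression.
CREATE TABLE users_demo(email text);
CREATE FUNCTION normalize_email(text) RETURNS text
    LANGUAGE sql IMMUTABLE
    AS $$ SELECT lower(btrim($1)) $$;
CREATE INDEX users_demo_email_idx ON users_demo (normalize_email(email));&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Declaring the same function STABLE or VOLATILE instead would make the &lt;code&gt;CREATE INDEX&lt;/code&gt; statement fail, because index expressions must be immutable.&lt;/p&gt;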
&lt;p&gt;&lt;strong&gt;Why functions need EXECUTE:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;PREPARE: the statement is parsed, analyzed, and rewritten.&lt;/p&gt;
&lt;p&gt;EXECUTE: the prepared statement is planned and executed.&lt;/p&gt;
&lt;p&gt;Forcing a hard parse keeps a statement from running with an unsuitable execution plan when the data is skewed.&lt;/p&gt;
&lt;p&gt;Unlike plain SQL, PL/pgSQL caches plans by default: it runs embedded SQL as prepared statements and tries to build and reuse a generic plan (a soft parse). With skewed data, the cached generic plan can be inefficient, which may be unacceptable for core business queries. In such cases, consider the PL/pgSQL EXECUTE statement (dynamic SQL), which is planned on every call for the actual variable values and therefore gives more accurate plans.&lt;/p&gt;
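&lt;p&gt;The contrast between static SQL and &lt;code&gt;EXECUTE&lt;/code&gt; inside PL/pgSQL can be sketched as follows. The table &lt;code&gt;orders_demo&lt;/code&gt;, its columns, and both function names are hypothetical; the point is only the shape of the two variants.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Hypothetical table used by both functions.
CREATE TABLE orders_demo(id bigint, status text);

-- Static SQL: prepared on first use in a session; later calls may reuse a cached generic plan.
CREATE FUNCTION count_orders_static(p_status text) RETURNS bigint
LANGUAGE plpgsql AS $$
DECLARE
    n bigint;
BEGIN
    SELECT count(*) INTO n FROM orders_demo WHERE status = p_status;
    RETURN n;
END;
$$;

-- Dynamic SQL: EXECUTE does no plan caching, so the statement is planned on every call.
CREATE FUNCTION count_orders_dynamic(p_status text) RETURNS bigint
LANGUAGE plpgsql AS $$
DECLARE
    n bigint;
BEGIN
    EXECUTE 'SELECT count(*) FROM orders_demo WHERE status = $1'
        INTO n
        USING p_status;
    RETURN n;
END;
$$;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Passing the value through &lt;code&gt;USING&lt;/code&gt; rather than concatenating it into the string keeps the dynamic variant safe from SQL injection.&lt;/p&gt;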
&lt;p&gt;&lt;a href="https://blog.csdn.net/Hehuyi_In/article/details/128885660" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/Hehuyi_In/article/details/128885660&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/16/xfunc-volatility.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/16/xfunc-volatility.html&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;26. Why Use CREATE INDEX CONCURRENTLY and Its Hazards
 &lt;div id="26-why-use-create-index-concurrently-and-its-hazards" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#26-why-use-create-index-concurrently-and-its-hazards" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Why CIC:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;CREATE INDEX requires a ShareLock, which conflicts with DML&amp;rsquo;s RowExclusiveLock. So online business shouldn&amp;rsquo;t directly use CREATE INDEX. CIC uses ShareUpdateExclusiveLock, which doesn&amp;rsquo;t conflict with DML locks, so CIC is recommended for index creation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CIC process:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Insert index metadata into system catalogs (pg_class, pg_index), then open two transactions for two scans&lt;/li&gt;
&lt;li&gt;Open transaction 1, get snapshot1&lt;/li&gt;
&lt;li&gt;Before scanning table, wait for all transactions that modified the table (insert/delete/update) to finish&lt;/li&gt;
&lt;li&gt;Scan table and build index&lt;/li&gt;
&lt;li&gt;End transaction 1&lt;/li&gt;
&lt;li&gt;Open transaction 2, get snapshot2&lt;/li&gt;
&lt;li&gt;Before second scan, wait for all transactions that modified the table to finish&lt;/li&gt;
&lt;li&gt;DML on the table from transactions started after snapshot2 will update this index&lt;/li&gt;
&lt;li&gt;Second table scan, update index (version numbers from tuples allow identifying records changed between snapshot1 and snapshot2, merging them into the index)&lt;/li&gt;
&lt;li&gt;After index update, wait for transactions holding snapshots that started before transaction 2 to finish&lt;/li&gt;
&lt;li&gt;End index creation. Index becomes visible.&lt;/li&gt;
&lt;/ol&gt;
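&lt;p&gt;On PG12 and later, the waits and scans described in these steps can be watched from a second session through &lt;code&gt;pg_stat_progress_create_index&lt;/code&gt;. A minimal sketch, where &lt;code&gt;cic_demo&lt;/code&gt; and &lt;code&gt;cic_demo_a_idx&lt;/code&gt; are placeholder names for an existing table and the new index:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Session 1: build the index without blocking DML (must run outside a transaction block).
CREATE INDEX CONCURRENTLY cic_demo_a_idx ON cic_demo (a);

-- Session 2: follow the phases, including the waits on other transactions.
SELECT pid, relid::regclass AS table_name, phase,
       lockers_total, lockers_done, current_locker_pid,
       blocks_done, blocks_total
FROM pg_stat_progress_create_index;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;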
&lt;p&gt;&lt;strong&gt;CIC issues:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Opens two transactions sequentially, scanning the table one extra time vs CREATE INDEX&lt;/li&gt;
&lt;li&gt;Must wait for long transactions to finish before scanning can begin&lt;/li&gt;
&lt;li&gt;CIC-created indexes may become invalid
&lt;ul&gt;
&lt;li&gt;CIC interrupted abnormally leaves an invalid index&lt;/li&gt;
&lt;li&gt;During CIC unique index creation, inserted/updated data violating unique constraints also causes CIC failure leaving an invalid index&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Invalid indexes still get updated by DML&lt;/li&gt;
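&lt;li&gt;Partitioned parent tables do not support CIC. Instead, build the index on each child partition with CIC, create the index on the parent with ONLY (it starts out invalid), then attach each child index with ALTER INDEX ... ATTACH PARTITION; the parent index becomes valid once every partition is attached&lt;/li&gt;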
&lt;/ol&gt;
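&lt;p&gt;The invalid-index and partitioning issues above translate into a few routine statements. This is a sketch with placeholder names (&lt;code&gt;idx_demo&lt;/code&gt;, &lt;code&gt;parent_demo&lt;/code&gt;, &lt;code&gt;child_demo_1&lt;/code&gt;); adapt it to the real objects before use.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Find indexes left invalid by a failed or interrupted CIC.
SELECT indexrelid::regclass AS index_name, indrelid::regclass AS table_name
FROM pg_index
WHERE NOT indisvalid;

-- Clean up and retry (idx_demo is a placeholder name).
DROP INDEX CONCURRENTLY IF EXISTS idx_demo;
-- Alternative on PG12+: REINDEX INDEX CONCURRENTLY idx_demo;

-- Partitioned table: parent_demo with one partition child_demo_1 (both hypothetical).
CREATE INDEX CONCURRENTLY child_demo_1_a_idx ON child_demo_1 (a);  -- repeat for each partition
CREATE INDEX parent_demo_a_idx ON ONLY parent_demo (a);            -- created invalid, touches no partition
ALTER INDEX parent_demo_a_idx ATTACH PARTITION child_demo_1_a_idx; -- parent turns valid once all are attached&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;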
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/Sayutoyj7QmV5Nl8EFlwiQ" target="_blank" rel="noreferrer"&gt;PostgreSQL Apprentice: An In-Depth Analysis of CIC&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;27. HOT Principle
 &lt;div id="27-hot-principle" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#27-hot-principle" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;HOT:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Without HOT, every tuple update also has to update the indexes. In the diagram below, each new tuple version adds one more index entry, while the old index entry still points to the dead tuple. This adds pressure on index writes, index space, and index vacuuming.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/4a5a7f3ac437.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;With HOT, when no indexed column value changes and the new version fits in the same page, the update writes only the heap tuple and leaves the index untouched:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/e618933424af.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;HOT tuples are marked with the HEAP_HOT_UPDATED and HEAP_ONLY_TUPLE bits in t_infomask2:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tt(a int);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idxtt &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tt(a);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tt &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; tt &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;; &lt;span style="color:#75715e"&gt;-- execute multiple times
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tt; &lt;span style="color:#75715e"&gt;-- after update, run a visibility check to write remaining clog commit info to tuple header
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; lp,&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; lp_flags &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0:LP_UNUSED&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_NORMAL&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_REDIRECT&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_DEAD&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; lp_flags,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid, raw_flags, combined_flags
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;tt&amp;#39;&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; t_infomask &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;OR&lt;/span&gt; t_infomask2 &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----+-----------+--------+-----------------------------------------------------------------------------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_HOT_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The tuple at lp (line pointer) 1 points to tuple 2 via ctid (0,2), tuple 2 points to tuple 3, and so on up to tuple 5: the ctid values form a chain that leads to the current row version. Tuples 1 to 4 are dead and all carry HEAP_HOT_UPDATED, meaning a newer version exists further along the HOT chain. Tuples 2 to 5 carry HEAP_ONLY_TUPLE, meaning they have no index entry of their own. The chain tail (tuple 5) has HEAP_ONLY_TUPLE but not HEAP_HOT_UPDATED, and its ctid points to itself.&lt;/p&gt;
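&lt;p&gt;How often updates actually take the HOT path can be checked from the statistics views. The query below is a rough sketch; the &lt;code&gt;fillfactor&lt;/code&gt; change that follows is an optional tuning idea with an illustrative value, not something derived from the example above.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Share of updates that were HOT, per table.
SELECT relname, n_tup_upd, n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_pct
FROM pg_stat_user_tables
ORDER BY n_tup_upd DESC
LIMIT 20;

-- Leaving free space in each page gives new versions room to stay on the same page.
-- 80 is an illustrative value; it only affects pages written after the change.
ALTER TABLE tt SET (fillfactor = 80);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;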
&lt;p&gt;With HOT, vacuum only cleans dead tuples within the page without updating indexes:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/c220fe22e28a.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;vacuum&lt;/span&gt; tt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;VACUUM&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; lp,&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; lp_flags &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0:LP_UNUSED&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_NORMAL&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_REDIRECT&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_DEAD&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; lp_flags,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid, raw_flags, combined_flags
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;tt&amp;#39;&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; t_infomask &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;OR&lt;/span&gt; t_infomask2 &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----+-----------+--------+----------------------------------------------------------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
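&lt;p&gt;After vacuum, the dead tuples are gone and only the live tuple at lp 5 still has a tuple header. The WHERE clause filters out line pointers without a header, so unused and redirect pointers do not appear in this output.&lt;/p&gt;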
&lt;p&gt;On subsequent updates, a new HOT chain begins:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; tt &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; tt &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; lp,&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; lp_flags &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0:LP_UNUSED&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_NORMAL&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_REDIRECT&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_DEAD&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; lp_flags,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid, raw_flags, combined_flags
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;tt&amp;#39;&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; t_infomask &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;OR&lt;/span&gt; t_infomask2 &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----+-----------+--------+-----------------------------------------------------------------------------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
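&lt;p&gt;Why doesn&amp;rsquo;t the new HOT chain start from lp 1? Because lp 1 is still occupied: the index entry still points to it. After pruning, a root line pointer like this is normally kept as an LP_REDIRECT to the first live tuple of the chain (here lp 5). It has no tuple header, so the query above filters it out.&lt;/p&gt;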
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; itemoffset, ctid, &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt;, dead, htid, tids[&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;] &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; some_tids
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bt_page_items(&lt;span style="color:#e6db74"&gt;&amp;#39;idxtt&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; itemoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; htid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; some_tids 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+-------+-------------------------+------+-------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;htid (0,1) means heap page 0, lp 1. Vacuum cleaned only the data page; the index entry was not updated. It removed the dead tuple versions in the HOT chain, but the line pointer the index references (lp 1) and the live tail tuple (lp 5) stayed in place.&lt;/p&gt;
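&lt;p&gt;The heap queries above filter on &lt;code&gt;t_infomask&lt;/code&gt;, so line pointers without a tuple header are hidden. A minimal sketch that drops the filter, assuming the same &lt;code&gt;tt&lt;/code&gt; table and the pageinspect extension used in this walkthrough; for an LP_REDIRECT pointer, &lt;code&gt;lp_off&lt;/code&gt; holds the line pointer number it redirects to:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Show every line pointer on page 0 of tt, with or without a tuple header
SELECT lp,
       lp_off,  -- for LP_REDIRECT: the target line pointer number
       CASE lp_flags
            WHEN 0 THEN &amp;#39;LP_UNUSED&amp;#39;
            WHEN 1 THEN &amp;#39;LP_NORMAL&amp;#39;
            WHEN 2 THEN &amp;#39;LP_REDIRECT&amp;#39;
            WHEN 3 THEN &amp;#39;LP_DEAD&amp;#39;
       END AS lp_flags,
       lp_len,
       t_ctid
  FROM heap_page_items(get_raw_page(&amp;#39;tt&amp;#39;, 0));

-- Cumulative update counters for the table: total updates vs. HOT updates
SELECT n_tup_upd, n_tup_hot_upd
  FROM pg_stat_user_tables
 WHERE relname = &amp;#39;tt&amp;#39;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Comparing &lt;code&gt;n_tup_hot_upd&lt;/code&gt; with &lt;code&gt;n_tup_upd&lt;/code&gt; gives a rough idea of how often updates on a table actually take the HOT path.&lt;/p&gt;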
&lt;p&gt;&lt;strong&gt;INDEX ONLY SCAN:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Index-only scan is a common and efficient scan method across databases: it returns results by reading only index pages, without touching data pages. In PG this is not straightforward, because visibility information (xmin/xmax) is stored in heap tuple headers, not in index entries. Reading the index alone cannot, in principle, satisfy MVCC.&lt;/p&gt;
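&lt;p&gt;The visibility map (VM) file closes this gap. Besides letting vacuum skip all-visible pages, it lets an INDEX ONLY SCAN treat every tuple on an all-visible page as visible without reading the heap; only pages not marked all-visible still require a heap fetch:&lt;/p&gt;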
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b2b9809f61d7.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;Reference: interdb&lt;/p&gt;
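&lt;p&gt;The effect of the visibility map on an index-only scan can be observed from psql. This is a sketch under assumptions: it reuses the &lt;code&gt;tt&lt;/code&gt; table, assumes &lt;code&gt;idxtt&lt;/code&gt; is a btree index on &lt;code&gt;tt(a)&lt;/code&gt;, and needs the pg_visibility extension for the last query. The planner settings are only there to encourage an index-only scan on a tiny demo table.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Vacuum can mark pages all-visible in the visibility map
VACUUM tt;

-- Demo only: discourage other scan types on a very small table
SET enable_seqscan = off;
SET enable_bitmapscan = off;

-- Look for &amp;#34;Index Only Scan&amp;#34; and the &amp;#34;Heap Fetches&amp;#34; counter in the output;
-- heap fetches happen for pages that are not marked all-visible
EXPLAIN (ANALYZE, COSTS OFF)
SELECT a FROM tt WHERE a = 1;

RESET enable_seqscan;
RESET enable_bitmapscan;

-- Requires: CREATE EXTENSION pg_visibility;
-- Shows the visibility map bits for heap page 0
SELECT all_visible, all_frozen
  FROM pg_visibility_map(&amp;#39;tt&amp;#39;, 0);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Running the EXPLAIN once right after an update and once after the vacuum usually makes the difference in heap fetches visible.&lt;/p&gt;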

&lt;h3 class="relative group"&gt;28. Does PostgreSQL Have Lock Escalation?
 &lt;div id="28-does-postgresql-have-lock-escalation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#28-does-postgresql-have-lock-escalation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Basically no.&lt;/p&gt;
&lt;p&gt;Only predicate locks are escalated. Predicate locks are used at the SERIALIZABLE isolation level to track what a transaction has read, so that data anomalies can be detected and serializability preserved; they do not block other sessions. In PG they appear as SIReadLock.&lt;/p&gt;
&lt;ul&gt;
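&lt;li&gt;The finest granularity of a predicate lock is the individual row (tuple)&lt;/li&gt;
&lt;li&gt;When the number of locked rows on a page exceeds a threshold (&lt;code&gt;max_pred_locks_per_page&lt;/code&gt;), the row locks are promoted to a lock on that page&lt;/li&gt;
&lt;li&gt;When the number of locked pages or rows in a table exceeds a threshold (&lt;code&gt;max_pred_locks_per_relation&lt;/code&gt;), the locks are promoted to a lock on the whole table&lt;/li&gt;
&lt;li&gt;Predicate locks have only 3 lock levels: row, page, table&lt;/li&gt;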
&lt;/ul&gt;
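&lt;p&gt;Predicate locks and their granularity can be inspected in &lt;code&gt;pg_locks&lt;/code&gt;. A minimal sketch, assuming a small table &lt;code&gt;tt&lt;/code&gt; with a column &lt;code&gt;a&lt;/code&gt; (names taken from the earlier examples, used here only for illustration):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;BEGIN ISOLATION LEVEL SERIALIZABLE;

-- Reading rows under SERIALIZABLE takes SIReadLock entries
SELECT * FROM tt WHERE a = 1;

-- locktype is tuple, page, or relation depending on the current granularity
SELECT locktype, relation::regclass, page, tuple, mode
  FROM pg_locks
 WHERE mode = &amp;#39;SIReadLock&amp;#39;;

-- Promotion thresholds
SHOW max_pred_locks_per_page;
SHOW max_pred_locks_per_relation;

COMMIT;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Reading many rows from the same page inside the transaction and re-running the &lt;code&gt;pg_locks&lt;/code&gt; query is a simple way to watch tuple locks being replaced by a page or relation lock.&lt;/p&gt;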
&lt;p&gt;&lt;a href="https://postgrespro.com/blog/pgsql/5968020" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/blog/pgsql/5968020&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;29. Replication Slot Functions and Hazards
 &lt;div id="29-replication-slot-functions-and-hazards" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#29-replication-slot-functions-and-hazards" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
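&lt;p&gt;For physical replication, replication slots aren&amp;rsquo;t strictly necessary: settings such as &lt;code&gt;wal_keep_size&lt;/code&gt; (&lt;code&gt;wal_keep_segments&lt;/code&gt; before PG 13) or WAL archiving can retain the WAL a standby still needs. Note that &lt;code&gt;hot_standby_feedback&lt;/code&gt; controls vacuum cleanup on the primary, not WAL retention. With replication slots, the primary tracks each standby&amp;rsquo;s position and retains the required WAL automatically, so those retention settings become largely unnecessary.&lt;/p&gt;
&lt;p&gt;For logical replication, replication slots are mandatory, and one logical replication link corresponds to one slot. Here a slot manages not only WAL retention but also the logical decoding state: the output plugin and the decoding/sending positions (LSNs), which allows decoded changes to be retransmitted after replication is interrupted.&lt;/p&gt;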
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/a3a4118829be.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;Replication slot hazards:&lt;/p&gt;
&lt;p&gt;Replication slots are not hazardous in themselves. Their primary function is to simplify WAL management, and without slots you still need some WAL management strategy. The PG community recommends using slots. Two points need attention. First, always clean up unused slots: an abandoned slot keeps holding an old position, which blocks WAL cleanup and can fill the disk. Second, DBAs shouldn&amp;rsquo;t drop slots casually: once a slot is dropped its position information is lost, and the downstream link may need data reinitialization and resynchronization. Before dropping a slot, confirm whether the replication link can simply resume syncing.&lt;/p&gt;
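&lt;p&gt;A routine check for slots that hold back WAL can look like the sketch below. It runs on the primary; the slot name in the commented-out drop is hypothetical.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- How much WAL each slot forces the primary to keep
SELECT slot_name,
       slot_type,
       active,
       restart_lsn,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
  FROM pg_replication_slots
 ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;

-- Only after confirming the consumer is gone for good
-- (hypothetical slot name):
-- SELECT pg_drop_replication_slot(&amp;#39;unused_slot&amp;#39;);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;An inactive slot with a large and growing &lt;code&gt;retained_wal&lt;/code&gt; value is the typical sign of an abandoned slot.&lt;/p&gt;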
&lt;p&gt;&lt;a href="https://blog.csdn.net/qq_40687433/article/details/129291207" target="_blank" rel="noreferrer"&gt;PG Internals Practice: Logical Replication&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;30. Why Deadlocks Occur and Deadlock Detection Mechanism
 &lt;div id="30-why-deadlocks-occur-and-deadlock-detection-mechanism" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#30-why-deadlocks-occur-and-deadlock-detection-mechanism" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/a158c01929e8.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;Simplest case: transaction T1 holds resource 1 and transaction T2 holds resource 2. If T1 then tries to acquire resource 2 while T2 tries to acquire resource 1, a deadlock forms. Left alone, a deadlock would wait indefinitely, so DBMSs provide deadlock detection. Deadlocks usually indicate business logic issues, such as transactions locking the same rows in different orders. If nobody explicitly cancels one of the transactions in the &amp;ldquo;ring&amp;rdquo;, PG resolves it automatically: a session that has waited on a lock for &lt;code&gt;deadlock_timeout&lt;/code&gt; (default 1s) runs the deadlock check, and if a cycle is found, PG aborts one of the transactions with an error so that the other transactions in the &amp;ldquo;ring&amp;rdquo; can continue.&lt;/p&gt;
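&lt;p&gt;The two-transaction case can be reproduced with two psql sessions. This is a sketch with a hypothetical &lt;code&gt;accounts&lt;/code&gt; table; the statements are shown in the order they should be issued.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Setup (hypothetical table)
CREATE TABLE accounts (id int PRIMARY KEY, balance numeric NOT NULL);
INSERT INTO accounts VALUES (1, 100), (2, 100);

-- Session 1: locks row 1
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;

-- Session 2: locks row 2
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE id = 2;

-- Session 1: blocks, waiting for the row lock held by session 2
UPDATE accounts SET balance = balance + 10 WHERE id = 2;

-- Session 2: closes the cycle. After deadlock_timeout, one of the two
-- sessions fails with &amp;#34;deadlock detected&amp;#34; and its transaction is aborted;
-- the other session&amp;#39;s UPDATE then proceeds.
UPDATE accounts SET balance = balance + 10 WHERE id = 1;

-- How long a session waits on a lock before running the deadlock check
SHOW deadlock_timeout;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Updating rows in a consistent order (for example always by ascending &lt;code&gt;id&lt;/code&gt;) in both transactions avoids this particular cycle.&lt;/p&gt;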
&lt;p&gt;&lt;a href="https://postgrespro.com/blog/pgsql/5968020" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/blog/pgsql/5968020&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;31. SQL Performance Troubleshooting Approaches
 &lt;div id="31-sql-performance-troubleshooting-approaches" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#31-sql-performance-troubleshooting-approaches" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2e1b3f17a0b2.png" alt="Insert image description here" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;32. Why Use Partitioned Tables, Advantages and Disadvantages
 &lt;div id="32-why-use-partitioned-tables-advantages-and-disadvantages" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#32-why-use-partitioned-tables-advantages-and-disadvantages" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Partitioned tables split a table&amp;rsquo;s data into smaller physical pieces to improve performance, availability, and manageability while remaining transparent to applications. Partitioning is a common optimization for large tables in relational databases. DBMSs generally provide partition management, and applications can access partitioned tables directly without architecture changes, though good performance requires access patterns that match the partitioning scheme.&lt;/p&gt;
&lt;p&gt;PG natively supports declarative partitioning and inheritance partitioning. Common plugin-based implementations include pg_pathman. PG10 introduced declarative partitioning with many enhancements in subsequent versions (see PostgreSQL Partitioned Tables — History). PG12+ with declarative partitioning is recommended.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Advantages of partitioned tables:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;SQL performance improvement. In some scenarios, e.g., splitting large data into multiple partitions where SQL only queries one partition, partition pruning can dramatically improve performance&lt;/li&gt;
&lt;li&gt;Partitions work with indexes. Accessing one partition&amp;rsquo;s index is more efficient than accessing an unpartitioned large index&lt;/li&gt;
&lt;li&gt;Dropping a partition is more efficient than deleting many rows. Common in time-range partitioning: dropping an unused historical partition is very fast, while DELETE without partitions is slow and requires extra maintenance&lt;/li&gt;
&lt;li&gt;Faster vacuum. Vacuuming a large table for old version cleanup or statistics collection can be very slow; SQL problems may arise before vacuum finishes. With partitions, vacuum is much faster&lt;/li&gt;
&lt;li&gt;IO distribution. Different partitions can be placed on different paths/disks. Rarely used data can go on cheaper disks&lt;/li&gt;
&lt;li&gt;More maintenance techniques. Directly maintaining a huge table is very difficult (e.g., vacuuming an extremely large table has many issues), while partitioned table partitions can be vacuumed individually. Also, attach/detach, local indexes/constraints etc. can be flexibly used&lt;/li&gt;
&lt;li&gt;May enable partition-wise join or partition-wise aggregation features&lt;/li&gt;
&lt;/ul&gt;
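&lt;p&gt;Two of these advantages, partition pruning and cheap removal of old data, can be sketched with a small range-partitioned table. The table and partition names are hypothetical.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;CREATE TABLE measurements (
    device_id int  NOT NULL,
    logged_on date NOT NULL,
    reading   numeric
) PARTITION BY RANGE (logged_on);

CREATE TABLE measurements_2024_07 PARTITION OF measurements
    FOR VALUES FROM (&amp;#39;2024-07-01&amp;#39;) TO (&amp;#39;2024-08-01&amp;#39;);
CREATE TABLE measurements_2024_08 PARTITION OF measurements
    FOR VALUES FROM (&amp;#39;2024-08-01&amp;#39;) TO (&amp;#39;2024-09-01&amp;#39;);

-- Pruning: the plan should reference only measurements_2024_08
EXPLAIN (COSTS OFF)
SELECT * FROM measurements
 WHERE logged_on &amp;gt;= &amp;#39;2024-08-10&amp;#39; AND logged_on &amp;lt; &amp;#39;2024-08-11&amp;#39;;

-- Retiring old data: drop the partition instead of DELETE plus vacuum
DROP TABLE measurements_2024_07;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The query benefits only because its predicate is on the partition key; a filter on &lt;code&gt;device_id&lt;/code&gt; alone would still have to visit every partition.&lt;/p&gt;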
&lt;p&gt;&lt;strong&gt;Disadvantages of partitioned tables:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In PG, partitions are also tables; too many tables cause slow parsing and large relcache metadata caching&lt;/li&gt;
&lt;li&gt;Too many tables may cause errors. Reference: &lt;a href="https://editor.csdn.net/md/?articleId=131497779" target="_blank" rel="noreferrer"&gt;Even a small number of partitions can raise &amp;ldquo;too many range table entries&amp;rdquo;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Even when the partition count doesn&amp;rsquo;t cause errors, if partitions can&amp;rsquo;t be pruned during plan generation (pruning may only happen at execution), EXPLAIN output becomes very large and logs become bloated with long plans&lt;/li&gt;
&lt;li&gt;Strange issues: &lt;a href="https://mp.weixin.qq.com/s?__biz=MzUyOTAyMzMyNg==&amp;amp;mid=2247489813&amp;amp;idx=1&amp;amp;sn=22360e2bfd40fc2d0caed0a9d825b1d4&amp;amp;chksm=fa663124cd11b832953e789127927ffa0d63d6c948ca8934d5317b8eaae6e71374041ec038f7&amp;amp;mpshare=1&amp;amp;srcid=0728JrXnHdxnfgRVzqosBNcv&amp;amp;sharer_sharetime=1690509489198&amp;amp;sharer_shareid=0412ea33e50b471b98d8859a5c431367&amp;amp;from=singlemessage&amp;amp;scene=1&amp;amp;subscene=10000&amp;amp;sessionid=1690509419&amp;amp;clicktime=1690509545&amp;amp;enterid=1690509545&amp;amp;ascene=1&amp;amp;fasttmpl_type=0&amp;amp;fasttmpl_fullversion=6785798-en_US-zip&amp;amp;fasttmpl_flag=0&amp;amp;realreporttime=1690509545257&amp;amp;devicetype=android-29&amp;amp;version=28002658&amp;amp;nettype=WIFI&amp;amp;abtest_cookie=AAACAA%3D%3D&amp;amp;lang=en&amp;amp;countrycode=CN&amp;amp;exportkey=n_ChQIAhIQCCtq2jm3UsFznlVjxFEOWBLaAQIE97dBBAEAAAAAABKTCFyWAsoAAAAOpnltbLcz9gKNyK89dVj0LyxnG1pA6NiO6PHIsQ0Hy2N7QRbizb9SHdquaFOpOqANqG8jLDcioswZyRnYknjG4bSqNIIKm%2BpRIlK%2FVJxuwolH2%2FQJKSLg4YjccDktYYscUDvYSfHFx1ScEXZkOkbVqrvbBCPy6Gh2GnzulFuuIU68afNtsoBdzZTqHYbL0BfsAUhsz1iGAfSep642UT2CBpWSHWJQvndnwhZxjJ6%2FWO%2FI%2FqwncggiVeDNiv4vwXhluDNn&amp;amp;pass_ticket=mrpzS3wggBDzL9Ua2FmX5v1rYh6zKOnQ4og6oKcKv0ZXRfNBSUpSkGdTAcfXqgDo&amp;amp;wx_header=3" target="_blank" rel="noreferrer"&gt;Different users see different execution plans&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Major limitations of PG native partitioned tables:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No native automatic partition creation&lt;/li&gt;
&lt;li&gt;Only local indexes supported, no global indexes&lt;/li&gt;
&lt;li&gt;Primary key must include the partition key. PostgreSQL currently can only enforce uniqueness within individual partitions, hence this limitation. Oracle, which supports global indexes, doesn&amp;rsquo;t have this restriction; MySQL imposes a similar rule on unique keys of partitioned tables&lt;/li&gt;
&lt;li&gt;Unique index must include the partition key (same reason as primary key)&lt;/li&gt;
&lt;li&gt;Cannot create global constraints (child tables inherit but can&amp;rsquo;t create table-level global constraints)&lt;/li&gt;
&lt;/ul&gt;
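&lt;p&gt;The primary key limitation is easy to demonstrate. A minimal sketch with a hypothetical &lt;code&gt;orders&lt;/code&gt; table partitioned by &lt;code&gt;order_date&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Rejected with an error: the primary key does not contain
-- the partition key column order_date
CREATE TABLE orders (
    order_id   bigint NOT NULL,
    order_date date   NOT NULL,
    PRIMARY KEY (order_id)
) PARTITION BY RANGE (order_date);

-- Accepted: the partition key is part of the primary key
CREATE TABLE orders (
    order_id   bigint NOT NULL,
    order_date date   NOT NULL,
    PRIMARY KEY (order_id, order_date)
) PARTITION BY RANGE (order_date);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With the second definition, &lt;code&gt;order_id&lt;/code&gt; alone is no longer guaranteed to be unique across partitions, so that guarantee has to come from how the ids are generated (for example a sequence).&lt;/p&gt;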
&lt;p&gt;&lt;strong&gt;Partitioned table maintenance:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
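&lt;li&gt;New partitions without data: create them directly with PARTITION OF. This takes an ACCESS EXCLUSIVE lock (level 8) on the parent, so watch for long transactions that would make it wait and queue other sessions behind it&lt;/li&gt;
&lt;li&gt;New partitions with data: add them with ATTACH PARTITION, which takes only a SHARE UPDATE EXCLUSIVE lock (level 4) on the parent and doesn&amp;rsquo;t block reads/writes on it. If needed, add a CHECK constraint matching the partition bounds beforehand to reduce constraint check time. To remove partitions, use DETACH PARTITION ... CONCURRENTLY (PG 14+, also a level 4 lock)&lt;/li&gt;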
&lt;li&gt;Note: ATTACH doesn&amp;rsquo;t auto-create indexes, constraints, defaults, or row-level triggers like PARTITION OF does; create them beforehand&lt;/li&gt;
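&lt;li&gt;Indexes on the partitioned parent table can&amp;rsquo;t be built with CREATE INDEX CONCURRENTLY (CIC). Correct approach for partition index creation: 1) create the index on the parent with ON ONLY, which leaves it invalid; 2) create the index CONCURRENTLY on each partition; 3) ATTACH all partition indexes to the parent index, after which the parent index is automatically marked valid&lt;/li&gt;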
&lt;li&gt;Increasing column length won&amp;rsquo;t rebuild indexes, EXCEPT for partitioned tables where it WILL rebuild indexes&lt;/li&gt;
&lt;/ul&gt;
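&lt;p&gt;The attach, index, and detach steps above can be sketched as follows. All object names are hypothetical, and the data load is left out.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;CREATE TABLE events (event_id bigint NOT NULL, event_date date NOT NULL)
    PARTITION BY RANGE (event_date);
CREATE TABLE events_2024_08 PARTITION OF events
    FOR VALUES FROM (&amp;#39;2024-08-01&amp;#39;) TO (&amp;#39;2024-09-01&amp;#39;);

-- 1. Attach a table that already holds data
CREATE TABLE events_2024_09 (LIKE events INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
-- ... load data into events_2024_09 here ...
-- A CHECK constraint matching the bounds lets ATTACH skip the validation scan
ALTER TABLE events_2024_09 ADD CONSTRAINT events_2024_09_chk
    CHECK (event_date &amp;gt;= &amp;#39;2024-09-01&amp;#39; AND event_date &amp;lt; &amp;#39;2024-10-01&amp;#39;);
ALTER TABLE events ATTACH PARTITION events_2024_09
    FOR VALUES FROM (&amp;#39;2024-09-01&amp;#39;) TO (&amp;#39;2024-10-01&amp;#39;);
ALTER TABLE events_2024_09 DROP CONSTRAINT events_2024_09_chk;

-- 2. Build a partitioned index without blocking writes
CREATE INDEX events_date_idx ON ONLY events (event_date);   -- invalid for now
CREATE INDEX CONCURRENTLY events_2024_08_date_idx ON events_2024_08 (event_date);
CREATE INDEX CONCURRENTLY events_2024_09_date_idx ON events_2024_09 (event_date);
ALTER INDEX events_date_idx ATTACH PARTITION events_2024_08_date_idx;
ALTER INDEX events_date_idx ATTACH PARTITION events_2024_09_date_idx;

-- 3. Remove a partition (PG 14+; cannot run inside a transaction block)
ALTER TABLE events DETACH PARTITION events_2024_08 CONCURRENTLY;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Once every partition has an attached index, the parent index should show &lt;code&gt;indisvalid = true&lt;/code&gt; in &lt;code&gt;pg_index&lt;/code&gt;.&lt;/p&gt;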
&lt;p&gt;&lt;a href="https://blog.csdn.net/qq_40687433/article/details/132525655" target="_blank" rel="noreferrer"&gt;PostgreSQL Partitioned Tables&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;33. Soft Parsing vs Hard Parsing Concepts
 &lt;div id="33-soft-parsing-vs-hard-parsing-concepts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#33-soft-parsing-vs-hard-parsing-concepts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Hard parsing:&lt;/strong&gt; For a SQL statement, PG must first perform lexical and syntax analysis (the parser), convert the result into a query tree it can understand, rewrite it, and then have the optimizer generate an execution plan tree before the executor can execute it. This full process is called hard parsing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Soft parsing:&lt;/strong&gt; Performing all of these steps for every execution of every statement would be very inefficient. For prepared statements, PG therefore caches execution plans in the backend process&amp;rsquo;s memory. When certain conditions are met, the cached plan is used directly, improving efficiency. This is soft parsing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PG bind-variable SQL parsing: the five-time mechanism:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The five-time mechanism prevents data skew from causing inefficient execution plans.&lt;/p&gt;
&lt;p&gt;First 5 executions: each generates an execution plan based on the actual bound values (called a custom plan). This is hard parsing.
6th execution: PG also builds a generic plan that does not depend on the parameter values and compares its estimated cost with the average cost of the custom plans.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If the generic plan is cheaper than that average: the generic plan is cached and reused; subsequently, regardless of parameter changes, the SQL execution plan won&amp;rsquo;t change. This is soft parsing&lt;/li&gt;
&lt;li&gt;Otherwise: PG keeps generating a custom plan on every execution, which is hard parsing each time&lt;/li&gt;
&lt;/ul&gt;
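&lt;p&gt;The switch between custom and generic plans can be watched from psql. This is a sketch that assumes a table &lt;code&gt;tt&lt;/code&gt; with an integer column &lt;code&gt;a&lt;/code&gt;; the statement name &lt;code&gt;q&lt;/code&gt; is arbitrary.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;PREPARE q(int) AS SELECT * FROM tt WHERE a = $1;

-- The first five executions are planned with the actual parameter value
EXECUTE q(1);
EXECUTE q(1);
EXECUTE q(1);
EXECUTE q(1);
EXECUTE q(1);

-- From here on the generic plan may be chosen.
-- A custom plan shows the literal in its filter (a = 1);
-- a generic plan shows the parameter placeholder (a = $1).
EXPLAIN EXECUTE q(1);

DEALLOCATE q;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Whether the generic plan wins depends on the cost estimates for the table at hand, so the output may differ between a uniform and a skewed column.&lt;/p&gt;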
&lt;p&gt;&lt;strong&gt;Forcing soft/hard parsing:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;PG 12 introduced the &lt;code&gt;plan_cache_mode&lt;/code&gt; parameter with options:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;auto: default, uses the five-time mechanism&lt;/li&gt;
&lt;li&gt;force_custom_plan: always hard parse; suitable for SQL with data skew where performance and stability are critical&lt;/li&gt;
&lt;li&gt;force_generic_plan: always use generic plan; suitable for SQL without data skew or where performance/stability requirements are lower&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;PG 14 added generic_plans and custom_plans columns to pg_prepared_statements, showing counts for both plan types. Since PG execution plans are only cached in-process, pg_prepared_statements only shows the current session&amp;rsquo;s SQL, not other sessions or global info.&lt;/p&gt;
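&lt;p&gt;A short sketch of forcing the plan type and checking the counters, assuming PG 14 or later and a table &lt;code&gt;tt&lt;/code&gt; with an integer column &lt;code&gt;a&lt;/code&gt; (the statement name &lt;code&gt;q2&lt;/code&gt; is arbitrary):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;-- Session-level setting (PG 12+); the default is auto
SET plan_cache_mode = force_generic_plan;

PREPARE q2(int) AS SELECT * FROM tt WHERE a = $1;
EXECUTE q2(1);

-- PG 14+: how often each plan type was used, current session only
SELECT name, statement, generic_plans, custom_plans
  FROM pg_prepared_statements;

DEALLOCATE q2;
RESET plan_cache_mode;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With &lt;code&gt;force_generic_plan&lt;/code&gt; the first execution should already be counted under &lt;code&gt;generic_plans&lt;/code&gt;; switching to &lt;code&gt;force_custom_plan&lt;/code&gt; moves the count to &lt;code&gt;custom_plans&lt;/code&gt;.&lt;/p&gt;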
&lt;p&gt;Five-time mechanism source code:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * choose_custom_plan: choose whether to use custom or generic plan
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * This defines the policy followed by GetCachedPlan.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;choose_custom_plan&lt;/span&gt;(CachedPlanSource &lt;span style="color:#f92672"&gt;*&lt;/span&gt;plansource, ParamListInfo boundParams)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;double&lt;/span&gt;		avg_custom_cost;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Let settings force the decision */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (plan_cache_mode &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PLAN_CACHE_MODE_FORCE_GENERIC_PLAN)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (plan_cache_mode &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PLAN_CACHE_MODE_FORCE_CUSTOM_PLAN)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* See if caller wants to force the decision */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;cursor_options &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt; CURSOR_OPT_GENERIC_PLAN)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;cursor_options &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt; CURSOR_OPT_CUSTOM_PLAN)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Generate custom plans until we have done at least 5 (arbitrary) */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;num_custom_plans &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	avg_custom_cost &lt;span style="color:#f92672"&gt;=&lt;/span&gt; plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;total_custom_cost &lt;span style="color:#f92672"&gt;/&lt;/span&gt; plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;num_custom_plans;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Prefer generic plan if it&amp;#39;s less expensive than the average custom
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * plan. (Because we include a charge for cost of planning in the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * custom-plan costs, this means the generic plan only has to be less
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * expensive than the execution cost plus replan cost of the custom
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * plans.)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Note that if generic_cost is -1 (indicating we&amp;#39;ve not yet determined
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * the generic plan cost), we&amp;#39;ll always prefer generic at this point.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;generic_cost &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; avg_custom_cost)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
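&lt;p&gt;To make the order of checks easier to follow, here is a minimal Python sketch of the same decision. It is a simplification, not the PostgreSQL source: &lt;code&gt;PlanSource&lt;/code&gt; and the &lt;code&gt;mode&lt;/code&gt; strings are hypothetical stand-ins for &lt;code&gt;CachedPlanSource&lt;/code&gt; and &lt;code&gt;plan_cache_mode&lt;/code&gt;, and both the elided part and the &lt;code&gt;cursor_options&lt;/code&gt; checks are left out.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class PlanSource:
    """Hypothetical stand-in for the CachedPlanSource fields used here."""
    num_custom_plans: int = 0
    total_custom_cost: float = 0.0
    generic_cost: float = -1.0  # -1 means the generic plan is not costed yet


def choose_custom_plan(src: PlanSource, mode: str = "auto") -> bool:
    """Return True to build a custom plan, False to use the generic plan."""
    # Settings force the decision first.
    if mode == "force_generic_plan":
        return False
    if mode == "force_custom_plan":
        return True
    # The first five executions always get a custom plan.
    if 5 > src.num_custom_plans:
        return True
    avg_custom_cost = src.total_custom_cost / src.num_custom_plans
    # Generic wins only when it is cheaper than the average custom plan.
    # A generic_cost of -1 also counts as cheaper, so the generic plan
    # is preferred once before its real cost is known.
    return not (avg_custom_cost > src.generic_cost)
```

&lt;p&gt;With five custom plans averaging a cost of 10, a generic plan costed at 8 is chosen, while one costed at 12 keeps the statement on custom plans.&lt;/p&gt;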
&lt;p&gt;&lt;a href="https://blog.csdn.net/Hehuyi_In/article/details/128885660" target="_blank" rel="noreferrer"&gt;Hehuyi_In: The Concepts of Soft and Hard Parsing&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;34. What Are VM / FSM / INIT Files
 &lt;div id="34-what-are-vm--fsm--init-files" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#34-what-are-vm--fsm--init-files" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d0c6c3c47a5b.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Numeric suffix:&lt;/strong&gt; A relation file is split into segments once it exceeds 1GB (the default); the segment size can be changed at build time via &lt;code&gt;./configure --with-segsize&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;VM:&lt;/strong&gt; Visibility map, holding all-visible and all-frozen information for each heap page. It helps to: 1) speed up vacuum scans (all-visible pages are skipped), 2) speed up aggressive freezing (all-frozen pages are skipped), 3) support INDEX ONLY SCAN (for all-visible pages, the heap page does not have to be visited to check tuple visibility).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;FSM:&lt;/strong&gt; Free space map, which helps PG find pages with enough free space. For indexes, entries are ordered, so recording per-page free space is of little use; an index&amp;rsquo;s FSM only tracks completely empty index pages.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;INIT:&lt;/strong&gt; A fork that exists only for unlogged relations. For an unlogged table it has size 0; it marks the relation as unlogged and is used to reset the relation to an empty state after a crash.&lt;/p&gt;
&lt;p&gt;Reference: PostgreSQL 14 Internals (postgresql-internals-14)&lt;/p&gt;
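&lt;p&gt;As an illustration of how the fork and segment suffixes combine, the sketch below maps a block number to a file name, assuming the default 8kB block size and 1GB segment size. The function &lt;code&gt;fork_file&lt;/code&gt; and the relfilenode 16384 are made up for the example; this is not PostgreSQL code.&lt;/p&gt;

```python
BLOCK_SIZE = 8192                  # default block size in bytes
SEGMENT_SIZE = 1024 ** 3           # default segment size, 1GB
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE


def fork_file(relfilenode: int, block: int, fork: str = "main") -> str:
    """Name of the file that holds the given block of a relation fork."""
    name = str(relfilenode)
    if fork != "main":
        name += "_" + fork         # e.g. 16384_vm, 16384_fsm, 16384_init
    segment = block // BLOCKS_PER_SEGMENT
    if segment > 0:
        name += "." + str(segment)  # numeric suffix for later segments
    return name
```

&lt;p&gt;Under these defaults a segment holds 131072 blocks, so block 131072 of the main fork is the first block of the file &lt;code&gt;16384.1&lt;/code&gt;.&lt;/p&gt;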

&lt;h3 class="relative group"&gt;35. Memory Reclaim Mechanisms: kswapd / Direct Memory Reclaim / pdflush
 &lt;div id="35-memory-reclaim-mechanisms-kswapd--direct-memory-reclaim--pdflush" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#35-memory-reclaim-mechanisms-kswapd--direct-memory-reclaim--pdflush" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Memory reclaim mechanisms:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Background memory reclaim (kswapd): When physical memory is tight, the kswapd kernel thread is woken to reclaim memory asynchronously, not blocking process execution.
Direct memory reclaim: If background async reclaim can&amp;rsquo;t keep up with process memory allocation requests, direct reclaim begins — synchronous, blocking process execution.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/664b2fe2f965.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;(https://vivani.net/2022/06/14/linux-kernel-tuning-page-allocation-failure/)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;pages_low:&lt;/strong&gt; When the number of free pages drops below pages_low, the buddy allocator wakes &lt;strong&gt;kswapd&lt;/strong&gt; and the kernel starts reclaiming pages, swapping them out to disk where necessary.
&lt;strong&gt;pages_min:&lt;/strong&gt; When free pages fall to pages_min, reclaim pressure is high because the zone urgently needs free pages. The allocator then does kswapd&amp;rsquo;s work synchronously, which is sometimes called direct reclaim.
&lt;strong&gt;pages_high:&lt;/strong&gt; Once kswapd has been woken and is freeing pages, the kernel considers the zone &amp;ldquo;balanced&amp;rdquo; only when free pages reach pages_high, at which point kswapd goes back to sleep. Free pages above pages_high mean the zone is in an ideal state.
Memory reclaim operates per zone; /proc/zoneinfo shows the min, low, and high values.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;vm.min_free_kbytes&lt;/code&gt; (which sets pages_min, the min line) is a critically important OS parameter. If it is set very low, the system cannot reclaim memory effectively, which can lead to crashes and service interruptions. If it is set too high, reclaim activity increases, causing allocation latency and possibly an immediate out-of-memory state.&lt;/p&gt;
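&lt;p&gt;The interplay of the three watermarks can be sketched as a tiny state machine. This is a deliberately simplified model for illustration, not kernel code: the &lt;code&gt;Zone&lt;/code&gt; class, the function name, and the page counts are all hypothetical.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """Hypothetical zone with the three watermarks, all in pages."""
    pages_min: int
    pages_low: int
    pages_high: int
    kswapd_awake: bool = False


def on_allocation(zone: Zone, free_pages: int) -> str:
    """Classify what reclaim does when an allocation sees free_pages."""
    if zone.pages_min > free_pages:
        zone.kswapd_awake = True
        return "direct reclaim"        # synchronous, the caller blocks
    if zone.pages_low > free_pages:
        zone.kswapd_awake = True       # background reclaim starts
    elif free_pages >= zone.pages_high:
        zone.kswapd_awake = False      # zone is balanced again
    return "kswapd reclaiming" if zone.kswapd_awake else "no reclaim"
```

&lt;p&gt;Note the hysteresis in this model: between pages_low and pages_high the answer depends on whether kswapd was already awake, because it keeps working until pages_high is reached.&lt;/p&gt;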
&lt;p&gt;&lt;strong&gt;pdflush and kcompactd:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;pdflush: dirty pages in the page cache must eventually be written to disk. Whether triggered by an explicit sync (fsync etc.), by OS-scheduled flushing, or by a database commit, the flushing work is ultimately done by the Linux kernel thread pdflush. (Since Linux 2.6.32, pdflush has been replaced by per-backing-device flusher threads, but the role is the same.)&lt;/p&gt;
&lt;p&gt;kcompactd: memory compaction specifically targets memory fragmentation. Flushing also helps against fragmentation, because the freed memory returns to the buddy system, but unlike pdflush flushing, compaction requires no disk writes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Observing memory reclaim:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;sar is one of the most comprehensive Linux system performance analysis tools, reporting on multiple dimensions: file read/write, syscall usage, disk I/O, CPU efficiency, memory usage, process activity, and IPC.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/9f0b4a87e536.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;sar -B&lt;/code&gt; observes kswapd and direct memory reclaim:&lt;/p&gt;
&lt;p&gt;Example: viewing paging statistics with sar
&lt;code&gt;sar -B 1 3&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;pgpgin/s: KB read from disk/SWAP into memory per second&lt;/li&gt;
&lt;li&gt;pgpgout/s: KB written from memory to disk/SWAP per second&lt;/li&gt;
&lt;li&gt;fault/s: page faults per second (major + minor)&lt;/li&gt;
&lt;li&gt;majflt/s: major page faults per second&lt;/li&gt;
&lt;li&gt;pgfree/s: pages placed in free queue per second&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;pgscank/s: pages scanned by kswapd per second&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;pgscand/s: pages directly scanned per second&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;pgsteal/s: pages cleared from cache per second to meet memory needs&lt;/li&gt;
&lt;li&gt;%vmeff: percentage of stolen pages (pgsteal) vs total scanned (pgscank + pgscand)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Example: viewing historical paging statistics with sar
&lt;code&gt;sar -B -s &amp;quot;08:00:00&amp;quot; -e &amp;quot;10:00:00&amp;quot;&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Without -e means from start time to now&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ sar -B -s &lt;span style="color:#e6db74"&gt;&amp;#34;08:00:00&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;09:45:01 PM pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;09:46:01 PM 414429.37 395024.08 179478.63 0.07 352922.62 12003.78 4266.52 16269.42 99.99
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;09:47:01 PM 879907.08 337948.43 157970.97 0.02 402290.21 0.00 0.00 0.00 0.00
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;09:48:01 PM 772977.43 507343.30 150255.50 0.05 466742.08 0.00 5821.28 5821.27 100.00&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
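&lt;p&gt;The &lt;code&gt;%vmeff&lt;/code&gt; column can be recomputed from the other columns. The short sketch below applies the definition given in the list above to the three sample rows; the function name &lt;code&gt;vmeff&lt;/code&gt; is only illustrative, and treating a zero scan count as 0.00 is an assumption based on the second sample row.&lt;/p&gt;

```python
def vmeff(pgscank: float, pgscand: float, pgsteal: float) -> float:
    """Reclaim efficiency in percent: pages stolen per page scanned."""
    scanned = pgscank + pgscand
    if scanned == 0:
        return 0.0                 # nothing scanned, shown as 0.00
    return round(100.0 * pgsteal / scanned, 2)


# (pgscank/s, pgscand/s, pgsteal/s) copied from the three sample rows
rows = [
    (12003.78, 4266.52, 16269.42),
    (0.00, 0.00, 0.00),
    (0.00, 5821.28, 5821.27),
]
results = [vmeff(*row) for row in rows]
```

&lt;p&gt;In the third sample row all scanning is pgscand with pgscank at zero, meaning the reclaim in that interval was direct reclaim rather than kswapd.&lt;/p&gt;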
&lt;p&gt;Strongly recommended: &lt;a href="https://blog.csdn.net/qq_40687433/article/details/135492312" target="_blank" rel="noreferrer"&gt;A Brief Analysis of Linux Memory&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;36. Process Scheduling, D Process Hazards and Causes
 &lt;div id="36-process-scheduling-d-process-hazards-and-causes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#36-process-scheduling-d-process-hazards-and-causes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;I&amp;rsquo;m not sure exactly what &amp;ldquo;process scheduling&amp;rdquo; refers to here, so I&amp;rsquo;ll answer in terms of IPC (Inter-Process Communication).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;IPC:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The user-space part of a process&amp;rsquo;s virtual address space cannot be accessed by other user processes. Letting several processes reach the same data through kernel space therefore inevitably involves context switching (right-hand side of the figure below). Multi-process applications clearly need to share data between processes, hence a mechanism that lets user processes access the same physical memory directly: shared memory (left-hand side of the figure below).&lt;/p&gt;
&lt;p&gt;Shared memory is one IPC (Inter-Process Communication) mechanism; others include message queues and semaphores. It is one of the fastest IPC mechanisms because no data has to be copied between processes: each process accesses the shared memory through its own address space.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d6a9535557f7.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;(https://www.geeksforgeeks.org/inter-process-communication-ipc/)&lt;/p&gt;
&lt;p&gt;Shared memory can be implemented in several ways. In PG, the main shared memory area that holds shared_buffers uses mmap by default (see &lt;code&gt;shared_memory_type&lt;/code&gt;); the dynamic shared memory used by parallel queries uses POSIX shared memory by default (see &lt;code&gt;dynamic_shared_memory_type&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b2a8526ef63d.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;a href="https://momjian.us/main/writings/pgsql/inside_shmem.pdf" target="_blank" rel="noreferrer"&gt;https://momjian.us/main/writings/pgsql/inside_shmem.pdf&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;D Process:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Meaning of a D process: D is the uninterruptible sleep state. It indicates that the process is waiting for an external event to complete, such as disk I/O or a network request. While it waits, the process does not respond to signals, so a D process normally cannot be terminated directly.&lt;/p&gt;
&lt;p&gt;Causes of D processes: the process is waiting for an external event. A typical case is direct memory reclaim, which is synchronous and blocks the application&amp;rsquo;s disk access; the processes doing disk access at that moment are in D state. Note that D state is triggered at the OS or hardware level and has little to do with the application itself. For example, a large query in a PG session does not by itself produce a D process, and the session can be killed.&lt;/p&gt;
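&lt;p&gt;On Linux the state letter is exposed in &lt;code&gt;/proc/[pid]/stat&lt;/code&gt;, right after the command name in parentheses. As a rough sketch of how a monitoring script might spot D processes, the helper below extracts that letter from one such line; the function names and the sample lines are made up for illustration.&lt;/p&gt;

```python
def proc_state(stat_line: str) -> str:
    """Return the one-letter state from a /proc/[pid]/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' rather than on whitespace.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def is_uninterruptible(stat_line: str) -> bool:
    """True when the process is in D (uninterruptible sleep) state."""
    return proc_state(stat_line) == "D"


# Made-up sample lines, truncated after the first few fields
sample_d = "4242 (postgres) D 1 4242 4242 0 -1 4194560"
sample_s = "4243 (my (odd) name) S 1 4243 4243 0 -1 4194560"
```

&lt;p&gt;The same letter appears in the STAT column of &lt;code&gt;ps&lt;/code&gt; and the S column of &lt;code&gt;top&lt;/code&gt;.&lt;/p&gt;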
&lt;p&gt;&lt;a href="https://blog.csdn.net/qq_40687433/article/details/135492312" target="_blank" rel="noreferrer"&gt;A Brief Analysis of Linux Memory&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/qq_40687433/article/details/135541103" target="_blank" rel="noreferrer"&gt;A Brief Analysis of PostgreSQL Memory&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;37. Packet Capture and Analysis of PostgreSQL Protocol
 &lt;div id="37-packet-capture-and-analysis-of-postgresql-protocol" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#37-packet-capture-and-analysis-of-postgresql-protocol" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;PG supported protocols:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Connection protocols:&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;TCP/IP:&lt;/strong&gt; PostgreSQL&amp;rsquo;s most common communication method, allowing client-server network connections and data exchange.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Unix domain socket:&lt;/strong&gt; For same-host client-server connections, faster than TCP/IP.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SSL/TLS:&lt;/strong&gt; PostgreSQL supports SSL/TLS encryption on TCP/IP connections to secure data in transit. TLS is the successor to SSL; PG (seemingly) no longer supports the SSL protocol itself, though the SSL-named parameters remain and are used to configure TLS.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Password authentication protocols:&lt;/li&gt;
&lt;/ol&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;MD5:&lt;/strong&gt; The earlier default password authentication method. The server stores an MD5 (Message Digest Algorithm 5) hash of the user&amp;rsquo;s password and verifies the client&amp;rsquo;s response against it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SCRAM-SHA-256:&lt;/strong&gt; A more secure authentication method that uses SHA-256 hashing in a challenge-response exchange. Available since PG10, it has been gradually replacing MD5 and is the default for &lt;code&gt;password_encryption&lt;/code&gt; since PG14.&lt;/li&gt;
&lt;/ol&gt;
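&lt;p&gt;To show what MD5 authentication actually hashes, here is a small sketch of the two values involved: the string the server stores, and the salted response the client sends. The function names are illustrative only; the construction (password concatenated with the user name, then hashed again with a 4-byte server salt) follows the documented protocol.&lt;/p&gt;

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex digest helper."""
    return hashlib.md5(data).hexdigest()


def stored_md5(password: str, user: str) -> str:
    """Value the server stores for an md5-encrypted password."""
    return "md5" + md5_hex((password + user).encode())


def md5_response(password: str, user: str, salt: bytes) -> str:
    """What the client sends after receiving the server's random salt."""
    inner = md5_hex((password + user).encode())
    return "md5" + md5_hex(inner.encode() + salt)
```

&lt;p&gt;Because the stored value depends only on the password and the user name, anyone who obtains it can compute valid responses, which is one reason SCRAM-SHA-256 is preferred.&lt;/p&gt;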
&lt;p&gt;&lt;strong&gt;Simple packet capture analysis:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;tcpdump capture command:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tcpdump tcp port &lt;span style="color:#ae81ff"&gt;5432&lt;/span&gt; -i lo -s0 -nSX -vvv&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Capture a count(*) (already connected to database via psql -h):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&amp;gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; count&lt;span style="color:#f92672"&gt;(&lt;/span&gt;*&lt;span style="color:#f92672"&gt;)&lt;/span&gt; from t1; -- just capture this
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; count 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
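&lt;p&gt;Before reading the capture, it helps to know what psql puts on the wire for this statement. In the simple query protocol the client sends a single message: the type byte &lt;code&gt;Q&lt;/code&gt;, a 4-byte big-endian length that includes itself, and the NUL-terminated SQL text. The sketch below builds that message; &lt;code&gt;query_message&lt;/code&gt; is an illustrative helper, not part of any client library.&lt;/p&gt;

```python
import struct


def query_message(sql: str) -> bytes:
    """Build a simple-query ('Q') frontend message for the given SQL."""
    body = sql.encode() + b"\x00"          # SQL text, NUL-terminated
    length = 4 + len(body)                 # counts itself, not the type byte
    return b"Q" + struct.pack("!I", length) + body


msg = query_message("select count(*) from t1;")
payload_hex = msg.hex()
```

&lt;p&gt;The result is 30 bytes long, starting with &lt;code&gt;51 0000001d&lt;/code&gt;, which can be compared with the TCP payload (length 30) of the first packet in the capture that follows.&lt;/p&gt;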
&lt;p&gt;Captured content:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;51&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;828820&lt;/span&gt; IP (tos &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0, ttl &lt;span style="color:#ae81ff"&gt;64&lt;/span&gt;, id &lt;span style="color:#ae81ff"&gt;29027&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;offset&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, flags [DF], proto TCP (&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;), &lt;span style="color:#66d9ef"&gt;length&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;82&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;85&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;37240&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;85&lt;/span&gt;.postgres: Flags [P.], cksum &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x6d13 (incorrect &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x57c6), seq &lt;span style="color:#ae81ff"&gt;1091052893&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1091052923&lt;/span&gt;, ack &lt;span style="color:#ae81ff"&gt;3014367256&lt;/span&gt;, win &lt;span style="color:#ae81ff"&gt;350&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;options&lt;/span&gt; [nop,nop,TS val &lt;span style="color:#ae81ff"&gt;92480460&lt;/span&gt; ecr &lt;span style="color:#ae81ff"&gt;92427582&lt;/span&gt;], &lt;span style="color:#66d9ef"&gt;length&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0000: &lt;span style="color:#ae81ff"&gt;4500&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0052&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7163&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4006&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;c74 ac12 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;a55 E..Rqc&lt;span style="color:#f92672"&gt;@&lt;/span&gt;.&lt;span style="color:#f92672"&gt;@&lt;/span&gt;.&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;t...U
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0010: ac12 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;a55 &lt;span style="color:#ae81ff"&gt;9178&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1538&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4108&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;255&lt;/span&gt;d b3ab &lt;span style="color:#ae81ff"&gt;9818&lt;/span&gt; ...U.x.&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;A.&lt;span style="color:#f92672"&gt;%&lt;/span&gt;]....
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0020: &lt;span style="color:#ae81ff"&gt;8018&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;015&lt;/span&gt;e &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;d13 &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0101&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;080&lt;/span&gt;a &lt;span style="color:#ae81ff"&gt;0583&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;23&lt;/span&gt;cc ...&lt;span style="color:#f92672"&gt;^&lt;/span&gt;m.........&lt;span style="color:#f92672"&gt;#&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0030: &lt;span style="color:#ae81ff"&gt;0582&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;553&lt;/span&gt;e &lt;span style="color:#ae81ff"&gt;5100&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;d73 &lt;span style="color:#ae81ff"&gt;656&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6563&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7420&lt;/span&gt; ..U&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;Q....&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0040: &lt;span style="color:#ae81ff"&gt;636&lt;/span&gt;f &lt;span style="color:#ae81ff"&gt;756&lt;/span&gt;e &lt;span style="color:#ae81ff"&gt;7428&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;a29 &lt;span style="color:#ae81ff"&gt;2066&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;726&lt;/span&gt;f &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;d20 &lt;span style="color:#ae81ff"&gt;7431&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;).&lt;span style="color:#66d9ef"&gt;from&lt;/span&gt;.t1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0050: &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;b00 ;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;51&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;830090&lt;/span&gt; IP (tos &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0, ttl &lt;span style="color:#ae81ff"&gt;64&lt;/span&gt;, id &lt;span style="color:#ae81ff"&gt;49370&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;offset&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, flags [DF], proto TCP (&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;), &lt;span style="color:#66d9ef"&gt;length&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;115&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;85&lt;/span&gt;.postgres &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;85&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;37240&lt;/span&gt;: Flags [P.], cksum &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x6d34 (incorrect &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x6e5c), seq &lt;span style="color:#ae81ff"&gt;3014367256&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3014367319&lt;/span&gt;, ack &lt;span style="color:#ae81ff"&gt;1091052923&lt;/span&gt;, win &lt;span style="color:#ae81ff"&gt;342&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;options&lt;/span&gt; [nop,nop,TS val &lt;span style="color:#ae81ff"&gt;92480461&lt;/span&gt; ecr &lt;span style="color:#ae81ff"&gt;92480460&lt;/span&gt;], &lt;span style="color:#66d9ef"&gt;length&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;63&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0000: &lt;span style="color:#ae81ff"&gt;4500&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0073&lt;/span&gt; c0da &lt;span style="color:#ae81ff"&gt;4000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4006&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;cdc ac12 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;a55 E..s..&lt;span style="color:#f92672"&gt;@&lt;/span&gt;.&lt;span style="color:#f92672"&gt;@&lt;/span&gt;......U
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0010: ac12 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;a55 &lt;span style="color:#ae81ff"&gt;1538&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;9178&lt;/span&gt; b3ab &lt;span style="color:#ae81ff"&gt;9818&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4108&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;257&lt;/span&gt;b ...U.&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;.x....A.&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0020: &lt;span style="color:#ae81ff"&gt;8018&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0156&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;d34 &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0101&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;080&lt;/span&gt;a &lt;span style="color:#ae81ff"&gt;0583&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;23&lt;/span&gt;cd ...Vm4........&lt;span style="color:#f92672"&gt;#&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0030: &lt;span style="color:#ae81ff"&gt;0583&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;23&lt;/span&gt;cc &lt;span style="color:#ae81ff"&gt;5400&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;e00 &lt;span style="color:#ae81ff"&gt;0163&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;f75 &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;e74 ..&lt;span style="color:#f92672"&gt;#&lt;/span&gt;.T......&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0040: &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1400&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;ff ffff ................
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0050: ff00 &lt;span style="color:#ae81ff"&gt;0044&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;000&lt;/span&gt;b &lt;span style="color:#ae81ff"&gt;0001&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0001&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3443&lt;/span&gt; ...D..........&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;C&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0060: &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;000&lt;/span&gt;d &lt;span style="color:#ae81ff"&gt;5345&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;c45 &lt;span style="color:#ae81ff"&gt;4354&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2031&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;005&lt;/span&gt;a &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; ....&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.Z..
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0070: &lt;span style="color:#ae81ff"&gt;0005&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;49&lt;/span&gt; ..I
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;51&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;830098&lt;/span&gt; IP (tos &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0, ttl &lt;span style="color:#ae81ff"&gt;64&lt;/span&gt;, id &lt;span style="color:#ae81ff"&gt;29028&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;offset&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, flags [DF], proto TCP (&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;), &lt;span style="color:#66d9ef"&gt;length&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;52&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;85&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;37240&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;85&lt;/span&gt;.postgres: Flags [.], cksum &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x6cf5 (incorrect &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x5cb9), seq &lt;span style="color:#ae81ff"&gt;1091052923&lt;/span&gt;, ack &lt;span style="color:#ae81ff"&gt;3014367319&lt;/span&gt;, win &lt;span style="color:#ae81ff"&gt;350&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;options&lt;/span&gt; [nop,nop,TS val &lt;span style="color:#ae81ff"&gt;92480461&lt;/span&gt; ecr &lt;span style="color:#ae81ff"&gt;92480461&lt;/span&gt;], &lt;span style="color:#66d9ef"&gt;length&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0000: &lt;span style="color:#ae81ff"&gt;4500&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0034&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7164&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4006&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;c91 ac12 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;a55 E..&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;qd&lt;span style="color:#f92672"&gt;@&lt;/span&gt;.&lt;span style="color:#f92672"&gt;@&lt;/span&gt;.&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;....U
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0010: ac12 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;a55 &lt;span style="color:#ae81ff"&gt;9178&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1538&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4108&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;257&lt;/span&gt;b b3ab &lt;span style="color:#ae81ff"&gt;9857&lt;/span&gt; ...U.x.&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;A.&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;...W
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0020: &lt;span style="color:#ae81ff"&gt;8010&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;015&lt;/span&gt;e &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;cf5 &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0101&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;080&lt;/span&gt;a &lt;span style="color:#ae81ff"&gt;0583&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;23&lt;/span&gt;cd ...&lt;span style="color:#f92672"&gt;^&lt;/span&gt;l.........&lt;span style="color:#f92672"&gt;#&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0030: &lt;span style="color:#ae81ff"&gt;0583&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;23&lt;/span&gt;cd ..&lt;span style="color:#f92672"&gt;#&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Reading the dump by eye, a quick analysis shows that this count statement generated only 3 packets, and the &lt;code&gt;select.count(*).from.t1&lt;/code&gt; statement is even visible in the packet payload.&lt;/p&gt;
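&lt;p&gt;To make reading a hex dump more concrete, the following minimal Python sketch decodes the IPv4 and TCP header bytes of the last packet shown above. The hex string is copied from that dump; the names &lt;code&gt;decode_tcp&lt;/code&gt; and &lt;code&gt;FLAG_BITS&lt;/code&gt; are made up for this illustration, and the sketch assumes an IPv4 packet and ignores TCP options.&lt;/p&gt;

```python
import struct

# Hex of the last packet in the tcpdump output above (IPv4 header + TCP header).
PACKET_HEX = (
    "4500 0034 7164 4000 4006 5c91 ac12 0a55 "
    "ac12 0a55 9178 1538 4108 257b b3ab 9857 "
    "8010 015e 6cf5 0000 0101 080a 0583 23cd "
    "0583 23cd"
)

# Bit values of the classic TCP flags inside the 16-bit offset/flags word.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10}


def decode_tcp(packet):
    """Decode the fixed 20-byte TCP header that follows the IPv4 header."""
    # Low nibble of the first IP byte is the header length in 32-bit words.
    ip_header_len = (packet[0] % 16) * 4
    tcp = packet[ip_header_len:ip_header_len + 20]
    sport, dport, seq, ack, off_flags, window, cksum, _urg = struct.unpack(
        "!HHIIHHHH", tcp
    )
    # A flag is set when its bit is 1; integer division avoids bit operators.
    flags = [name for name, bit in FLAG_BITS.items() if (off_flags // bit) % 2]
    return {
        "sport": sport,
        "dport": dport,
        "seq": seq,
        "ack": ack,
        "tcp_header_len": (off_flags // 4096) * 4,
        "flags": flags,
        "window": window,
        "cksum": hex(cksum),
    }


header = decode_tcp(bytes.fromhex(PACKET_HEX))
print(header)
```

&lt;p&gt;The decoded values agree with the line tcpdump printed for this packet: port 37240 to 5432, seq 1091052923, ack 3014367319, win 350, and only the ACK flag set (shown by tcpdump as &lt;code&gt;Flags [.]&lt;/code&gt;).&lt;/p&gt;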
&lt;p&gt;&lt;strong&gt;Wireshark packet analysis:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Window 1:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tcpdump tcp port &lt;span style="color:#ae81ff"&gt;5432&lt;/span&gt; -i lo -s0 -nSX -vvv -w tcpdump.cap&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Window 2:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;iZ2vcdugd3f2h0t7x20pqmZ &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; psql &lt;span style="color:#f92672"&gt;-&lt;/span&gt;h &lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;85&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;p &lt;span style="color:#ae81ff"&gt;5432&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;d lzldb &lt;span style="color:#f92672"&gt;-&lt;/span&gt;U lzl &lt;span style="color:#75715e"&gt;-- step 1, connect
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Password &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt; lzl: &lt;span style="color:#75715e"&gt;-- step 2, enter password
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t1; &lt;span style="color:#75715e"&gt;-- step 3, query
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;q &lt;span style="color:#75715e"&gt;-- step 4, exit&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Note the 4 steps, which correspond to at least 4 groups of packets:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Step 1 - connection request&lt;/li&gt;
&lt;li&gt;Step 2 - password entry&lt;/li&gt;
&lt;li&gt;Step 3 - SQL query&lt;/li&gt;
&lt;li&gt;Step 4 - disconnect&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Now analyze tcpdump.cap with &lt;a href="https://www.wireshark.org/download.html" target="_blank" rel="noreferrer"&gt;Wireshark&lt;/a&gt;.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Step 1 - Connection Request [1-10], TCP three-way handshake [1-3]:&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d79f57937f52.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;37282-&amp;gt;5432 sends SYN, seq=0&lt;/li&gt;
&lt;li&gt;5432-&amp;gt;37282 sends SYN+ACK, seq=0 ack=1&lt;/li&gt;
&lt;li&gt;37282-&amp;gt;5432 sends ACK, seq=1 ack=1&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2c2b65905d60.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;(https://www.researchgate.net/publication/340247809_Computer_Network_Chapter_8_Transport_Layer_UDP_and_TCP)&lt;/p&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Step 1 - Connection Request [1-10], PGSQL protocol startup and authentication request [4-7]:&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/f1fe82c3a6ce.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;After the three-way handshake, the psql client immediately sends a PGSQL protocol startup message to the PG server [4]. Wireshark shows its info as &amp;gt;?.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/376c522b4dd8.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;The &amp;gt;? packet above travels 37282-&amp;gt;5432. There is no need to look up source and destination in the Transmission Control Protocol section: the PGSQL protocol info shows less detail than TCP, but it does carry the direction. &amp;gt; means 37282-&amp;gt;5432 (client to server), and &amp;lt; means 37282&amp;lt;-5432 (server to client).&lt;/p&gt;
&lt;p&gt;The next PGSQL protocol message is the authentication request [6], info: &amp;lt;R, direction 37282&amp;lt;-5432.&lt;/p&gt;
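&lt;p&gt;As a rough illustration of what the startup message carries, the sketch below builds and parses one in Python. It assumes protocol version 3.0, where the startup message has no leading type byte (a plausible reason Wireshark has no letter to show for it), followed by a 4-byte length, a 4-byte version number, and null-terminated key/value pairs. The helper names &lt;code&gt;build_startup&lt;/code&gt; and &lt;code&gt;parse_startup&lt;/code&gt; are hypothetical; the user and database values are the ones used in the psql command above.&lt;/p&gt;

```python
import struct

PROTOCOL_3_0 = 196608  # 3 * 65536 + 0: major version 3, minor version 0


def build_startup(params):
    """Build a StartupMessage: length, version, key/value pairs, final NUL."""
    body = struct.pack("!I", PROTOCOL_3_0)
    for key, value in params.items():
        body += key.encode() + b"\x00" + value.encode() + b"\x00"
    body += b"\x00"  # terminator after the last pair
    # The length field counts itself (4 bytes) plus the body.
    return struct.pack("!I", len(body) + 4) + body


def parse_startup(message):
    """Return (major, minor, params) from a StartupMessage."""
    length, version = struct.unpack("!II", message[:8])
    fields = message[8:length].split(b"\x00")
    # The split leaves two empty trailing entries (last value + terminator).
    items = [field.decode() for field in fields[:-2]]
    params = dict(zip(items[0::2], items[1::2]))
    return version // 65536, version % 65536, params


startup = build_startup({"user": "lzl", "database": "lzldb"})
print(parse_startup(startup))
```

&lt;p&gt;A real psql session usually sends more parameters (for example an application name), so the bytes in an actual capture will be longer than this sample.&lt;/p&gt;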
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/e0135443ee2d.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;ol start="3"&gt;
&lt;li&gt;Step 1 - Connection Request [1-10], three-segment TCP close [8-10]. After the server sends the PGSQL authentication request, the client closes the TCP connection. The close takes 3 segments rather than the usual 4 (explained below). Note: at this point the psql command line is waiting for password input.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/7a7094b99256.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;ol start="4"&gt;
&lt;li&gt;Step 2 - Password Entry [11-22], three-way handshake [11-13]. Because the first TCP connection was closed, the new connection has to start again from TCP, with another three-way handshake:&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/dd0ed1d5110b.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;ol start="5"&gt;
&lt;li&gt;Step 2 - Password Entry [11-22], password authentication [14-22]. The authentication phase is slightly more complex. [14-16] essentially repeat [4-7] from step 1: the client sends the PGSQL protocol startup message and the server returns an authentication request. [18-20] then perform password authentication with the &lt;strong&gt;SCRAM-SHA-256&lt;/strong&gt; mechanism. Password authentication actually takes 4 packets, counting the two R authentication messages in [21]. In [21] the connection is established: the first two R messages complete authentication; the many S messages are ParameterStatus (application name, character set, time zone, etc.); K is BackendKeyData, which returns the PID of the forked backend together with a secret key used for cancel requests; Z means ReadyForQuery.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/969f8aba1e26.png" alt="Insert image description here" /&gt;&lt;/p&gt;
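&lt;p&gt;The message letters above can be decoded with very little code. The sketch below is a minimal illustration, assuming protocol 3.0 framing for backend messages: one type byte, a 4-byte big-endian length that counts itself but not the type byte, then the payload. The sample bytes are fabricated for the example (PID 12345 and the parameter value are not from the capture), and the helper names are hypothetical.&lt;/p&gt;

```python
import struct

AUTH_CODES = {
    0: "AuthenticationOk",
    10: "AuthenticationSASL",
    11: "AuthenticationSASLContinue",
    12: "AuthenticationSASLFinal",
}


def split_messages(stream):
    """Split a backend byte stream into (type_char, payload) pairs."""
    messages = []
    pos = 0
    while pos != len(stream):
        type_char = chr(stream[pos])
        (length,) = struct.unpack("!I", stream[pos + 1:pos + 5])
        # The length covers its own 4 bytes plus the payload.
        messages.append((type_char, stream[pos + 5:pos + 1 + length]))
        pos += 1 + length
    return messages


def describe(type_char, payload):
    """Give a one-line description for the message types seen in [21]."""
    if type_char == "R":
        (code,) = struct.unpack("!I", payload[:4])
        return AUTH_CODES.get(code, "Authentication code %d" % code)
    if type_char == "S":
        name, value = payload.split(b"\x00")[:2]
        return "ParameterStatus %s=%s" % (name.decode(), value.decode())
    if type_char == "K":
        pid, _secret = struct.unpack("!II", payload)
        return "BackendKeyData pid=%d" % pid
    if type_char == "Z":
        return "ReadyForQuery status=%s" % payload.decode()
    return "unhandled type " + type_char


def make(type_char, payload):
    """Frame a payload as a backend message (used only to build sample data)."""
    return type_char.encode() + struct.pack("!I", len(payload) + 4) + payload


# Fabricated sample stream, not bytes from the capture.
sample = (
    make("R", struct.pack("!I", 0))
    + make("S", b"client_encoding\x00UTF8\x00")
    + make("K", struct.pack("!II", 12345, 67890))
    + make("Z", b"I")
)
summary = [describe(t, p) for t, p in split_messages(sample)]
for line in summary:
    print(line)
```

&lt;p&gt;With this framing, a ReadyForQuery message in the idle state ends with the length 5 followed by the byte for &lt;code&gt;I&lt;/code&gt;, which is consistent with the &lt;code&gt;0005 49&lt;/code&gt; tail visible in the earlier tcpdump output.&lt;/p&gt;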
&lt;ol start="6"&gt;
&lt;li&gt;Step 3 - SQL Query [23-25]&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d68d274be7ab.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;[23] Q stands for Query: the client sends a packet containing the SQL text. [24] returns the result: T is RowDescription (here just the column name &amp;ldquo;count&amp;rdquo;); D is DataRow, where the count result is 4 and the data is transmitted as unencrypted plaintext:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/49c15b851c3e.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;C is CommandComplete; Z is ReadyForQuery.&lt;/p&gt;
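&lt;p&gt;The simple query exchange can be sketched the same way. The Python below assumes the protocol 3.0 layouts: a Query message is the letter Q, a 4-byte length, and the SQL text terminated by a NUL byte; a DataRow payload is a 2-byte column count followed, per column, by a 4-byte length (-1 for NULL) and the value bytes in text format. The function names are hypothetical and the DataRow payload is built by hand to mirror the result above.&lt;/p&gt;

```python
import struct


def build_query(sql):
    """Frame SQL text as a simple Query ('Q') message."""
    body = sql.encode() + b"\x00"
    return b"Q" + struct.pack("!I", len(body) + 4) + body


def decode_data_row(payload):
    """Decode a DataRow ('D') payload into a list of text values."""
    (ncols,) = struct.unpack("!H", payload[:2])
    pos = 2
    values = []
    for _ in range(ncols):
        (size,) = struct.unpack("!i", payload[pos:pos + 4])
        pos += 4
        if size == -1:
            values.append(None)  # SQL NULL carries no value bytes
        else:
            values.append(payload[pos:pos + size].decode())
            pos += size
    return values


query = build_query("select count(*) from t1;")
# Hand-built payload: one column whose text value is "4".
row_payload = struct.pack("!H", 1) + struct.pack("!i", 1) + b"4"
row = decode_data_row(row_payload)
print(query)
print(row)
```

&lt;p&gt;Because both the SQL text and the row value travel as plain bytes, they are readable in any capture unless the connection uses SSL.&lt;/p&gt;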
&lt;ol start="7"&gt;
&lt;li&gt;Step 4 - Disconnect [26-29]. In [26] the client sends the PGSQL protocol session termination message (corresponding to \q); [27-29] are again a three-segment TCP close.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/9fb2ec1fbc1a.png" alt="Insert image description here" /&gt;&lt;/p&gt;
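&lt;p&gt;Why does the close take three segments instead of four?&lt;/p&gt;
&lt;p&gt;When the passive closer has &amp;ldquo;no more data to send&amp;rdquo; and the &amp;ldquo;TCP delayed ACK mechanism&amp;rdquo; is enabled, the second and third segments of the four-way close (the ACK and the FIN) are merged into one, so the close completes in three segments:&lt;/p&gt;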
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d0fa97105c11.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;（&lt;a href="https://www.xiaolincoding.com/network/3_tcp/tcp_three_fin.html#tcp-%E5%9B%9B%E6%AC%A1%E6%8C%A5%E6%89%8B" target="_blank" rel="noreferrer"&gt;TCP 四次挥手，可以变成三次吗？&lt;/a&gt;）&lt;/p&gt;
&lt;p&gt;Since TCP delayed ACK is enabled by default, three-segment closes appear more often than four-segment closes in captures.&lt;/p&gt;
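&lt;p&gt;The rule can be written down as a toy model. The Python sketch below is not a TCP implementation; it only encodes the two conditions stated above (nothing left to send, delayed ACK enabled) and lists the segments each side would send. The function name and the labels are invented for this illustration.&lt;/p&gt;

```python
def close_segments(passive_has_pending_data, delayed_ack):
    """List (sender, segment) pairs for a TCP close under the stated rule."""
    segments = [("active", "FIN")]
    if passive_has_pending_data or not delayed_ack:
        # Classic four-way close: the ACK goes out first, the FIN later.
        segments.append(("passive", "ACK"))
        segments.append(("passive", "FIN"))
    else:
        # The held-back ACK rides along with the passive side's own FIN.
        segments.append(("passive", "FIN+ACK"))
    segments.append(("active", "ACK"))
    return segments


three_way = close_segments(passive_has_pending_data=False, delayed_ack=True)
four_way = close_segments(passive_has_pending_data=True, delayed_ack=True)
print(len(three_way), three_way)
print(len(four_way), four_way)
```

&lt;p&gt;In a real capture the first FIN normally also carries an ACK flag, so Wireshark typically labels the three segments FIN+ACK, FIN+ACK, ACK.&lt;/p&gt;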
&lt;p&gt;That completes a simple PG packet capture and analysis. The network transmission diagram below summarizes this session:



&lt;img src="https://lastdba.com/img/csdn/8c246946c86e.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Packet capture analysis notes:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;First understand the network path: there are typically many nodes between application clients and database servers, such as network switches and request-forwarding services&lt;/li&gt;
&lt;li&gt;Capture on both ends simultaneously when possible&lt;/li&gt;
&lt;li&gt;Pay attention to capture timing and set appropriate filters&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Possible packet loss points:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/dF4juaW-ttI0Zn1j0z6tag" target="_blank" rel="noreferrer"&gt;https://mp.weixin.qq.com/s/dF4juaW-ttI0Zn1j0z6tag&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Packet loss&lt;/strong&gt; involves NICs, drivers, and the kernel protocol stack, and packets can be lost at each layer:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;On the link between the two VMs, transmission can fail because of network congestion, line errors, etc.&lt;/li&gt;
&lt;li&gt;After NIC receives packets, the ring buffer may overflow and drop packets&lt;/li&gt;
&lt;li&gt;At IP layer: routing failures, packet size exceeding MTU, etc.&lt;/li&gt;
&lt;li&gt;At transport layer: port not listening, resource usage exceeding kernel limits, etc.&lt;/li&gt;
&lt;li&gt;At socket layer: socket buffer overflow causing packet loss&lt;/li&gt;
&lt;li&gt;At application layer: application exceptions causing packet loss&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;References:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.twblogs.net/a/5cbca833bd9eee0eff4612ff/?lang=zh-cn" target="_blank" rel="noreferrer"&gt;Tcpdump一次抓包记录（Postgresql通信）&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/dF4juaW-ttI0Zn1j0z6tag" target="_blank" rel="noreferrer"&gt;学徒 DBA必备技能之网络丢包分析总结&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://pigsty.cc/zh/blog/2018/01/05/pgsql%E5%8D%8F%E8%AE%AE%E5%88%86%E6%9E%90%E7%BD%91%E7%BB%9C%E6%8A%93%E5%8C%85/" target="_blank" rel="noreferrer"&gt;PgSQL协议分析:网络抓包&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.xiaolincoding.com/network/3_tcp/tcp_three_fin.html#tcp-%E5%9B%9B%E6%AC%A1%E6%8C%A5%E6%89%8B" target="_blank" rel="noreferrer"&gt;TCP 四次挥手，可以变成三次吗？&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;38. Storage: SAN / NAS / DAS
 &lt;div id="38-storage-san--nas--das" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#38-storage-san--nas--das" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/c25bdda59915.png" alt="Insert image description here" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;39. Lifecycle of an IO Request
 &lt;div id="39-lifecycle-of-an-io-request" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#39-lifecycle-of-an-io-request" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/4e36321eb908.png" alt="img" /&gt;&lt;/p&gt;
&lt;p&gt;(https://blog.csdn.net/Hehuyi_In/article/details/100715177?spm=1001.2014.3001.5501)&lt;/p&gt;</content:encoded></item></channel></rss>