
A Brief Analysis of Connection Pools and TCP Probing

liuzhilong62
PostgreSQL DBA. Writing about database internals, production cases, and source code analysis.

DBAs benefit from understanding connection pools and TCP probing/keepalive: this knowledge helps when troubleshooting disconnection errors, SQL execution errors, and HA failover scenarios.

TCP Keepalive and PostgreSQL Parameters
#

Applications (business clients, database servers, psql) can set keepalive-related socket options on each connection. When they don't, the Linux kernel's defaults apply.

| Linux Parameter | Linux Default | Socket Option | PG Server Parameter | libpq Parameter (PG Client) |
|---|---|---|---|---|
| | | SO_KEEPALIVE (default 1) | | keepalives (1 = on, default) |
| tcp_keepalive_time | 7200s | TCP_KEEPIDLE | tcp_keepalives_idle | keepalives_idle |
| tcp_keepalive_intvl | 75s | TCP_KEEPINTVL | tcp_keepalives_interval | keepalives_interval |
| tcp_keepalive_probes | 9 | TCP_KEEPCNT | tcp_keepalives_count | keepalives_count |
| tcp_retries2 | 15 | | | |
| | | TCP_USER_TIMEOUT | tcp_user_timeout | tcp_user_timeout |
| | | | client_connection_check_interval | |

Unless these parameters are set, both the PG server and libpq use the OS defaults. On the server, a value of 0 means "use the OS default".
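Under the hood, the server's tcp_keepalives_* / tcp_user_timeout parameters and libpq's keepalives_* / tcp_user_timeout parameters are applied as the per-socket options listed in the table above. Below is a minimal Python sketch that sets the same options on a raw socket. It assumes Linux, because TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT and TCP_USER_TIMEOUT are Linux constants in Python's socket module. `enable_keepalive` is a hypothetical helper, and the values are illustrative rather than recommendations:

```python
import socket
from typing import Optional


def enable_keepalive(sock: socket.socket, idle: int, interval: int, count: int,
                     user_timeout_ms: Optional[int] = None) -> None:
    """Override the kernel keepalive defaults for this one socket (Linux only)."""
    # Turn keepalive on at all (libpq: keepalives=1).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of silence before the first probe (sysctl tcp_keepalive_time).
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    # Seconds between unanswered probes (sysctl tcp_keepalive_intvl).
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    # Unanswered probes before the connection is dropped (sysctl tcp_keepalive_probes).
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    if user_timeout_ms is not None:
        # Max milliseconds sent data may stay unacknowledged before the kernel gives up.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, user_timeout_ms)


s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(s, idle=60, interval=10, count=6, user_timeout_ms=30000)
print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE))
s.close()
```

These options apply to a single connection. They override the sysctl values only for that socket, and nothing else on the host changes.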

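What the defaults mean: once a connection has been idle for 2 hours, the kernel sends the first keepalive probe. If that probe goes unanswered, another follows every 75s. After 9 unanswered probes (75s × 9 = 675s = 11.25 minutes), the connection is dropped. A dead peer is therefore detected only about 2h11m after the last traffic.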
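The same arithmetic works for any setting. Here is a small Python sketch; `keepalive_detection_seconds` is a hypothetical helper. It models the Linux behavior: the first probe goes out after the idle time, each further probe follows one interval later, and the kernel gives up one interval after the last unanswered probe.

```python
def keepalive_detection_seconds(idle: int, interval: int, probes: int) -> int:
    """Approximate time from the last received packet until a dead peer is dropped.

    The first probe is sent after `idle` seconds of silence; each unanswered
    probe is followed `interval` seconds later by the next one, and once
    `probes` probes in a row have gone unanswered the connection is reset.
    """
    return idle + interval * probes


total = keepalive_detection_seconds(idle=7200, interval=75, probes=9)
print(total, "s =", total / 3600, "h")
```

With illustrative settings of 60/10/6, the same formula gives 120s instead of more than two hours.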

The default net.ipv4.tcp_keepalive_time=7200s is far too large to be useful. Firewalls and other intermediate devices usually drop an idle connection long before the first probe is sent, so keepalive kicks in only after the connection is already gone.

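client_connection_check_interval is an application-layer check introduced in PG 14. While a query is running, the server checks the client socket every N milliseconds by polling it for a closed-connection event, without reading any data. If the kernel reports that the connection is closed, the server aborts the query and ends the session. The check needs no kernel parameter changes, but it can only see what the kernel knows. If the client vanished without sending a FIN or RST, detection still depends on the keepalive settings above. The default is 0 (disabled).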
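The idea of asking the kernel whether the peer has already closed, without consuming any data, can be illustrated in Python. This is a sketch of the concept only, not PostgreSQL's implementation (the server polls for a closed-socket event rather than calling recv()). `peer_has_closed` is a hypothetical helper:

```python
import socket


def peer_has_closed(sock: socket.socket) -> bool:
    """Ask the kernel whether the peer has already closed, without consuming data.

    Only reflects what the kernel has been told (a FIN or an RST); a peer that
    vanished silently still looks alive here until keepalive gives up.
    """
    try:
        # Peek at most one byte without blocking; nothing is removed from the buffer.
        data = sock.recv(1, socket.MSG_PEEK | socket.MSG_DONTWAIT)
    except BlockingIOError:
        return False   # no data and no FIN: connection still open
    except ConnectionResetError:
        return True    # the peer sent an RST
    # b"" means the peer sent a FIN; any byte means unread data is pending
    # (the peer may still close right after it, which this check cannot see yet).
    return data == b""
```

On a loopback connection, this helper returns False while the client is connected and True shortly after the client calls close().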

TCP FIN and RST Packets
#

Reference: https://linuxvox.com/blog/what-is-the-reason-and-how-to-avoid-the-fin-ack-rst-and-rst-ack/

The six TCP control bits:

| Flag | Name | Purpose |
|---|---|---|
| SYN | Synchronize | Initiates a connection (used in the handshake). |
| ACK | Acknowledge | Confirms receipt of a packet (includes an ACK number for sequence tracking). |
| FIN | Finish | Signals intent to close a connection gracefully. |
| RST | Reset | Abruptly terminates a connection (no graceful closure). |
| PSH | Push | Forces immediate delivery of data (bypasses buffering). |
| URG | Urgent | Marks data as “urgent” (rarely used today). |
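The tcpdump captures in the tests below print these flags as letters in brackets, for example [F.] for FIN+ACK and [R.] for RST+ACK. The Python sketch below decodes a raw flags byte into that notation. The bit values are the standard TCP header bits. The letter order follows what tcpdump prints for the combinations that appear in this post. `tcpdump_flags` is a hypothetical helper:

```python
# (bit in the TCP flags byte, letter tcpdump prints for it)
TCP_FLAGS = [
    (0x01, "F"),  # FIN
    (0x02, "S"),  # SYN
    (0x04, "R"),  # RST
    (0x08, "P"),  # PSH
    (0x10, "."),  # ACK
    (0x20, "U"),  # URG
]


def tcpdump_flags(flags: int) -> str:
    """Render a TCP flags byte in tcpdump's notation, e.g. 0x11 -> 'F.'."""
    text = "".join(letter for bit, letter in TCP_FLAGS if flags & bit)
    return text or "none"


for value in (0x02, 0x12, 0x11, 0x14, 0x04, 0x18):
    print(hex(value), f"[{tcpdump_flags(value)}]")
```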

FIN and RST can each occur in both normal and abnormal situations. Key takeaways:

  1. A process exiting or aborting results in a FIN, including kill -9 (verified: killing a PG process with kill -9 sends FIN; see the tests below)
  2. Unreachable endpoints, such as a port with nothing listening, are answered with an RST
  3. A TCP keepalive timeout also produces an RST, because the probes went unanswered
  4. Firewalls may also send an RST (e.g. iptables REJECT --reject-with tcp-reset)
  5. An RST on an established connection surfaces in the application as the "connection reset by peer" error; an RST in reply to a SYN shows up as "connection refused" instead


TCP Disconnection Tests
#

Test: Does killing a session trigger an active disconnect?
#

  • Oracle: whether the session is killed with the built-in ALTER SYSTEM KILL SESSION or with kill -9, the client receives a FIN from the server.
  • PG: when the session is killed with the built-in pg_terminate_backend(), the client receives a FIN from the server.
  • Redis: whether the database is shut down or the redis-server process is killed with kill -9, the client receives a FIN from the server.

Test conclusion: even when a process terminates abnormally, the kernel still sends a FIN on its behalf.

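Additionally, redis-cli did not appear to handle the FIN correctly in this round of testing. It ACKed the server's FIN but never closed its own side and kept sending ACKs. The connection was finally torn down by an RST from the server: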

| Seq | Time | Direction | Flags | Notes |
|---|---|---|---|---|
| 1 | 17:42:43.131958 | S→C | [.] ACK | Server sends ACK |
| 2 | 17:42:49.264831 | S→C | [F.] FIN+ACK | Server actively requests close |
| 3 | 17:42:49.304905 | C→S | [.] ACK | Client ACKs FIN (ack=9=8+1) |
| 4~15 | 17:43:04 ~ 17:44:19 | C→S | [.] ACK | Client keeps ACKing (holding connection?) |
| 16 | 17:44:19.323962 | S→C | [R] RST | Server sends RST |

Test: What packet does the client receive when PG process terminates, normal shutdown, or forced shutdown?
#

Test environment: Rocky 10.1 + PG 18.2, tcpdump capturing TCP packets on the lo interface.

| Scenario | Server sends | Four-way handshake | Client error |
|---|---|---|---|
| pg_terminate_backend(PID) | [F.] FIN+ACK | ✅ Complete | FATAL: terminating connection due to administrator command |
| pg_ctl stop -m fast | [F.] FIN+ACK | ✅ Complete | FATAL: terminating connection due to administrator command |
| kill -9 postmaster | [F.] FIN+ACK | ✅ Complete | server closed the connection unexpectedly |

Conclusion: kill -9 also sends FIN, not RST. When a process is SIGKILLed, the Linux TCP kernel closes the socket on behalf of the process, sending FIN to complete the four-way handshake. In all three scenarios, the client receives a normal FIN close — no scenario produces RST.

Test: How to produce an RST packet
#

Port not listening (PG already shut down)

14:01:48.492004 IP 127.0.0.1.52092 > 127.0.0.1.ircu-2: Flags [S], seq 2570941791
14:01:48.492012 IP 127.0.0.1.ircu-2 > 127.0.0.1.52092: Flags [R.], seq 0, ack 2570941792, win 0

Client SYN → kernel returns [R.] RST+ACK, win 0. psql reports Connection refused.

iptables REJECT --reject-with tcp-reset

14:02:37.768515 IP 127.0.0.1.36436 > 127.0.0.1.ircu-2: Flags [S], seq 382980016
14:02:37.768522 IP 127.0.0.1.ircu-2 > 127.0.0.1.36436: Flags [R.], seq 0, ack 382980017, win 0

Exactly the same as port not listening: [R.] RST+ACK. psql likewise reports Connection refused.

iptables DROP (simulating firewall silent drop)

14:00:07.050040 IP 127.0.0.1.33166 > 127.0.0.1.ircu-2: Flags [S], seq 985608804
14:00:08.095618 IP 127.0.0.1.33166 > 127.0.0.1.ircu-2: Flags [S], seq 985608804   ← retransmit after 1s
14:00:09.119647 IP 127.0.0.1.33166 > 127.0.0.1.ircu-2: Flags [S], seq 985608804   ← retransmit after 2s

The server never responds. The client keeps retransmitting the same SYN (the excerpt shows retransmissions about 1s and 2s after the original) and eventually times out. Unlike REJECT, DROP produces no RST, so the client can only detect it via timeout.
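From the client's point of view, the three captures above end in different socket-level outcomes. The minimal Python sketch below classifies a connect attempt along those lines. `probe_connect` is a hypothetical helper, and the 3s timeout is arbitrary. Other errors, such as "no route to host", are left to propagate. The DROP case cannot be reproduced on loopback without adding a firewall rule.

```python
import socket


def probe_connect(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify one TCP connect attempt the way the captures above differ."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"           # three-way handshake completed
    except ConnectionRefusedError:
        return "refused (RST)"           # nothing listening, or REJECT --reject-with tcp-reset
    except socket.timeout:
        return "timeout (no reply)"      # e.g. a DROP rule: SYNs retransmitted, never answered
```

Against a listening port, this returns "connected". Once the listener is closed, the same call returns "refused (RST)", which matches psql's Connection refused.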

Summary of RST-producing scenarios

| Scenario | Layer | Packet Type | Triggered By |
|---|---|---|---|
| Port not listening | TCP kernel | [R.] RST+ACK | OS kernel |
| Firewall REJECT | iptables | [R.] RST+ACK | Firewall |
| TCP keepalive timeout | TCP kernel | [R] RST | OS kernel (after keepalive probe fails) |
| Process termination (kill -9) | TCP kernel | [F.] FIN+ACK (NOT RST!) | OS kernel closes socket on behalf of process |
| Firewall DROP | iptables | None | |

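Core distinction: in these tests, FIN came from process exit, because the kernel closes the socket gracefully on the process's behalf, even for kill -9. RST came from network unreachability: port not listening, REJECT, or failed keepalive probes. One exception to keep in mind: on Linux, closing a socket that still has unread data in its receive buffer sends RST instead of FIN.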

Test: Does taking an IP offline trigger an active disconnect?
#

Test with redis-cli: the IP address the Redis server listens on is taken offline while the client is connected.

#term1:
redis-cli -h 30.181.15.96 -p 17742 -a 1qaz@WSX
#term2:
sudo tcpdump host 30.181.48.7 and port 54854 -n -vv

In this test, taking the IP offline did not produce any FIN or RST. The only RST came from the client itself, after its keepalive probes went unanswered. The sequence:

| Seq | Time | Direction | Flags | Notes |
|---|---|---|---|---|
| 1 | 17:02:43.004897 | Client→Server | [.] ACK | Client sends ACK (15s interval) |
| 2 | 17:02:43.004960 | Server→Client | [.] ACK | Server responds ACK |
| 3 | 17:02:58.043896 | Client→Server | [.] ACK | Client Keep-Alive (15s interval) |
| 4 | 17:02:58.043953 | Server→Client | [.] ACK | Server responds ACK |
| 5 | 17:02:58.063214 | Server→Client | [.] ACK | Server duplicate ACK |
| 6 | 17:02:58.063234 | Client→Server | [.] ACK | Client responds ACK |
| 7 | 17:03:13.051905 | Client→Server | [.] ACK | Client Keep-Alive (15s interval) |
| 8 | 17:03:18.059901 | Client→Server | [.] ACK | Client Keep-Alive (5s interval) |
| 9 | 17:03:23.067901 | Client→Server | [.] ACK | Client Keep-Alive (5s interval) |
| 10 | 17:03:28.075899 | Client→Server | [R.] RST+ACK | Client actively disconnects (5s interval) |

redis-cli has no keepalive configuration, but the redis-cli source code hardcodes:

#define REDIS_CLI_KEEPALIVE_INTERVAL 15 /* seconds */

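redis-cli's keepalive interval is hardcoded to 15 seconds, hence the 15-second keepalive packets in the capture. On Linux, Redis's keepalive helper (anetKeepAlive) derives all three settings from this one value: TCP_KEEPIDLE=15, TCP_KEEPINTVL=15/3=5 and TCP_KEEPCNT=3. This matches the rest of the capture. Probes go out every 15s while the server answers; once it stops answering, three unanswered probes follow 5s apart, and then the client sends RST.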
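The 5s gaps at the end of the capture follow from how Redis turns its single keepalive value into socket options. Below is a Python sketch of that derivation. It assumes the logic mirrors Redis's Linux keepalive helper (anetKeepAlive): idle = interval, probe interval = interval/3 with a minimum of 1, and 3 probes. `redis_cli_keepalive` is a hypothetical name:

```python
def redis_cli_keepalive(interval: int) -> dict:
    """Socket settings assumed to be derived from one keepalive interval on Linux,
    plus the resulting time to detect a dead peer."""
    probe_interval = max(interval // 3, 1)   # interval/3 in C integer division, at least 1
    probes = 3                               # three unanswered probes before giving up
    return {
        "TCP_KEEPIDLE": interval,
        "TCP_KEEPINTVL": probe_interval,
        "TCP_KEEPCNT": probes,
        # First probe after `interval` idle seconds, then one probe interval per probe.
        "detect_after": interval + probe_interval * probes,
    }


print(redis_cli_keepalive(15))  # REDIS_CLI_KEEPALIVE_INTERVAL
```

For interval=15, this yields 15s idle, 5s between probes and 3 probes, so a dead peer is detected after 30s. That lines up with the capture: the last exchange was at 17:02:58 and the RST at 17:03:28.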

During the capture, the server IP was taken offline but no disconnection notification was received. Eventually, the client’s keepalive probe detected the socket anomaly, and the client actively sent RST.

(The Redis server side can also initiate keepalive, but it wasn’t triggered this time.)

Test conclusion: when an IP is simply taken offline, the kernel may send no FIN or RST at all. The client only finds out through its own keepalive probes.

Test: Does normal data communication interfere with the tcp_keepalive cycle?
#

Conclusion: yes. Data exchange (PSH packets and the ACKs that accompany them) counts as activity and restarts the keepalive idle timer. The next probe is sent only a full keepalive period after the last data.

The following test uses redis-cli, where redis-cli’s keepalive = 15s and redis-server’s keepalive = 2h:

| Client Trigger | TCP Timestamp | Client Sends | Server Sends |
|---|---|---|---|
| tcp_keepalive | 17:16:05.558570-17:16:15.048701 | ACK | ACK |
| PING | 17:16:15.048312-17:16:15.048701 | PSH | PSH |
| tcp_keepalive | 17:16:15.048433-17:16:30.071278 | ACK | ACK |
| tcp_keepalive | 17:16:30.070906-17:16:30.071278 | ACK | ACK |
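The timer reset can be modeled in a few lines. The simplified Python sketch below assumes a healthy peer that answers every probe. It treats any packet received from the peer (data, or the ACK to a probe) as restarting the idle clock. `keepalive_probe_times` is a hypothetical helper. The inputs are shaped like the table above: a 15s keepalive, with a PING 10s after a keepalive exchange.

```python
from typing import List


def keepalive_probe_times(activity: List[float], idle: float, horizon: float) -> List[float]:
    """Times at which keepalive probes would fire on a healthy connection.

    Simplified model: a probe is sent once `idle` seconds pass with nothing
    received from the peer, and every received packet (data, or the ACK to a
    probe) restarts that clock. The peer is assumed to answer every probe.
    """
    if idle <= 0:
        raise ValueError("idle must be positive")
    events = sorted(activity)
    if not events:
        return []
    last, i, probes = events[0], 1, []
    while last + idle <= horizon:
        due = last + idle
        if i < len(events) and events[i] <= due:
            last = events[i]      # traffic arrived before the probe was due: restart the clock
            i += 1
        else:
            probes.append(due)    # idle long enough: send a probe...
            last = due            # ...whose ACK restarts the clock as well
    return probes


# t=0: a keepalive exchange; t=10: a PING/PONG round trip; 15 s keepalive idle.
print(keepalive_probe_times([0.0, 10.0], idle=15, horizon=45))
```

The first probe lands at t=25, which is 15s after the PING rather than at t=15. This corresponds to the 17:16:30 keepalive in the table.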

Test: Does idle_in_transaction and long-running SQL trigger keepalive?
#

Test environment: Rocky 10.1 + PG 18.2, client libpq configured with keepalives_idle=5 keepalives_interval=3.

idle_in_transaction:

16:32:11.611  Last data ACK
16:32:16.927  Client → Server [.] ACK  ← after 5.3s, first keepalive probe
16:32:16.927  Server → Client [.] ACK
16:32:21.983  Client → Server [.] ACK  ← after 5s, second probe
16:32:21.983  Server → Client [.] ACK
16:32:27.039  Client → Server [.] ACK  ← after 5s, third probe
16:32:27.039  Server → Client [.] ACK

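Conclusion: an idle-in-transaction session is probed like any other idle connection. Every 5 seconds there is one probe and one reply, and no other TCP packets at all. keepalives_interval=3 never comes into play because every probe is answered; the interval only spaces out retries of unanswered probes.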

Long-running SQL (server tcp_keepalives_idle=10):

16:32:43.148  Last ACK (after client sends SELECT pg_sleep(30))
             ← 10 seconds of zero TCP packets ← SQL is running, but no data returned
16:32:53.279  Server → Client [.] ACK  ← after 10.1s, server sends keepalive probe
16:32:53.279  Client → Server [.] ACK

Conclusion: a running SQL statement does not mean there is TCP traffic. During pg_sleep(30) there is no TCP communication at all, so keepalive still fires. Keepalive only looks at whether data is exchanged at the TCP layer, not at what the database is doing.

Suppose a report query runs for 5 minutes without returning intermediate results. To firewalls, NAT devices and load balancers, the TCP connection looks idle for those 5 minutes. If that exceeds a device's idle timeout and no keepalive is configured, the connection can be dropped while the query is still running.
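On the client side, the knobs for this case are libpq's keepalives_* and tcp_user_timeout connection parameters from the table at the top of this post. The Python sketch below assembles a libpq keyword/value connection string that sets them. `conninfo` is a hypothetical helper, and the host name is made up. The numbers are illustrative, chosen only so that probing starts after one minute, well under the 5 minutes in the example above.

```python
def conninfo(**params) -> str:
    """Build a libpq keyword/value connection string.

    Values that are empty or contain spaces, single quotes or backslashes are
    wrapped in single quotes, with ' and \\ escaped by a backslash.
    """
    parts = []
    for key, value in params.items():
        text = str(value)
        if text == "" or any(c in text for c in " '\\"):
            text = "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"
        parts.append(f"{key}={text}")
    return " ".join(parts)


# Illustrative values: probe after 60 s idle, every 10 s, give up after 6 misses;
# tcp_user_timeout is in milliseconds.
print(conninfo(host="db.example.internal", dbname="report",
               keepalives=1, keepalives_idle=60, keepalives_interval=10,
               keepalives_count=6, tcp_user_timeout=30000))
```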

Connection Probing
#

The problem of dead connections on the client side can only be solved by the client — the server is already unreachable, so you can’t expect it to notify you.

Two key concepts of connection pools:

  • socket.close() ≠ connection pool close(): the former tears the TCP connection down with the four-way handshake; the latter returns the connection to the pool, where it remains ESTABLISHED and its state changes to idle.
  • Goal of probing: To promptly detect “zombie connections” — sockets that are already broken but the connection pool still considers alive.

Two common socket error states:

  • ESTABLISHED but actually unusable: The connection pool hasn’t detected that the socket has failed; errors only appear when the application layer tries to use it.
  • TIME_WAIT: the connection has already been closed, but the side that closed first keeps the socket in TIME_WAIT for a while (60s on Linux) before releasing it; a large number of TIME_WAIT sockets can exhaust local ports.

Broadly speaking, probing mechanisms are divided into two types by network layer:

| Type | Action | Trigger Method | Content Sent |
|---|---|---|---|
| Layer 4 probing | Kernel-level TCP packets | tcp_keepalive series parameters; connection pool's own keepalive | ACK packets (empty probe to check if peer is alive) |
| Layer 7 probing | Application-layer database commands | testOnBorrow / testOnReturn / testWhileIdle / PING / configured test query | Depends on driver, e.g., SELECT 1, PING; SELECT NOT pg_is_in_recovery() / SELECT @@READ_ONLY |

Layer 4 Probing
#

Linux’s tcp_keepalive is the foundation of Layer 4 probing:

net.ipv4.tcp_keepalive_time   = 7200   # Start probing after 2 hours of idle
net.ipv4.tcp_keepalive_intvl  = 75     # Probe interval 75 seconds
net.ipv4.tcp_keepalive_probes = 9      # After 9 failed probes, disconnect

The problem with the defaults: 7200 seconds (2 hours) before probing begins — by then the firewall has long since killed the connection, making the probe pointless. Production environments typically need to tune this down to the minute level.

If there's a proxy in the path (Nginx, HAProxy, etc.), the client's TCP keepalive only reaches the proxy, not the backend database. The proxy-to-database segment needs the proxy's own keepalive configuration. Otherwise, if the database or the proxy-to-database link fails, the connection pool won't notice.

In actual communication, when there’s data exchange, PSH/ACK packets themselves serve as a form of “keepalive.” Keepalive only triggers when the connection is completely idle — if there’s continuous data send/receive, the keepalive timer gets reset and no ACK probe packets are sent.

Layer 7 Probing
#

Layer 7 probing is when the application actively sends database commands to verify the connection. Representative parameters for various connection pools (not exhaustive):

| Connection Pool | Parameter | Description |
|---|---|---|
| JDBC Generic | testOnBorrow, testOnReturn, testWhileIdle | Validate on borrow/return/idle |
| HikariCP | connectionTestQuery | Validation SQL, commonly SELECT 1 |
| Jedis | testOnBorrow | Validate on borrow |
| Lettuce | pingBeforeActivateConnection | PING before activation |
| Redisson | pingConnectionInterval | Periodic PING interval |
| Apache Commons Pool2 | testOnBorrow, etc. | Generic object pool validation |

Both close() and returnObject() return the connection to the pool rather than actually closing the TCP connection. Once returned, the connection is idle, but its socket remains ESTABLISHED. Apache Commons Pool2 provides the generic object-pool machinery (borrow, return, validation, eviction) that manages these idle connections.

Regarding the performance impact of testOnBorrow: issuing SELECT 1 every time a connection is borrowed adds overhead under high concurrency. Typically, testWhileIdle + a reasonable check interval is used to balance this.
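The "testWhileIdle + a reasonable check interval" idea can be sketched as a toy pool in Python. `IdleCheckedPool`, `sweep`, and the `create`/`validate` callables are hypothetical names. This is not the implementation of Commons Pool2 or HikariCP, and real pools run the sweep from a background thread rather than on demand.

```python
import time
from collections import deque
from typing import Callable, Deque, Generic, Tuple, TypeVar

Conn = TypeVar("Conn")


class IdleCheckedPool(Generic[Conn]):
    """Toy pool that validates idle connections periodically, not on every borrow.

    `create` opens a new connection and `validate` stands in for a Layer 7 probe
    such as SELECT 1 or PING; both are supplied by the caller.
    """

    def __init__(self, create: Callable[[], Conn], validate: Callable[[Conn], bool],
                 check_interval: float, clock: Callable[[], float] = time.monotonic):
        self._create = create
        self._validate = validate
        self._interval = check_interval
        self._clock = clock
        # Idle connections with the time they were last known to be good.
        self._idle: Deque[Tuple[Conn, float]] = deque()

    def borrow(self) -> Conn:
        # No round trip on the hot path: hand out an idle connection as-is.
        if self._idle:
            conn, _ = self._idle.popleft()
            return conn
        return self._create()

    def give_back(self, conn: Conn) -> None:
        # Returning to the pool does not close the socket; it just becomes idle.
        self._idle.append((conn, self._clock()))

    def sweep(self) -> int:
        """testWhileIdle-style pass; returns how many dead connections were dropped."""
        now = self._clock()
        kept: Deque[Tuple[Conn, float]] = deque()
        dropped = 0
        for conn, checked_at in self._idle:
            if now - checked_at < self._interval:
                kept.append((conn, checked_at))   # checked recently enough, skip
            elif self._validate(conn):
                kept.append((conn, now))          # probe succeeded
            else:
                dropped += 1                      # zombie: broke while idle (a real pool would close it)
        self._idle = kept
        return dropped

    def idle_count(self) -> int:
        return len(self._idle)
```

The trade-off is that borrowing costs nothing extra. However, a connection that breaks between two sweeps is still handed out, so the application needs its own retry on a connection error.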

Choosing between Layer 4 and Layer 7:

  • Layer 4: Direct database connection, no proxy in the path — just tune TCP keepalive to a small value.
  • Layer 7: when there is a proxy in the path, or you need to confirm the database can actually execute SQL (not just that TCP is reachable); an application-level probe exercises the entire path end to end.
  • Layer 7 + role awareness: when primary and replica must be told apart, simple SQL like SELECT 1 can't identify the database role, so custom SQL must be configured. Likewise, a Redis PING can't tell you whether the node is a replica.

Single Domain vs Dual Domain
#

When the driver is configured with primary/replica addresses (JDBC’s read-write + read-only, or Lettuce’s Master/Replica), it can automatically identify primary/replica and route accordingly.

Problems with a single domain:

  • Can’t detect primary/replica switchover
  • Constrained by JVM/OS DNS caching (networkaddress.cache.ttl) — after switchover, connections may keep going to the old IP for a long time
  • Layer 7 probing with SELECT NOT pg_is_in_recovery() can detect primary/replica changes, but it’s less flexible than dual domains
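Role-aware Layer 7 probing boils down to asking each candidate node the role query and picking the one that answers "primary". Below is a Python sketch. `find_primary` and the `run_query` callable are hypothetical; `run_query` stands in for whatever driver call connects to a host and runs a single SQL statement.

```python
from typing import Callable, Iterable, Optional

# Role query from the Layer 7 table: returns true only on a primary.
ROLE_QUERY = "SELECT NOT pg_is_in_recovery()"


def find_primary(hosts: Iterable[str], run_query: Callable[[str, str], bool]) -> Optional[str]:
    """Return the first reachable host whose role query says 'primary', else None.

    `run_query(host, sql)` is a hypothetical callable: it connects to `host`,
    runs `sql`, and returns the single boolean result, raising OSError when the
    host cannot be reached.
    """
    for host in hosts:
        try:
            if run_query(host, ROLE_QUERY):
                return host
        except OSError:
            continue          # unreachable node (refused, timed out, ...): try the next
    return None
```

After a switchover, re-running this function finds the new primary by role rather than by a cached address, which a single domain name cannot do.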

Summary
#

FIN and RST occurrence scenarios:

  • FIN is sent by the kernel on behalf of the exiting process (including kill -9), completing a graceful four-way handshake close
  • RST is produced when the network is unreachable: port not listening, keepalive timeout, firewall REJECT, etc.
  • Directly taking an IP offline produces no FIN/RST — it can only be detected by keepalive probing
  • Firewall DROP silently discards packets — no RST, client can only detect via timeout

Layer 4 and Layer 7 probing mechanisms:

  • Layer 4 (TCP keepalive): Defaults to probing after 2 hours — must be tuned down for production. Only reaches the proxy, not the backend.
  • Layer 7 (application-layer PING/SQL): Can confirm the database can truly execute commands, but has performance overhead under high concurrency.
  • Proxy present / primary-replica distinction needed → Layer 7 is required
  • Direct database connection → Layer 4 tuned to a small value suffices

Keepalive behavior for idle_in_transaction and long-running SQL:

  • Both trigger keepalive — the trigger condition is no data exchange at the TCP layer, not the database state
  • SQL running ≠ TCP has packets: Long report queries that don’t return intermediate results are equivalent to dead connections at the TCP layer
  • Without keepalive configured, firewalls may kill the connection while the SQL is still running

Some notes:

  • socket.close() ≠ connection pool return: the former disconnects TCP; the latter merely marks the connection as idle
  • The goal of connection pool probing is to discover those zombie connections whose sockets are already broken but the pool still thinks are alive
  • testOnBorrow queries the database on every borrow — overhead under high concurrency; testWhileIdle + a reasonable interval is more practical
  • When there’s a proxy in the path, each segment needs independent keepalive configuration — if one segment breaks, the other side won’t notice

Original: Building an MCP Server Using Postgres, Bruce Momjian, PGDay Armenia 2026, CC BY 4.0. AI-generated ratio: 80% Bruce Momjian (PG core team, the one who has written release notes for 20+ years) recently gave a talk at PGDay Armenia 2026: Building an MCP Server Using Postgres. 70 slides, extremely dense. Theory and practice — a solid reference. Reading it directly is hard work. Even having AI interpret it probably won’t make sense at first glance. I had to read for a while and ask several questions before it clicked.