
Case Study: Startup Failure and SysV Shared Memory

Author: liuzhilong62
PostgreSQL DBA. Writing about database internals, production cases, and source code analysis.

Problem Symptoms

The database instance’s RSS memory was maxed out, OOM messages appeared in the logs, and the instance died. We won’t analyze the OOM cause here.

But startup kept failing — 4 or 5 attempts according to the logs:

2026-02-12 09:15:21 CST::@:[578272]: FATAL:  pre-existing shared memory block (key 2048, ID 1328250881) is still in use
2026-02-12 09:15:21 CST::@:[578272]: HINT:  Terminate any old server processes associated with data directory "/data".
2026-02-12 09:15:21 CST::@:[578272]: LOG:  database system is shut down
2026-02-12 09:21:03 CST::@:[658824]: FATAL:  pre-existing shared memory block (key 2048, ID 1328250881) is still in use
2026-02-12 09:21:03 CST::@:[658824]: HINT:  Terminate any old server processes associated with data directory "/data".
2026-02-12 09:21:03 CST::@:[658824]: LOG:  database system is shut down
2026-02-12 09:31:12 CST::@:[794791]: LOG:  redirecting log output to logging collector process
2026-02-12 09:31:12 CST::@:[794791]: HINT:  Future log output will appear in directory "/data/pg_log".
2026-02-12 09:31:37 CST::@:[801049]: FATAL:  lock file "postmaster.pid" already exists
2026-02-12 09:31:37 CST::@:[801049]: HINT:  Is another postmaster (PID 794791) running in data directory "/data"?
2026-02-12 09:32:34 CST::@:[814396]: FATAL:  lock file "postmaster.pid" already exists
2026-02-12 09:32:34 CST::@:[814396]: HINT:  Is another postmaster (PID 794791) running in data directory "/data"?

Startup only succeeded after the DBA removed the stale segment with ipcrm -m xxx.

Although the issue was quickly resolved, many questions remained:

  • Why isn’t this scenario more common in practice?
  • The start.log shows two different error types — what operations and logic do they correspond to?
  • Can shared memory still exist even if the postmaster is gone?
  • How do you locate and clean up this shared memory segment?
  • PG has multiple shared memory segments — which one is this?
  • Besides ipcrm -m, are there other ways to get the instance started?

Error Analysis: pre-existing shared memory block

Three Types of Shared Memory

Normally, after PG starts, there are three shared memory segments.

Using the default shared_memory_type='mmap' without huge pages as an example:

## View PG's actual shared memory usage from its virtual memory map
cat /proc/`head -1 $PGDATA/postmaster.pid`/smaps | grep -E "\-s"
2b61b0563000-2b61b0564000 rw-s 00000000 00:04 116293664                  /SYSV00001000 (deleted)
2b61b057f000-2b61b05b3000 rw-s 00000000 00:12 1501001168                 /dev/shm/PostgreSQL.1193490778
2b61bbac2000-2b61fa67a000 rw-s 00000000 00:04 1500999610                 /dev/zero (deleted)

From top to bottom: the small SysV segment created at startup, the dynamic shared memory (DSM) segment used by features such as parallel query, and the anonymous mmap region that holds shared_buffers.

If shared_buffers uses huge pages, or if the shared_memory_type is SysV instead of mmap, the output differs slightly.

Huge pages:

2aaaaac00000-2aba9ca00000 rw-s 00000000 00:0e 48453452                   /anon_hugepage (deleted)
2b08f2eea000-2b08f2eeb000 rw-s 00000000 00:04 50692152                   /SYSV00001000 (deleted)
2b08f2f05000-2b08f302d000 rw-s 00000000 00:12 48436142                   /dev/shm/PostgreSQL.1345689218

shared_memory_type = 'sysv':

2b03b3ceb000-2b03b3d1f000 rw-s 00000000 00:12 1572332304                 /dev/shm/PostgreSQL.2883611352
2b03bf0c2000-2b03fdc7a000 rw-s 00000000 00:04 143917075                  /SYSV00001000 (deleted)

Summary:

PG shared memory config                    smaps segments   shared_buffers in smaps   SysV in smaps
shared_memory_type=mmap, no huge pages     3 segments       /dev/zero                 /SYSV00001000
shared_memory_type=sysv, no huge pages     2 segments       /SYSV00001000             /SYSV00001000
shared_memory_type=mmap, with huge pages   3 segments       /anon_hugepage            /SYSV00001000
shared_memory_type=sysv, with huge pages   not supported    not supported
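
As a rough aid for reading such output, the path in the last column can be mapped to the roles described above. The sketch below is only an illustration: the function name classify_shared_mapping and the labels it prints are made up here, and it assumes the line layout of the smaps excerpts above (path in the sixth field).

```shell
#!/bin/sh
# Classify one shared ("rw-s") mapping line from /proc/<pid>/smaps by its path.
# Function name and labels are hypothetical, chosen for this sketch only.
classify_shared_mapping() {
    # Field 6 of a maps/smaps header line is the backing path.
    path=$(printf '%s\n' "$1" | awk '{print $6}')
    case "$path" in
        /SYSV*)                echo "sysv" ;;            # SysV segment
        /dev/shm/PostgreSQL.*) echo "dsm" ;;             # dynamic shared memory
        /dev/zero)             echo "anonymous-mmap" ;;  # mmap, no huge pages
        /anon_hugepage)        echo "hugepage-mmap" ;;   # mmap, huge pages
        *)                     echo "other" ;;
    esac
}

classify_shared_mapping "2b61b0563000-2b61b0564000 rw-s 00000000 00:04 116293664 /SYSV00001000 (deleted)"
```

Note that "sysv" alone does not say which role the segment plays: with shared_memory_type=sysv the same kind of mapping also carries shared_buffers, as the table shows.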

Now the key question: when the error says pre-existing shared memory block, which shared memory segment is it talking about?

Source Code Analysis

Searching for the error message in the source quickly leads to the key location: src/backend/port/sysv_shmem.c

First, understand what the SysV shmem is for. From comments scattered through that file:

We still require a SysV shmem block to
 * exist, though, because mmap'd shmem provides no way to find out how
 * many processes are attached, which we need for interlocking purposes.

 * As of PostgreSQL 9.3, we normally allocate only a very small amount of
 * System V shared memory, and only for the purposes of providing an
 * interlock to protect the data directory.  The real shared memory block
 * is allocated using mmap().  This works around the problem that many
 * systems have very low limits on the amount of System V shared memory
 * that can be allocated.  Even a limit of a few megabytes will be enough
 * to run many copies of PostgreSQL without needing to adjust system settings.

In short:

  • SysV shmem lets PG find out how many processes are still attached; mmap'd memory cannot
  • This SysV segment only guards the data directory; by default shared_buffers lives in mmap'd memory, not SysV
  • The segment is tiny: its smaps range 2b61b0563000-2b61b0564000 is a single 4 KB page, and ipcs reports just 56 bytes

Now look at the shm state enum:

typedef enum
{
	SHMSTATE_ANALYSIS_FAILURE,	/* unexpected failure to analyze the ID */
	SHMSTATE_ATTACHED,			/* pertinent to DataDir, has attached PIDs */
	SHMSTATE_ENOENT,			/* no segment of that ID */
	SHMSTATE_FOREIGN,			/* exists, but not pertinent to DataDir */
	SHMSTATE_UNATTACHED			/* pertinent to DataDir, no attached PIDs */
} IpcMemoryState;

The key states are ATTACHED, FOREIGN, and UNATTACHED.

The SysV segment guards the data directory: its main job is to keep two instances from running in the same directory. Because SysV keys are a system-wide namespace, the segment found under a given key may belong to a different data directory or to an unrelated program; that is the FOREIGN state. If the segment does belong to this data directory but no process is attached, it is UNATTACHED. If at least one process is still attached, it is ATTACHED.
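
The state rules just described can be written down as a tiny decision function. This is a simplified model, not PG's code: shm_state and its three 0/1/count arguments are hypothetical, and SHMSTATE_ANALYSIS_FAILURE is left out.

```shell
#!/bin/sh
# shm_state EXISTS PERTINENT NATTCH
#   EXISTS    1 if a segment exists under the key, else 0
#   PERTINENT 1 if the segment belongs to this data directory, else 0
#   NATTCH    number of attached processes
# Hypothetical helper for illustration; prints the enum name without the prefix.
shm_state() {
    exists=$1 pertinent=$2 nattch=$3
    if [ "$exists" -eq 0 ]; then
        echo ENOENT
    elif [ "$pertinent" -eq 0 ]; then
        echo FOREIGN
    elif [ "$nattch" -gt 0 ]; then
        echo ATTACHED
    else
        echo UNATTACHED
    fi
}

shm_state 1 1 3   # a segment of this data directory with three attached processes
```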

Now look at the error thrown by PGSharedMemoryCreate:

PGShmemHeader *
PGSharedMemoryCreate(Size size,
					 PGShmemHeader **shim)
{...
    for (;;)  // infinite loop
	{..
        shmid = shmget(NextShmemSegID, sizeof(PGShmemHeader), 0);// shmget to fetch the SysV shmem and return its shmid
		if (shmid < 0)
		{
			oldhdr = NULL;
			state = SHMSTATE_FOREIGN;
		}
		else
			state = PGSharedMemoryAttach(shmid, NULL, &oldhdr);// determine this shmem segment's state

        switch (state)// take different actions based on the shared memory state
		{
            ...// only showing 2 states here: attached and unattached
			case SHMSTATE_ATTACHED: // shm is attached — throw the error (this is the fault symptom we saw)
				ereport(FATAL,
						(errcode(ERRCODE_LOCK_FILE_EXISTS),
						 errmsg("pre-existing shared memory block (key %lu, ID %lu) is still in use",
								(unsigned long) NextShmemSegID,
								(unsigned long) shmid),
						 errhint("Terminate any old server processes associated with data directory \"%s\".",
								 DataDir)));
				break;
            ...
			case SHMSTATE_UNATTACHED:// shm is unattached

				/*
				 * The segment pertains to DataDir, and every process that had
				 * used it has died or detached.  Zap it, if possible, and any
				 * associated dynamic shared memory segments, as well.  This
				 * shouldn't fail, but if it does, assume the segment belongs
				 * to someone else after all, and try the next candidate.
				 * Otherwise, try again to create the segment.  That may fail
				 * if some other process creates the same shmem key before we
				 * do, in which case we'll try the next key.
				 */
                // The segment belongs to the data directory, and no process still holds it
				if (oldhdr->dsm_control != 0)
					dsm_cleanup_using_control_segment(oldhdr->dsm_control);
				if (shmctl(shmid, IPC_RMID, NULL) < 0)
					NextShmemSegID++;   // Note: ShmemSegID increments and retries
				break;
		}
     ...
     }
 ...
 }

When the segment is ATTACHED, startup throws the error. When it is UNATTACHED, PG removes the stale segment and loops back to create a fresh one under the same key; only if the removal fails does it increment NextShmemSegID and try the next key.

  • The first case corresponds to this fault
  • The second case corresponds to normal crash recovery (instance can still start after a crash)
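
To make the branching concrete, here is a small simulation of the probing loop. It is a sketch under assumptions: probe_keys is a hypothetical helper, the states are passed in by hand instead of being read from the kernel, removal of an UNATTACHED segment is assumed to succeed, and FREE is a token invented here for a key with no segment.

```shell
#!/bin/sh
# probe_keys BASE_KEY [STATE...]
# STATEs describe what startup would find at BASE_KEY, BASE_KEY+1, ...
# Keys beyond the listed states are treated as free.
probe_keys() {
    key=$1
    shift
    for state in "$@"; do
        case "$state" in
            ATTACHED)
                echo "FATAL at key $key"
                return 1 ;;
            UNATTACHED)
                # Stale segment of this data directory: remove it, reuse the key.
                echo "created at key $key (stale segment removed)"
                return 0 ;;
            FREE)
                echo "created at key $key"
                return 0 ;;
            *)
                # FOREIGN or ENOENT: move on to the next key.
                key=$((key + 1)) ;;
        esac
    done
    echo "created at key $key"
}

probe_keys 4096 FOREIGN UNATTACHED
```

Calling probe_keys 4096 ATTACHED reproduces the failure case: the function reports FATAL and returns a non-zero status.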

SysV shmem

From PG10 onwards, the postmaster.pid and SysV shmem logic was significantly reworked and has been largely stable since. This article only covers the PG10+ logic.

pidfile.h:

#define LOCK_FILE_LINE_SHMEM_KEY	7

sysv_shmem.c, InternalIpcMemoryCreate():

	{
		char		line[64];

		sprintf(line, "%9lu %9lu",
				(unsigned long) memKey, (unsigned long) shmid);
		AddToDataDirLockFile(LOCK_FILE_LINE_SHMEM_KEY, line);
	}

From the source code, shmem info is saved on line 7 of postmaster.pid, containing the shmkey and shmid.

> cat postmaster.pid
242712
/data
1772698474
8531
/tmp
0.0.0.0
     4096 143917078   # <----here
ready
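
A minimal sketch for pulling the two values out of the lock file, assuming the PG10+ layout shown above (key and ID on line 7). The function name shmem_key_and_id is made up for this example.

```shell
#!/bin/sh
# Print "<shmkey> <shmid>" from line 7 of a postmaster.pid file.
# shmem_key_and_id is a hypothetical helper name.
shmem_key_and_id() {
    sed -n '7p' "$1" | awk '{print $1, $2}'
}

# Only runs when PGDATA points at a data directory with a lock file.
if [ -f "${PGDATA:-/nonexistent}/postmaster.pid" ]; then
    shmem_key_and_id "$PGDATA/postmaster.pid"
fi
```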

What Are shmkey and shmid?

In PG's source, the segment is requested in InternalIpcMemoryCreate():

	shmid = shmget(memKey, size, IPC_CREAT | IPC_EXCL | IPCProtection);

PG uses shmkey/memkey as a seed key to request shared memory from the kernel, which returns a unique identifier, shmid.

The shmid is assigned by the kernel and depends on the kernel's current IPC state, not on anything PG controls. When an instance is restarted quickly, the new shmid may be the same as before or differ by 1; that is a Linux kernel implementation detail. After a full server reboot it will be completely different.

Put another way: the shmkey/memkey is chosen by the caller (here PG), so it can stay constant across server reboots. The shmid returned for that key, however, is very unlikely to be the same after a reboot.
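
The key-to-ID relationship can also be read from the kernel's side. The sketch below assumes the usual util-linux layout of `ipcs -m` (key printed as 0x followed by eight hex digits in the first column, shmid in the second); shmid_for_key is a hypothetical helper name.

```shell
#!/bin/sh
# shmid_for_key KEY
# Reads `ipcs -m` output on stdin and prints the shmid registered under KEY
# (KEY given in decimal, as it appears in postmaster.pid).
shmid_for_key() {
    want=$(printf '0x%08x' "$1")
    # Appending "" forces a string comparison in every awk implementation.
    awk -v want="$want" '($1 "") == (want "") {print $2}'
}

# Only runs where the ipcs utility is installed.
if command -v ipcs >/dev/null 2>&1; then
    ipcs -m | shmid_for_key 4096
fi
```

The output is empty when no segment exists under that key, which is what you would expect right after a reboot and before PG starts.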

How PG Obtains the shmkey

PGSharedMemoryCreate():

	/*
	 * We use the data directory's ID info (inode and device numbers) to
	 * positively identify shmem segments associated with this data dir, and
	 * also as seeds for searching for a free shmem key.
	 */
	if (stat(DataDir, &statbuf) < 0)
		ereport(FATAL,
				(errcode_for_file_access(),
				 errmsg("could not stat data directory \"%s\": %m",
						DataDir)));
...
	/*
	 * Loop till we find a free IPC key.  Trust CreateDataDirLockFile() to
	 * ensure no more than one postmaster per data directory can enter this
	 * loop simultaneously.  (CreateDataDirLockFile() does not entirely ensure
	 * that, but prefer fixing it over coping here.)
	 */
	NextShmemSegID = statbuf.st_ino;

	for (;;)
	{
		IpcMemoryId shmid;
		PGShmemHeader *oldhdr;
		IpcMemoryState state;

		/* Try to create new segment */
		memAddress = InternalIpcMemoryCreate(NextShmemSegID, sysvsize);
		if (memAddress)
			break;				/* successful create and attach */

		/* Check shared memory and possibly remove and recreate */

		/*
		 * shmget() failure is typically EACCES, hence SHMSTATE_FOREIGN.
		 * ENOENT, a narrow possibility, implies SHMSTATE_ENOENT, but one can
		 * safely treat SHMSTATE_ENOENT like SHMSTATE_FOREIGN.
		 */
		shmid = shmget(NextShmemSegID, sizeof(PGShmemHeader), 0);

PG calls stat() on the data directory and takes the directory's inode number (st_ino) as the first shmkey to try.

So in PG the shmem key is tightly coupled to the data directory's inode. Under normal circumstances, shmem key = datadir inode.

Verification example:

> ls -id $PGDATA
4096 /lzlcloud/pg8574/data
> cat postmaster.pid |head -7|tail -1
     4096 143917090

We can see datadir.inode = shmkey = 4096.
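
The same comparison can be scripted. This is a sketch with hypothetical helper names (datadir_inode, lockfile_key, key_offset); it assumes the PG10+ lock file layout and that the inode is small enough to be used as a key unchanged.

```shell
#!/bin/sh
# Inode number of a directory, as reported by ls -id.
datadir_inode() {
    ls -id "$1" | awk '{print $1}'
}

# shmkey recorded on line 7 of DATADIR/postmaster.pid.
lockfile_key() {
    sed -n '7p' "$1/postmaster.pid" | awk '{print $1}'
}

# key_offset DATADIR
# Prints shmkey minus inode: 0 means the key equals the inode,
# N means startup had to move N keys past the inode.
key_offset() {
    inode=$(datadir_inode "$1")
    key=$(lockfile_key "$1")
    echo $((key - inode))
}

# Only runs when PGDATA points at a data directory with a lock file.
if [ -f "${PGDATA:-/nonexistent}/postmaster.pid" ]; then
    key_offset "$PGDATA"
fi
```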

PG shmkey in Cloud Environments

Above I said generally shmkey = datadir.inode, but in cloud environments this is typically not the case.

Our cloud environment:

>  ls -id /lzlcloud/pg8298/data
4096 /lzlcloud/pg8298/data
>  ls -id /lzlcloud/pg8388/data
4096 /lzlcloud/pg8388/data
>  ls -id /lzlcloud/pg8095/data
4096 /lzlcloud/pg8095/data
>  cat /lzlcloud/pg8298/data/postmaster.pid|head -7|tail -1
     4096 971833391
>  cat /lzlcloud/pg8388/data/postmaster.pid|head -7|tail -1
     4097  62128161
>  cat /lzlcloud/pg8095/data/postmaster.pid|head -7|tail -1
     4098 143163441

The data disk directories all have inode 4096, but the shmkeys are 4096, 4097, 4098.

Why?

The inode issue relates to the filesystem:

  • Each filesystem has independent inodes
  • The filesystem reserves some inodes — the first few are unusable. Depending on mount options, our data disk’s real inodes start at 4096

So datadir.inode = 4096 is the default behavior of our cloud environment’s disk mounts. Other environments may differ — I haven’t analyzed those deeply. But with the same filesystem and mount approach for PG data directories, inode collisions are still possible.

The shmkey issue relates to PG’s source code, PGSharedMemoryCreate():

	NextShmemSegID = statbuf.st_ino;	/* set once, before the loop */

	for (;;)
	{
		...
		shmid = shmget(NextShmemSegID, sizeof(PGShmemHeader), 0);
		...
		switch (state)
		{
			case SHMSTATE_FOREIGN:
				NextShmemSegID++;
				break;

The initial shmkey = datadir.inode, but if the segment found under that key is FOREIGN (it belongs to another data directory or another program), PG increments the shmkey by 1 and tries again.

For example, take the instance with shmkey=4097 in postmaster.pid: at startup it tried shmkey=4096, found that the segment under that key was already owned by another instance (the one with shmkey=4096), and moved on to shmkey+1 to get a segment of its own.

Similarly, the instance with shmkey=4098 had to increment twice to find a free key.
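
The 4096, 4097, 4098 pattern falls out of a few lines of simulation. The sketch below models only the FOREIGN branch: assign_keys is a hypothetical helper, the instances are assumed to start one after another, and every key held by a running instance is assumed to look FOREIGN to a newcomer.

```shell
#!/bin/sh
# assign_keys INODE COUNT
# Simulates COUNT instances starting in turn, all with data directory inode INODE.
# Prints the shmkey each instance ends up with.
assign_keys() {
    inode=$1 count=$2 in_use=" " result=""
    i=0
    while [ "$i" -lt "$count" ]; do
        key=$inode
        # Skip every key that an earlier instance already holds.
        while case "$in_use" in *" $key "*) true ;; *) false ;; esac; do
            key=$((key + 1))
        done
        in_use="$in_use$key "
        result="$result${result:+ }$key"
        i=$((i + 1))
    done
    echo "$result"
}

assign_keys 4096 3
```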

shmid Relationships

The SysV shmid shows up in three places: the startup error log, line 7 of postmaster.pid, and the inode column of the postmaster's /proc/<pid>/smaps. It can be inspected with ipcs and removed with ipcrm.

Example — note shmid=143917078 throughout:

Startup error log:

pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2026-03-05 16:02:19 CST::@:[262388]: FATAL:  pre-existing shared memory block (key 4096, ID 143917078) is still in use

postmaster.pid line 7:

> cat postmaster.pid |head -7|tail -1
     4096 143917078

Virtual memory smaps:

cat /proc/`head -1 $PGDATA/postmaster.pid`/smaps | grep -E "\-s"
2ad2b5189000-2ad2b518a000 rw-s 00000000 00:04 143917078                  /SYSV00001000 (deleted)
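
The smaps line carries the key as well as the ID: on Linux the mapping name is SYSV followed by the key as eight hex digits, so /SYSV00001000 is key 0x1000 = 4096. A small sketch that decodes both values from such a line (sysv_key_and_id is a hypothetical helper; it assumes the field layout shown above):

```shell
#!/bin/sh
# sysv_key_and_id LINE
# Decodes a SysV mapping line from /proc/<pid>/maps or smaps.
# Field 5 is the inode column (the shmid); field 6 is /SYSV<key in hex>.
sysv_key_and_id() {
    set -- $(printf '%s\n' "$1" | awk '$6 ~ /^\/SYSV/ {print substr($6, 6), $5}')
    [ $# -eq 2 ] || return 1          # not a SysV mapping line
    printf 'key=%d shmid=%s\n' "0x$1" "$2"
}

sysv_key_and_id "2ad2b5189000-2ad2b518a000 rw-s 00000000 00:04 143917078 /SYSV00001000 (deleted)"
```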

Inspecting and cleaning via SysV shmid:

ipcs -m -i  143917078  # cleanup: ipcrm -m shmid

Shared memory Segment shmid=143917078
uid=6001        gid=6001        cuid=6001       cgid=6001
mode=0600       access_perms=0600
bytes=56        lpid=242712     cpid=242712     nattch=10
att_time=Thu Mar  5 16:14:51 2026
det_time=Thu Mar  5 16:14:49 2026
change_time=Thu Mar  5 16:14:34 2026
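
The field that decides whether startup will be blocked is nattch. A minimal sketch for extracting it from `ipcs -m -i` output in the format shown above; nattch_of and segment_in_use are hypothetical helper names.

```shell
#!/bin/sh
# Reads `ipcs -m -i <shmid>` output on stdin and prints the nattch value.
nattch_of() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^nattch=/) { sub(/^nattch=/, "", $i); print $i }
    }'
}

# Succeeds (exit status 0) when at least one process is still attached.
segment_in_use() {
    [ "$(nattch_of)" -gt 0 ]
}

printf 'bytes=56        lpid=242712     cpid=242712     nattch=10\n' | nattch_of
```

Usage against a live system would look like `ipcs -m -i 143917078 | segment_in_use && echo "still attached"`.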

Testing

Reproducing the Production Issue

Freeze a backend with SIGSTOP so that it stays alive and attached, then kill -9 the postmaster:

> cat postmaster.pid
      4096 143917076

> ipcs -m -i  143917076  # shmem id
Shared memory Segment shmid=143917076
uid=6001        gid=6001        cuid=6001       cgid=6001
mode=0600       access_perms=0600
bytes=56        lpid=241567     cpid=64757      nattch=23

> kill -stop 107648 # any backend

> kill -9 64757 # postmaster or another process

> ipcs -m -i  143917076
Shared memory Segment shmid=143917076
uid=6001        gid=6001        cuid=6001       cgid=6001
mode=0600       access_perms=0600
bytes=56        lpid=252283     cpid=64757      nattch=1   # nattch != 0

> pg_ctl start -D $PGDATA
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2026-03-05 16:02:19 CST::@:[262388]: FATAL:  pre-existing shared memory block (key 4096, ID 143917076) is still in use
2026-03-05 16:02:19 CST::@:[262388]: HINT:  Terminate any old server processes associated with data directory "/data".
 stopped waiting
pg_ctl: could not start server

nattch=1 — the instance cannot start.

Normal Crash Recovery (Successful Startup)

Essentially, kill the instance and then start it:

> cat postmaster.pid
     4096 143917077

> ipcs -m -i  143917077 # shmem id
Shared memory Segment shmid=143917077
uid=6001        gid=6001        cuid=6001       cgid=6001
mode=0600       access_perms=0600
bytes=56        lpid=154800     cpid=134329     nattch=18

> kill -9 134329 # postmaster or another process

> cat postmaster.pid
     4096 143917077

> ipcs -m -i  143917077 # shmem id unchanged, segment still exists
Shared memory Segment shmid=143917077
uid=6001        gid=6001        cuid=6001       cgid=6001
mode=0600       access_perms=0600
bytes=56        lpid=169360     cpid=134329     nattch=0 # nattch=0

> pg_ctl start -D $PGDATA  # startup succeeds
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2026-03-05 16:14:34 CST::@:[242712]: LOG:  redirecting log output to logging collector process
2026-03-05 16:14:34 CST::@:[242712]: HINT:  Future log output will appear in directory "/data/pg_log".
 done
server started

> ipcs -m -i  143917077 # residual shmem cleaned up during startup
ipcs: id 143917077 not found
> ipcs -m -i  143917078 # shmid incremented by 1 at startup
Shared memory Segment shmid=143917078
uid=6001        gid=6001        cuid=6001       cgid=6001
mode=0600       access_perms=0600
bytes=56        lpid=273571     cpid=242712     nattch=26

> cat postmaster.pid # shmkey unchanged, shmid +1
     4096 143917078

A plain kill -9 followed by startup works fine: the residual segment is removed during startup. The shmkey stays the same because inode=4096 and no other instance occupied shmkey=4096. The shmid going up by 1 is Linux kernel behavior; what matters is that it shows a new segment was created.

Holding a File Descriptor But Not shmem

Startup is tied to the data directory's inode only because the inode seeds the shmem key; what startup actually checks is whether a process is still attached to the SysV segment, not whether a file in the data directory is still open. So let's test with the logger process, which holds file descriptors but is not attached to shared memory:

$ cat /proc/77300/smaps | grep -E "\-s"  # logger process — verify it has no shared memory
$ kill -stop 77300  # stop logger
$ kill -9 77076  # kill -9 pm
$ cat postmaster.pid   # file still exists
77076
/lzlcloud/pg8531/data
1772700343
8531
/tmp
0.0.0.0
     4096 143917080
ready
$ ipcs -m -i 143917080  # shared memory still exists

Shared memory Segment shmid=143917080
uid=6001        gid=6001        cuid=6001       cgid=6001
mode=0600       access_perms=0600
bytes=56        lpid=77319      cpid=77076      nattch=0
att_time=Thu Mar  5 17:27:11 2026
det_time=Thu Mar  5 17:27:15 2026
change_time=Thu Mar  5 16:45:43 2026

$ ps -ef|grep 77300  # process still alive
postgres  77300      1  0 16:45 ?        00:00:00 postgresql: lzldb: logger
postgres 135246  46622  0 17:27 pts/1    00:00:00 grep --color=auto 77300
$ pg_ctl start -D $PGDATA  # startup succeeds
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2026-03-05 17:27:55 CST::@:[140497]: LOG:  redirecting log output to logging collector process
2026-03-05 17:27:55 CST::@:[140497]: HINT:  Future log output will appear in directory "/data/pg_log".
 done
server started

The logger holds files in the data directory but is not associated with shared memory — it does not block startup.

Deleting postmaster.pid Then Failing to Start

Same procedure: hold a backend process, kill -9 the PM, delete postmaster.pid, attempt startup.

I’ll skip the full output — result: startup fails with:

waiting for server to start....2026-03-06 15:29:48 CST::@:[22475]: FATAL:  pre-existing shared memory block (key 4098, ID 171868173) is still in use
2026-03-06 15:29:48 CST::@:[22475]: HINT:  Terminate any old server processes associated with data directory "/data".
2026-03-06 15:29:48 CST::@:[22475]: LOG:  database system is shut down

This shows that deleting postmaster.pid (which records the shmid) does not help while an orphaned process is still attached: PG finds the same segment again by probing from the data directory's inode.

Stop a Different Instance, Start the Current One

PG analyzes shmid from two sources to determine if it belongs to the current instance:

  1. The shmid corresponding to datadir.inode as shmkey, or after shmkey++
  2. The shmid stored in postmaster.pid

So even with postmaster.pid deleted, PG can still tell that the segment is held by another process. But the datadir.inode and shmkey++ behavior can be exploited to get the instance started anyway.

In our cloud environment all data directory inodes are 4096, and the shmkeys differ only because of the shmkey++ logic. Stopping another PG instance whose datadir.inode is also 4096 frees a lower shmkey, so the current instance's probe stops at that freed key and gets a different shmid.

$ kill -stop 165245
$ kill -9 164411  # stop current instance, keep one of its backend processes alive

$ pg_ctl stop -D  /pg8531/data  # stop a different instance
waiting for server to shut down.... done
server stopped
$ pg_ctl start -D /pg8574/data # try starting the current instance — fails because postmaster.pid still exists
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2026-03-05 18:22:35 CST::@:[196209]: FATAL:  pre-existing shared memory block (key 4097, ID 143917087) is still in use
2026-03-05 18:22:35 CST::@:[196209]: HINT:  Terminate any old server processes associated with data directory "/pg8574/data".
 stopped waiting
pg_ctl: could not start server
Examine the log output.

$ mv /lzlcloud/pg8574/data/postmaster.pid{,.bak}  # move the current instance's postmaster.pid out of the way
$ pg_ctl start -D /lzlcloud/pg8574/data  # try again — succeeds
2026-03-05 18:23:09 CST::@:[207725]: LOG:  redirecting log output to logging collector process
2026-03-05 18:23:09 CST::@:[207725]: HINT:  Future log output will appear in directory "/lzlcloud/pg8574/data/pg_log".
 done
server started

$ ipcs -m -i 143917087 # the shmid's SysV segment is still held by our zombie process

Shared memory Segment shmid=143917087
uid=6001        gid=6001        cuid=6001       cgid=6001
mode=0600       access_perms=0600
bytes=56        lpid=196209     cpid=164411     nattch=1
att_time=Thu Mar  5 18:22:35 2026
det_time=Thu Mar  5 18:22:35 2026
change_time=Thu Mar  5 18:21:04 2026

Startup succeeds because the current instance obtained a different shared memory segment. The old segment was not cleaned up. This is the “hack” of stopping another instance to start the current one in a cloud environment.

One prerequisite: the other instance must not only share the current instance's data directory inode, it must also hold a shmkey lower than the current instance's.

Error Analysis: lock file "postmaster.pid" already exists

This problem is much simpler than the shared memory one.

During startup, PG checks the lock file and its contained PID, in CreateLockFile():

		if (other_pid != my_pid && other_pid != my_p_pid &&
			other_pid != my_gp_pid)
		{
			if (kill(other_pid, 0) == 0 ||
				(errno != ESRCH && errno != EPERM))
			{
				/* lockfile belongs to a live process */
				ereport(FATAL,
						(errcode(ERRCODE_LOCK_FILE_EXISTS),
						 errmsg("lock file \"%s\" already exists",
								filename),
						 isDDLock ?
						 (encoded_pid < 0 ?
						  errhint("Is another postgres (PID %d) running in data directory \"%s\"?",
								  (int) other_pid, refName) :
						  errhint("Is another postmaster (PID %d) running in data directory \"%s\"?",
								  (int) other_pid, refName)) :
						 (encoded_pid < 0 ?
						  errhint("Is another postgres (PID %d) using socket file \"%s\"?",
								  (int) other_pid, refName) :
						  errhint("Is another postmaster (PID %d) using socket file \"%s\"?",
								  (int) other_pid, refName))));
			}
		}
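
The kill(other_pid, 0) probe has a direct shell counterpart, `kill -0`, which sends no signal and only reports whether the target can be signalled. The sketch below is a simplified stand-in for the check above: pid_alive and lockfile_owner_alive are hypothetical helpers, and the comparison against the caller's own, parent and grandparent PIDs is left out.

```shell
#!/bin/sh
# Succeeds when PID exists and may be signalled by the current user.
# Like the C check, a missing process (ESRCH) and a process owned by
# someone else (EPERM) both count as "not alive" here.
pid_alive() {
    kill -0 "$1" 2>/dev/null
}

# Reads the PID on line 1 of a lock file and probes it.
lockfile_owner_alive() {
    owner=$(sed -n '1p' "$1")
    pid_alive "$owner"
}

if pid_alive "$$"; then
    echo "this shell (PID $$) is alive"
fi
```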

Testing is even simpler — just start it a second time while already running:

$ pg_ctl start -D /pg8531/data
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2026-03-06 15:59:05 CST::@:[89145]: FATAL:  lock file "postmaster.pid" already exists
2026-03-06 15:59:05 CST::@:[89145]: HINT:  Is another postmaster (PID 255500) running in data directory "/pg8531/data"?
 stopped waiting
pg_ctl: could not start server
Examine the log output.

So the later errors in the incident's start.log appeared because the instance was already running (PID 794791 had started at 09:31:12) and someone tried to start it again, twice.

Summary

At startup, PG first allocates a small SysV shmem segment (separate from the mmap-based shared_buffers) that acts as a lock on the data directory. It passes the data directory's inode to shmget() as the shmkey, and the kernel returns a unique shmid. If the segment under that key already belongs to someone else, PG increments the shmkey in a loop until it finds a free one. Line 7 of postmaster.pid stores both the shmkey and the shmid. In cloud environments you will often see neighboring PG instances with consecutive shmkeys: their data disks are mounted identically, so the data directories share the same inode and shmkey++ kicks in.

If a PG instance is killed unexpectedly, its SysV segment is not cleaned up automatically. In the normal case no leftover process is attached to it, so the next startup removes it and proceeds. In the abnormal case an orphaned process (called a zombie in this article, although it is still running rather than defunct) remains attached: startup fails and manual intervention is required.

Recommended handling:

  1. ipcrm -m <shmid> (most recommended), once you are sure the leftover process can no longer write to the data directory
  2. Use lsof to find the orphaned process and kill it
  3. Reboot the host
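
As an alternative to lsof, the attached processes can be found by scanning /proc/<pid>/maps for a SysV mapping whose inode column equals the shmid. This is a sketch under assumptions: pids_attached is a hypothetical helper, the optional second argument exists only so that the function can be pointed at a different directory tree, and reading other users' maps files normally requires root or the postgres account.

```shell
#!/bin/sh
# pids_attached SHMID [PROC_ROOT]
# Prints the PID of every process that maps the SysV segment SHMID.
pids_attached() {
    shmid=$1 root=${2:-/proc}
    for maps in "$root"/[0-9]*/maps; do
        [ -r "$maps" ] || continue
        # Field 5 is the inode column, field 6 the mapping name.
        if awk -v id="$shmid" '$5 == id && $6 ~ /^\/SYSV/ {found = 1}
                               END {exit !found}' "$maps" 2>/dev/null; then
            pid=${maps%/maps}
            echo "${pid##*/}"
        fi
    done
}
```

For the incident above, `pids_attached 1328250881` would list the processes to inspect before deciding between killing them and running ipcrm.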

Not recommended but possible workarounds:

  1. mv postmaster.pid + stop a different PG instance (where the other instance’s shmkey < current instance’s shmkey)
  2. mv postmaster.pid + remount the data disk to change its inode

Finally, answering the opening questions:

  • Why isn’t this scenario more common in practice?

It takes both an abnormal instance crash and an orphaned process that stays alive. Most crashes leave no such process behind, so startup just works.

  • The start.log shows two different error types — what do they correspond to?

The “shared memory in use” error means the instance crashed abnormally and an orphaned process is still attached to the segment. The “postmaster.pid already exists” error means someone tried to start an instance that was already running.

  • Can shared memory still exist if the postmaster is gone?

Yes. A SysV segment outlives the postmaster: leftover PG processes may still be attached to it, and even once every process is gone the segment remains, with nattch=0, until the next startup (or ipcrm) removes it.

  • How do you locate and clean up this shared memory segment?

The shmid appears in the startup error log (start.log) and on line 7 of postmaster.pid. Remove the segment with ipcrm -m $shmid.

  • PG has multiple shared memory segments — which one is this?

The SysV shmem used to protect the data directory. It always exists. See the “Three Types of Shared Memory” section. It’s distinct from the mmap-based shared_buffers.

  • Can you find the corresponding shmem via inode or file?

To my knowledge, Linux offers no userspace interface for finding a SysV segment by inode or file (I have only cross-checked this against AI-generated answers, not against kernel documentation). PG uses the data directory's inode as a seed shmkey to request shared memory; it does not look segments up by inode. PG's own way of locating its SysV segment is therefore not an absolute mapping, which is why the shmkey++ fallback exists.
