2. The Core Problem: M Escapes Runtime Control
The moment an M enters a system call, it leaves Go runtime control. If that syscall blocks:
- The M is stuck
- The M cannot be preempted
- Any goroutines queued on that M's P are starved until the P moves to another M
Go's solution is to detect this before it becomes a problem, using two distinct strategies depending on whether the syscall is async or sync.
3. The Four Blocking Scenarios
Go recognizes four distinct ways a goroutine can block, each handled differently:
| # | Blocking Cause | Mechanism | M Impact |
|---|---|---|---|
| 1 | Channel / mutex | Scheduler parks G, runs next G from LRQ | M stays free |
| 2 | Network I/O | netpoller (epoll/kqueue/IOCP) parks G | M stays free |
| 3 | File / OS syscall | P detaches from M, finds another M | M is stuck |
| 4 | time.Sleep / long-running G | Sleep parks G on a timer; sysmon preempts a G that runs too long | M is freed once G parks or is preempted |
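Row 1 can be checked empirically. The sketch below is an illustration rather than anything from the runtime itself: it parks many goroutines on a channel and reads the standard "threadcreate" profile to see how many OS threads were created meanwhile. The helper names threadsCreated and parkOnChannel are made up for this example.

```go
package main

import (
	"fmt"
	"runtime/pprof"
	"sync"
)

// threadsCreated reports how many OS threads (Ms) the runtime has created
// so far, using the built-in "threadcreate" profile.
func threadsCreated() int {
	return pprof.Lookup("threadcreate").Count()
}

// parkOnChannel starts n goroutines that all block receiving from one
// channel, then releases them. It returns how many threads were created
// while the goroutines were being parked.
func parkOnChannel(n int) int {
	before := threadsCreated()
	gate := make(chan struct{})

	var started, done sync.WaitGroup
	started.Add(n)
	done.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer done.Done()
			started.Done()
			<-gate // the G parks here; its M moves on to other Gs
		}()
	}

	started.Wait()
	grown := threadsCreated() - before

	close(gate) // wake every parked G
	done.Wait()
	return grown
}

func main() {
	fmt.Println("threads created while 10000 goroutines were parked:",
		parkOnChannel(10000))
}
```

The exact number depends on the machine and Go version, but it should stay far below the goroutine count, because a parked G does not hold an M.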
Important distinction: Go handles network I/O-heavy workloads far more efficiently than disk I/O-heavy ones. Here's why:
- Network sockets implement .poll(): they can be set to non-blocking and monitored via epoll. When data isn't ready, the goroutine parks and the M is freed.
- File handles do NOT implement .poll(): they are always "readable/writable" from the OS perspective, so read()/write() on files is synchronous. The M blocks until the disk responds.
Disk I/O = blocked M = reduced throughput. Under heavy disk I/O, the Go runtime compensates by starting more M threads, which can cause the M count to spike dramatically.
4. Async System Calls (Network I/O)
When a goroutine makes a network system call, Go uses the netpoller to handle it asynchronously. The G separates from M+P:
Step 1: Normal execution
─────────────────────────────────────────────
P ──► M ──► G1 (running)
LRQ: [G2, G3, G4]
netpoller: idle
Step 2: G1 makes a network syscall
─────────────────────────────────────────────
G1 ──► moved to netpoller (waiting for fd)
M ──► picks up G2 from LRQ
P ──► M ──► G2 (running)
LRQ: [G3, G4]
netpoller: monitoring G1's socket fd
Step 3: Network I/O completes
─────────────────────────────────────────────
netpoller: fd ready → G1 marked Runnable
G1 ──► moved back to P's LRQ
P ──► M ──► G2 (still running)
LRQ: [G3, G4, G1]
Step 4: G1 resumes
─────────────────────────────────────────────
G1 scheduled onto M, continues execution ✅
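Key outcome: No extra M is created, and the M is never blocked. The netpoller is not a separate event-loop thread: the scheduler checks it for ready descriptors whenever it looks for runnable work, and sysmon polls it as a backstop, so a single epoll/kqueue/IOCP instance serves every socket with minimal OS scheduling load.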
Summary: Async Path
G blocked on network I/O
↓
G detaches from M+P → moves to netpoller wait queue
↓
M continues running other Gs from LRQ
↓
epoll_wait fires → G marked Runnable → re-queued to P's LRQ
↓
G resumes on next available M ✅
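A rough way to watch the async path in action is to leave many goroutines waiting in Read on loopback TCP connections and then wake them. The sketch below assumes loopback networking is available; parkedReaders is a name invented for this example, and the thread figure it reports is only indicative.

```go
package main

import (
	"fmt"
	"net"
	"runtime/pprof"
	"sync"
	"sync/atomic"
)

// parkedReaders opens n loopback TCP connections, leaves one goroutine
// waiting in Read on each, then sends one byte per connection to wake them.
// It returns how many readers received data and how many OS threads the
// runtime created while the readers were waiting.
func parkedReaders(n int) (int, int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, 0, err
	}
	defer ln.Close()

	threadsBefore := pprof.Lookup("threadcreate").Count()

	var (
		wg        sync.WaitGroup
		delivered int64
		clients   []net.Conn
	)
	// Closing the client side also unblocks any reader still waiting.
	defer func() {
		for _, c := range clients {
			c.Close()
		}
	}()

	for i := 0; i < n; i++ {
		c, err := net.Dial("tcp", ln.Addr().String())
		if err != nil {
			return 0, 0, err
		}
		clients = append(clients, c)

		s, err := ln.Accept()
		if err != nil {
			return 0, 0, err
		}
		wg.Add(1)
		go func(s net.Conn) {
			defer wg.Done()
			defer s.Close()
			buf := make([]byte, 1)
			// No data yet: the runtime parks this G in the netpoller.
			if _, err := s.Read(buf); err == nil {
				atomic.AddInt64(&delivered, 1)
			}
		}(s)
	}

	threadsGrown := pprof.Lookup("threadcreate").Count() - threadsBefore

	for _, c := range clients {
		if _, err := c.Write([]byte{1}); err != nil {
			return 0, 0, err
		}
	}
	wg.Wait()
	return int(atomic.LoadInt64(&delivered)), threadsGrown, nil
}

func main() {
	got, threads, err := parkedReaders(100)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d readers woke up; %d threads created while they waited\n", got, threads)
}
```

On a typical machine the thread count should grow by far less than the number of waiting readers, which matches the "no extra M" outcome described above.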
5. Sync System Calls (File / OS I/O)
When a goroutine makes a blocking system call (e.g. file read, syscall.Read), the G and M are stuck together. Go responds by detaching the P:
Step 1: G1 enters a blocking syscall
─────────────────────────────────────────────
P ──► M1 ──► G1 (entering syscall → M1 will block)
LRQ: [G2, G3, G4]
Step 2: Scheduler detaches P from M1
─────────────────────────────────────────────
M1+G1 ──► stuck in syscall (isolated)
P ──► M2 (new or idle M)
M2 picks up G2 from LRQ
LRQ: [G3, G4]
Step 3: Syscall completes
─────────────────────────────────────────────
M1+G1 return from syscall
G1 ──► tries to reclaim a P
├── P available? → bind and continue
└── No P? → G1 placed in global run queue
M1 enters idle/standby state
Key outcome: The P does not sit idle for long while M1 is stuck. Other goroutines keep running on M2.
Summary: Sync Path
G blocked on OS syscall
↓
P detaches from M → binds to another M (idle or new)
↓
M+G remain in syscall until kernel returns
↓
On return: G tries to reclaim P
├── success → resume on same M
└── fail → G to global queue, M to standby
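The thread growth caused by the sync path can be observed directly. The following sketch assumes a Unix-like system: it calls Fd() on a pipe, which puts the descriptor into blocking mode, and then issues raw syscall.Read calls so each goroutine really blocks its M. blockInSyscall is a hypothetical helper written for this illustration, and the 200ms pause is an arbitrary settling time.

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
	"sync"
	"syscall"
	"time"
)

// blockInSyscall blocks n goroutines inside a raw read(2) on a pipe and
// reports the number of OS threads that existed before and during the block.
func blockInSyscall(n int) (before, during int, err error) {
	r, w, err := os.Pipe()
	if err != nil {
		return 0, 0, err
	}
	defer r.Close()

	// Fd() switches the descriptor to blocking mode, so syscall.Read below
	// bypasses the netpoller and ties up the calling thread.
	fd := int(r.Fd())

	before = pprof.Lookup("threadcreate").Count()

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			buf := make([]byte, 1)
			syscall.Read(fd, buf) // M+G are stuck here until a byte arrives
		}()
	}

	// Give the runtime time to hand each P to another M.
	time.Sleep(200 * time.Millisecond)
	during = pprof.Lookup("threadcreate").Count()

	w.Write(make([]byte, n)) // one byte per reader
	w.Close()                // EOF releases any reader that missed a byte
	wg.Wait()
	return before, during, nil
}

func main() {
	before, during, err := blockInSyscall(32)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("threads before: %d, while 32 goroutines were blocked: %d\n", before, during)
}
```

Because every blocked goroutine holds its own M for the duration of the call, the thread count while blocked is at least the number of blocked goroutines.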
6. entersyscall and exitsyscall
Go wraps potentially blocking system calls with two runtime hooks (calls issued through syscall.RawSyscall skip them):
entersyscall
Called before entering the syscall:
- Sets the P's state to _Psyscall
- Unbinds P from M (but M retains a pointer to P)
- Signals to the scheduler: "this M may be about to block"
exitsyscall
Called after the syscall returns:
- M still holds a pointer to its old P → tries to reclaim it first
- If the old P was taken by another M → finds any available P
- If no P is available → G is placed on the global run queue; M enters standby
entersyscall()
↓
[ kernel syscall executes ]
↓
exitsyscall()
├── old P still available? → rebind, continue ✅
├── another P available? → bind new P, continue ✅
└── no P available? → G → global queue, M → standby
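The three-way fallback in exitsyscall can be expressed as a toy model. This is deliberately not runtime code: the P type, the owner field, and exitSyscall below are simplified stand-ins invented for illustration, and the real runtime uses atomic state transitions rather than plain assignments.

```go
package main

import "fmt"

// P is a toy stand-in for a scheduler processor, not the runtime's type.
type P struct {
	id    int
	owner int // id of the M currently bound to this P, or 0 if idle
}

// outcome names the three branches of the diagram above.
type outcome string

const (
	reboundOld outcome = "rebound old P"
	boundOther outcome = "bound another idle P"
	queuedG    outcome = "G to global queue, M to standby"
)

// exitSyscall models the decision made when M number m returns from a
// syscall. old is the P it held on entry; ps is every P in the system.
func exitSyscall(m int, old *P, ps []*P) outcome {
	// Fast path: nobody took the old P while we were in the kernel.
	if old.owner == 0 {
		old.owner = m
		return reboundOld
	}
	// Otherwise take any idle P.
	for _, p := range ps {
		if p.owner == 0 {
			p.owner = m
			return boundOther
		}
	}
	// No P at all: the G must wait in the global run queue.
	return queuedG
}

func main() {
	p1, p2 := &P{id: 1}, &P{id: 2}
	ps := []*P{p1, p2}

	fmt.Println(exitSyscall(1, p1, ps)) // old P untouched
	p1.owner = 7                        // pretend M7 took P1 during a later syscall
	fmt.Println(exitSyscall(1, p1, ps)) // falls back to idle P2
	fmt.Println(exitSyscall(3, p1, ps)) // both Ps busy now
}
```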
Why is mcache on P, not M? Exactly because of this separation. When M+G enter a syscall and P detaches, the new M that picks up P also inherits its mcache — ensuring lock-free allocation continues uninterrupted.
7. sysmon: The Runtime Monitor Thread
Go starts a special background thread called sysmon at program startup. It runs as g0 on an M that requires no P — it always runs, regardless of scheduler state.
┌─────────────────────────────────────────────────────┐
│ sysmon responsibilities │
│ │
│ Interval: 20μs → 10ms (adaptive) │
│ │
│ ✦ Check for deadlocks (runtime.checkdead) │
│ ✦ Fire due timers │
│ ✦ Flush pending netpoll results to run queues │
│ ✦ Preempt long-running Gs (retake) │
│ ✦ Force GC if no GC for > 2 minutes │
│ ✦ Return idle spans to OS after > 5 minutes │
│ ✦ Reclaim P from Ms stuck in syscall (20μs to 10ms) │
└─────────────────────────────────────────────────────┘
sysmon and Syscall Recovery
The most critical sysmon behavior for syscall handling:
sysmon detects: M stuck in syscall (≥ 20μs with work waiting, else up to 10ms)
↓
sysmon calls retake()
↓
P is forcibly stripped from M
↓
P handed to another M (idle or newly created)
↓
Goroutines on P's LRQ continue running ✅
This is why you'll see M count spike under heavy disk I/O or slow syscalls — sysmon keeps creating new Ms to keep Ps busy.
8. Complete Picture: All Four Blocking Scenarios
┌──────────────────────────────────────────────────────────────────┐
│ Go Blocking Handling — Decision Tree │
│ │
│ G blocks on... │
│ │ │
│ ├── channel / mutex │
│ │ → G parked in wait queue │
│ │ → M runs next G from LRQ (M never blocked) │
│ │ │
│ ├── network I/O (socket) │
│ │ → G moved to netpoller │
│ │ → M runs next G from LRQ (M never blocked) │
│ │ → epoll fires → G re-queued → resumes │
│ │ │
│ ├── OS syscall (file, etc.) │
│ │ → entersyscall: P detaches from M │
│ │ → new M picks up P and continues │
│ │ → exitsyscall: G tries to reclaim P │
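│ │ → sysmon reclaims P if M stays stuck (≥ 20μs with work waiting, else up to 10ms) │
│ │ │
│ └── sleep / long-running G │
│ → time.Sleep: G parks on a timer, M runs other Gs │
│ → long-running G: sysmon requests preemption (> 10ms) │
│ → G yields at the next safe point; P runs other Gs │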
└──────────────────────────────────────────────────────────────────┘
Summary
| Mechanism | Purpose |
|---|---|
| syscall package | Self-wrapped OS calls via assembly; full runtime control |
| entersyscall | Unbind P from M before syscall; mark P as _Psyscall |
| exitsyscall | Reclaim P after syscall; fall back to global queue if needed |
| Async syscall path | G parks in netpoller; M stays free; no new M needed |
| Sync syscall path | P detaches; new M services P; M+G wait for kernel |
| sysmon | Background monitor; reclaims stuck Ps, preempts long Gs, forces GC |
Go's syscall handling is a masterclass in cooperative/preemptive hybrid scheduling: the runtime does everything it can to keep Ms busy and Ps utilized, falling back to creating new threads only when truly necessary.
Next in this series: Goroutine Scheduling: User Space vs Kernel, syscall Numbers & GMP in Action (Part 5)