2026 · 05 · 20 · simulation

Frequent partitions break Raft

Force partitions on a schedule, observe leader churn, stalled writes, and log repair under continuous stress.

Introduction

Distributed systems often work correctly under ideal conditions but fail under unreliable networks and failures. Consensus algorithms like Raft are designed to tolerate node crashes and network partitions, but real behavior depends heavily on implementation details.

This simulation shows frequent and controlled network partitions in a Raft cluster to expose edge cases and weaknesses.

Goal

To observe and analyze:

  • Leader churn
  • Stalled or failed client writes
  • Log divergence and subsequent repair

By stressing the system continuously, the project evaluates:

  • Correctness under instability
  • Performance degradation
  • Resilience of the Raft implementation

Phase 1Building the baseline cluster

In this phase, we construct a fully functional Raft cluster consisting of five independent nodes. The goal is to establish a stable baseline implementation before introducing any network failures or partitions.

We deploy five nodes, which is the minimum recommended size for safely tolerating failures and observing meaningful quorum behavior. Each node runs as an isolated process and participates equally in the consensus protocol.

Node responsibilities

Each node implements the core components of the Raft protocol:

  • Raft state
    • currentTerm to track logical time
    • log[] to store replicated commands
    • commitIndex and lastApplied for state machine progress
  • RPC endpoints
    • RequestVote for leader election
    • AppendEntries for log replication and heartbeats
  • Persistent storage
    • Critical state (term, votedFor, log) is stored on disk to survive restarts
    • This can be implemented using simple file-based storage (e.g., JSON, BoltDB)

Cluster deployment with Docker Compose

To simulate a distributed environment in a reproducible and code-driven way, we use Docker Compose. This approach gives us:

  • Process isolation (each node behaves like a separate machine)
  • A built-in private network for inter-node communication
  • Deterministic configuration via YAML
  • Easy startup, teardown, and scaling

Each Raft node runs inside its own container, exposing a single port for RPC communication.

docker-compose.yml

version: "3.9"

services:
  node1:
    build: .
    container_name: node1
    ports:
      - "8001:8000"
    environment:
      - NODE_ID=1
      - PORT=8000
      - PEERS=node2:8000,node3:8000,node4:8000,node5:8000

  node2:
    build: .
    container_name: node2
    ports:
      - "8002:8000"
    environment:
      - NODE_ID=2
      - PORT=8000
      - PEERS=node1:8000,node3:8000,node4:8000,node5:8000

  node3:
    build: .
    container_name: node3
    ports:
      - "8003:8000"
    environment:
      - NODE_ID=3
      - PORT=8000
      - PEERS=node1:8000,node2:8000,node4:8000,node5:8000

  node4:
    build: .
    container_name: node4
    ports:
      - "8004:8000"
    environment:
      - NODE_ID=4
      - PORT=8000
      - PEERS=node1:8000,node2:8000,node3:8000,node5:8000

  node5:
    build: .
    container_name: node5
    ports:
      - "8005:8000"
    environment:
      - NODE_ID=5
      - PORT=8000
      - PEERS=node1:8000,node2:8000,node3:8000,node4:8000

Dockerfile

FROM golang:1.22

WORKDIR /app
COPY . .

RUN go build -o raft-node .

CMD ["./raft-node"]

Running the cluster

docker compose up --build

At this point, we should have:

  • 5 nodes running concurrently
  • A functioning Raft cluster
  • Leader election occurring automatically
  • Log replication working under normal conditions

This baseline is critical—any instability here will be amplified once failures are introduced.

Phase 2Introducing the network layer

With a stable cluster in place, the next step is to take control of communication between nodes.

In a real distributed system, the network is unreliable: messages can be delayed, dropped, or blocked entirely. To simulate this accurately, we must decouple Raft logic from direct network communication.

Instead of nodes communicating directly:

Node A => Node B

We introduce an intermediary:

Node A => Network Layer => Node B

This network layer becomes the single control point for all communication and allows us to inject failures in a controlled and observable way.

Responsibilities of the network layer

  • Routing messages between nodes
  • Dropping messages (simulate packet loss)
  • Delaying messages (simulate latency)
  • Blocking communication (simulate partitions)

Implementation approach

The most practical approach for this setup is an application-level transport layer inside our Go code.

Define a transport interface:

type Transport interface {
    Send(to string, msg Message) error
}

All Raft RPCs (RequestVote, AppendEntries) must go through this interface instead of making direct HTTP/gRPC calls.

Network controller

Implement a central controller that maintains the current network state:

type Network struct {
    mu        sync.RWMutex
    blocked   map[string]map[string]bool // from -> to
    latency   time.Duration
    dropRate  float64
}

Before sending any message:

  1. Check if the link is blocked => drop
  2. Apply random drop probability => maybe drop
  3. Apply artificial delay => sleep
  4. Forward the message if allowed

Experiments we will conduct

01Leader isolation test

Idea

Force the current leader into a partition where it cannot communicate with a majority of nodes.

Setup

  • Identify leader (from logs or client metadata)
  • Isolate it: leader alone; remaining 4 nodes form majority group

What should happen

  • Majority elects a new leader
  • Old leader remains “alive” but cannot commit entries
  • When healed, old leader must step down or reconcile term

What we are testing

  • Correctness of leader step-down logic
  • Safety of split-brain prevention
  • Term updates under isolation

Failure signals

  • Two leaders active at same time
  • Old leader still accepting commits incorrectly

02Minority partition write block test

Idea

Put the leader in the minority partition and try writing to it.

Setup

  • Leader isolated with 1 follower
  • Remaining 3 nodes form majority

What should happen

  • Leader cannot commit entries
  • Writes should fail or remain uncommitted

What we are testing

  • Quorum enforcement
  • Write safety under partition
  • Correct rejection of unsafe commits

Failure signals

  • Writes succeeding in minority partition
  • Clients seeing “successful” but uncommitted logs