2026 · 05 · 20 · simulation

Frequent partitions break Raft

Force partitions on a schedule, observe leader churn, stalled writes, and log repair under continuous stress.

Introduction

Distributed systems often work correctly under ideal conditions but fail under unreliable networks and failures. Consensus algorithms like Raft are designed to tolerate node crashes and network partitions, but real behavior depends heavily on implementation details.

This simulation shows frequent and controlled network partitions in a Raft cluster to expose edge cases and weaknesses.

Goal

To observe and analyze:

Leader churn
Stalled or failed client writes
Log divergence and subsequent repair

By stressing the system continuously, the project evaluates:

Correctness under instability
Performance degradation
Resilience of the Raft implementation

Phase 1Building the baseline cluster

In this phase, we construct a fully functional Raft cluster consisting of five independent nodes. The goal is to establish a stable baseline implementation before introducing any network failures or partitions.

We deploy five nodes, which is the minimum recommended size for safely tolerating failures and observing meaningful quorum behavior. Each node runs as an isolated process and participates equally in the consensus protocol.

Node responsibilities

Each node implements the core components of the Raft protocol:

Raft state
- currentTerm to track logical time
- log[] to store replicated commands
- commitIndex and lastApplied for state machine progress
RPC endpoints
- RequestVote for leader election
- AppendEntries for log replication and heartbeats
Persistent storage
- Critical state (term, votedFor, log) is stored on disk to survive restarts
- This can be implemented using simple file-based storage (e.g., JSON, BoltDB)

Cluster deployment with Docker Compose

To simulate a distributed environment in a reproducible and code-driven way, we use Docker Compose. This approach gives us:

Process isolation (each node behaves like a separate machine)
A built-in private network for inter-node communication
Deterministic configuration via YAML
Easy startup, teardown, and scaling

Each Raft node runs inside its own container, exposing a single port for RPC communication.

docker-compose.yml

version: "3.9"

services:
  node1:
    build: .
    container_name: node1
    ports:
      - "8001:8000"
    environment:
      - NODE_ID=1
      - PORT=8000
      - PEERS=node2:8000,node3:8000,node4:8000,node5:8000

  node2:
    build: .
    container_name: node2
    ports:
      - "8002:8000"
    environment:
      - NODE_ID=2
      - PORT=8000
      - PEERS=node1:8000,node3:8000,node4:8000,node5:8000

  node3:
    build: .
    container_name: node3
    ports:
      - "8003:8000"
    environment:
      - NODE_ID=3
      - PORT=8000
      - PEERS=node1:8000,node2:8000,node4:8000,node5:8000

  node4:
    build: .
    container_name: node4
    ports:
      - "8004:8000"
    environment:
      - NODE_ID=4
      - PORT=8000
      - PEERS=node1:8000,node2:8000,node3:8000,node5:8000

  node5:
    build: .
    container_name: node5
    ports:
      - "8005:8000"
    environment:
      - NODE_ID=5
      - PORT=8000
      - PEERS=node1:8000,node2:8000,node3:8000,node4:8000

Dockerfile

FROM golang:1.22

WORKDIR /app
COPY . .

RUN go build -o raft-node .

CMD ["./raft-node"]

Running the cluster

docker compose up --build

At this point, we should have:

5 nodes running concurrently
A functioning Raft cluster
Leader election occurring automatically
Log replication working under normal conditions

This baseline is critical—any instability here will be amplified once failures are introduced.

Phase 2Introducing the network layer

With a stable cluster in place, the next step is to take control of communication between nodes.

In a real distributed system, the network is unreliable: messages can be delayed, dropped, or blocked entirely. To simulate this accurately, we must decouple Raft logic from direct network communication.

Instead of nodes communicating directly:

Node A => Node B

We introduce an intermediary:

Node A => Network Layer => Node B

This network layer becomes the single control point for all communication and allows us to inject failures in a controlled and observable way.

Responsibilities of the network layer

Routing messages between nodes
Dropping messages (simulate packet loss)
Delaying messages (simulate latency)
Blocking communication (simulate partitions)

Implementation approach

The most practical approach for this setup is an application-level transport layer inside our Go code.

Define a transport interface:

type Transport interface {
    Send(to string, msg Message) error
}

All Raft RPCs (RequestVote, AppendEntries) must go through this interface instead of making direct HTTP/gRPC calls.

Network controller

Implement a central controller that maintains the current network state:

type Network struct {
    mu        sync.RWMutex
    blocked   map[string]map[string]bool // from -> to
    latency   time.Duration
    dropRate  float64
}

Before sending any message:

Check if the link is blocked => drop
Apply random drop probability => maybe drop
Apply artificial delay => sleep
Forward the message if allowed

Experiments we will conduct

01Leader isolation test

Idea

Force the current leader into a partition where it cannot communicate with a majority of nodes.

Setup

Identify leader (from logs or client metadata)
Isolate it: leader alone; remaining 4 nodes form majority group

What should happen

Majority elects a new leader
Old leader remains “alive” but cannot commit entries
When healed, old leader must step down or reconcile term

What we are testing

Correctness of leader step-down logic
Safety of split-brain prevention
Term updates under isolation

Failure signals

Two leaders active at same time
Old leader still accepting commits incorrectly

02Minority partition write block test

Idea

Put the leader in the minority partition and try writing to it.

Setup

Leader isolated with 1 follower
Remaining 3 nodes form majority

What should happen

Leader cannot commit entries
Writes should fail or remain uncommitted

What we are testing

Quorum enforcement
Write safety under partition
Correct rejection of unsafe commits

Failure signals

Writes succeeding in minority partition
Clients seeing “successful” but uncommitted logs