From Sequential Scripts to Concurrent AI Pipelines in Go — Part 3

Moinuddin M Masud
8 min read

Part 3 — Race Conditions and Mutexes: Locking the Right Thing

Part of the series: Production-Grade Concurrent AI Systems in Go

Full code for this post: github.com/madmmas/go-concurrent-ai-systems/tree/part-03
Diff from Part 2: compare/part-02...part-03
Run it: go run ./cmd/news-processor inside arc-1-foundations/part-03-race-conditions


Part 2 ended with working, fast, concurrent code — and a confession. We left a data race in it on purpose, because watching the race detector flag a bug on code that looks correct teaches more than reading about it in the abstract.

This part is about fixing that race properly. Not just making the warning go away — understanding exactly what's racing, why, and where the fix needs to live so it doesn't quietly cost us everything we gained in Part 2.


Where We Left Off

Here's the line the race detector pointed at:

results = append(results, result)

Every goroutine in our pool runs this line, and they all write to the same slice. It isn't one atomic operation: the goroutine reads the slice header (pointer, length, capacity), append writes the new element at index length (growing the backing array first if there's no room), and the assignment writes the updated header back to results. If two goroutines do this at the same moment, they can read the same length and both write to the same index, or both write back a header that only knows about their own element. Either way, one write silently overwrites the other.
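The lost update is easier to see when the interleaving is replayed by hand. The sketch below is deterministic and runs on a single goroutine, so it contains no race itself; it simply performs, in one fixed order, the steps two goroutines could interleave. It assumes the slice has spare capacity, so neither append reallocates. When append does reallocate, the details differ, but the last assignment to results still wins and one element is still lost.

```go
package main

import "fmt"

// replayLostUpdate performs, on one goroutine and in a fixed order, the steps
// two racing goroutines can interleave when both run
// `results = append(results, x)` on a shared slice with spare capacity.
func replayLostUpdate() []int {
	results := make([]int, 0, 4) // spare capacity, so append will not reallocate

	// Step 1: both "goroutines" read the same slice header (same pointer, len 0, cap 4).
	seenByA := results
	seenByB := results

	// Step 2: each appends to its own snapshot. Both write index 0 of the
	// same backing array, so B's element overwrites A's.
	afterA := append(seenByA, 1)
	afterB := append(seenByB, 2)

	// Step 3: each assigns its header back to the shared variable; the last write wins.
	results = afterA
	results = afterB
	return results
}

func main() {
	got := replayLostUpdate()
	fmt.Println(len(got), got) // prints: 1 [2] (two appends ran, one element survives)
}
```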

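To see this clearly outside the full pipeline, here's the heart of a minimal standalone reproduction (the complete program in the broken directory also has its imports and a main that launches the goroutines):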

var results []aiResult

func processArticle(a article) {
    time.Sleep(time.Duration(rand.Intn(300)) * time.Millisecond)
    result := aiResult{ArticleID: a.ID, Summary: "AI-generated summary"}
    results = append(results, result) // the race
}

Run this with the race detector:

go run -race ./broken
WARNING: DATA RACE
Read at 0x0000005c8ae0 by goroutine 11:
  main.processArticle()
      broken/main.go:39

Previous write at 0x0000005c8ae0 by goroutine 7:
  main.processArticle()
      broken/main.go:39

That's the detector catching two goroutines touching the same memory address with nothing synchronizing them: one reading, one writing, neither coordinated with the other.

Worth being honest about something here: at ten or even two hundred articles on typical hardware, this code will usually still report the correct count without -race turned on. The corruption window is narrow, and Go's scheduler doesn't always land two goroutines in it at exactly the wrong instant. That's not reassuring — it's the opposite. A bug that usually doesn't show itself is far more dangerous than one that crashes reliably, because it passes local testing, passes code review, and then shows up for the first time under real production load, at 2 a.m., on a system nobody can easily reproduce. The race detector exists precisely because you cannot rely on noticing this by eye.


The Fix Is Two Lines

var mu sync.Mutex

mu.Lock()
results = append(results, result)
mu.Unlock()

A mutex — mutual exclusion lock — guarantees that only one goroutine can be inside the locked section at a time. If goroutine A holds the lock, goroutine B calling mu.Lock() blocks until A calls mu.Unlock(). There's no way for two goroutines to execute append simultaneously anymore.
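Here is a stripped-down sketch of the same pattern that runs on its own, with plain ints standing in for AI results. The collect function is hypothetical, not from the series' repo. Run with -race, it should report nothing, and the count comes out the same on every run.

```go
package main

import (
	"fmt"
	"sync"
)

// collect starts n goroutines that each append one value to a shared slice.
// The mutex serializes only the append, so no update can be lost.
func collect(n int) []int {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		results []int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			mu.Lock()
			results = append(results, v) // critical section: one goroutine at a time
			mu.Unlock()
		}(i)
	}
	wg.Wait() // every append has completed before Wait returns
	return results
}

func main() {
	fmt.Println(len(collect(1000))) // prints 1000
}
```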

Applied to our pipeline:

func (p *SafeProcessor) ProcessAll(articles []model.Article) ([]model.AIResult, time.Duration) {
    start := time.Now()

    var (
        wg      sync.WaitGroup
        mu      sync.Mutex
        results = make([]model.AIResult, 0, len(articles))
    )

    for _, article := range articles {
        wg.Add(1)
        go func(a model.Article) {
            defer wg.Done()

            result := p.processArticle(a) // AI work — outside the lock

            mu.Lock()
            results = append(results, result) // shared write — inside the lock
            mu.Unlock()
        }(article)
    }

    wg.Wait()
    return results, time.Since(start)
}

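Run it with -race and the warning is gone. Run it without, and the timing looks almost identical to Part 2: roughly 3.4 seconds for ten articles. Note that the final return results needs no lock: wg.Wait() only returns after every goroutine has called Done, and the Go memory model guarantees their appends are visible by that point.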

That last detail — the timing staying the same — is not an accident. It's the entire point of this section, and it's easy to get wrong.


Where You Lock Matters More Than the Lock Itself

Here's the mistake that's easy to make once you know mutexes fix race conditions: locking around too much.

go func(a model.Article) {
    defer wg.Done()

    mu.Lock()
    result := p.processArticle(a) // the LLM call — now INSIDE the lock
    results = append(results, result)
    mu.Unlock()
}(article)

This compiles. It's race-free — the detector will report nothing. And it is catastrophic for performance. Here's what happens when you run both versions against the same ten articles, back to back:

Good lock: processed 10 in 3.448s
Bad lock:  processed 10 in 32.825s

Locking around the entire processArticle call, which includes three separate one-to-two-second simulated LLM calls, means only one goroutine can be doing AI work at any given moment. Every other goroutine queues up at mu.Lock(), waiting its turn. You've paid the full complexity cost of goroutines and synchronization and gotten back what is effectively Part 1's plain sequential loop: the articles are processed one after another, just with extra machinery wrapped around them.

The rule that falls out of this: a mutex should protect the smallest possible critical section — ideally just the read-modify-write on shared memory, and nothing that does real work like a network call, a file write, or a sleep. If you find yourself holding a lock across an operation that takes more than a few microseconds, ask whether that operation actually needs to be inside the lock at all. In our case, only the slice mutation does. The LLM call touches no shared state — it can run fully in parallel.


A Second Race You Won't See Coming

While verifying this part's code, a second race condition turned up — one that has nothing to do with the results slice.

Our simulator's LLMClient holds a *rand.Rand internally, used to pick a random latency for each call:

func (c *LLMClient) Call(task string, articleID int) {
    latency := c.cfg.MinLatency + time.Duration(c.rng.Int63n(spread))
    // ...
}

math/rand's *Rand type is not safe for concurrent use on its own. In Parts 1 and 2, this never mattered — each test or run created its own simulator, so nothing shared it across goroutines. But the moment several goroutines hold a reference to the same LLMClient and call Call() concurrently — exactly what's happening in this part's ProcessAll — they're all reading and advancing the same RNG state at once:

WARNING: DATA RACE
Read at 0x00c000080000 by goroutine 7:
  math/rand.(*rngSource).Uint64()
  ...
  simulator.(*LLMClient).Call()
      simulator/llm.go:28

The fix is the same idea as the results slice, applied to the simulator itself — a mutex around just the RNG read:

func (c *LLMClient) Call(task string, articleID int) {
    c.mu.Lock()
    latency := c.cfg.MinLatency + time.Duration(c.rng.Int63n(spread))
    c.mu.Unlock()

    time.Sleep(latency) // outside the lock — no goroutine blocks another's sleep
}

This is worth sitting with for a second, because it's a slightly different lesson than the results-slice race. That race was a bug in our pipeline code. This one comes from a dependency: a math/rand source created with rand.NewSource (the rngSource in the trace above) is fine when used from one goroutine and unsafe the moment you share it across many. The package's top-level functions, such as rand.Intn, use a default source that is safe for concurrent use; it's the client's own *rand.Rand that isn't. Any shared client, cache, or connection pool in a concurrent system deserves the same question: is this safe to call from multiple goroutines at once, or have I just never tested it that way?


What We Have Now

Race-free, and fast:

go test ./internal/... -race
# ok
go run ./cmd/news-processor
# Processed 10 articles in ~3.4s

Two real concurrency bugs found and fixed in this part — one in application code, one in a shared dependency — and a mutex that protects exactly as much as it needs to and not a byte more.


What's Next

The mutex works, but it leaves something uncomfortable in the design: every goroutine reaches out and touches the same shared variable, coordinated only by a lock that's easy to misplace, as we just saw with the 32-second version above. Go has a different philosophy available, summarized in one of the language's most quoted lines:

Do not communicate by sharing memory; instead, share memory by communicating.

In Part 4, we replace the mutex entirely with a channel. Workers won't touch the results slice at all — they'll send their results down a channel to a single collector that owns the slice alone. No lock, no shared write, and — as we'll measure — no loss of speed either.

See you in Part 4.


This is Part 3 of the series "Production-Grade Concurrent AI Systems in Go." Read Part 2 — Goroutines and WaitGroups or continue to Part 4 — Channels and Message Passing.