From Sequential Scripts to Concurrent AI Pipelines in Go — Part 2

Moinuddin M Masud

Part 2 — Goroutines and WaitGroups: Fixing One Bug, Finding Two More

Part of the series: Production-Grade Concurrent AI Systems in Go

Full code for this post: github.com/madmmas/go-concurrent-ai-systems/tree/part-02
Diff from Part 1: compare/part-01...part-02
Run it: go run ./cmd/news-processor inside arc-1-foundations/part-02-goroutines


In Part 1, we built the system the wrong way on purpose. Ten articles, three AI tasks each, fully sequential — and it cost us close to thirty seconds. We measured it, we did the math on what that looks like at scale, and we earned the right to reach for concurrency.

So let's reach for it.

The first instinct almost every Go developer has is the same: drop a go keyword in front of the function call and watch it run in parallel. It's one of the most seductive things about the language — concurrency that looks this easy.

It's also, on its own, completely broken. Let's see why.


The Obvious Fix

Here's the loop from Part 1:

for _, article := range articles {
    result := processArticle(article)
    results = append(results, result)
}

And here's the "obvious" concurrent version:

for _, article := range articles {
    go processArticle(article)
}

One keyword. That's the whole change. Let's run it.

Starting...
Finished in 618µs

Six hundred and eighteen microseconds. For five articles that should each take one to three seconds. Something is very wrong, and the terminal isn't even telling us what — there's no article output at all. No "Processing article 1," no summarization logs, nothing. The program just... finished.


Where Did the Work Go?

Here's what actually happened. go processArticle(article) doesn't run the function and wait for it — it launches the function in a new goroutine and immediately moves on to the next line. The for loop finishes launching all five goroutines in a fraction of a millisecond, main() has nothing left to do, and the program exits.

When main() exits, the entire process terminates. Every goroutine still running gets killed mid-flight, no matter what it was doing. There's no graceful shutdown, no "let me finish this LLM call first." The runtime just stops.

Our five processArticle goroutines never got far enough to even print their first line. They were scheduled, maybe even started, and then the process ended before any of them could produce output.
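To see the "launch and move on" behavior in isolation, here is a minimal sketch. The function name completedBeforeReturn and the 200ms sleep are stand-ins invented for this illustration, not code from the repo. It launches the goroutines and then checks, without waiting, how many have finished.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// completedBeforeReturn (hypothetical helper) launches n goroutines that each
// simulate slow work, then returns immediately and reports how many of them
// had finished by that point.
func completedBeforeReturn(n int) int64 {
	var finished atomic.Int64
	for i := 0; i < n; i++ {
		go func() {
			time.Sleep(200 * time.Millisecond) // stand-in for a slow AI call
			finished.Add(1)
		}()
	}
	// Nothing here waits: the loop only starts the goroutines.
	return finished.Load()
}

func main() {
	fmt.Println("finished before return:", completedBeforeReturn(5))
	// main returns here, and the process exits while all five
	// goroutines are still asleep.
}
```

The count comes back as zero because the loop only schedules work. Nothing in it ever waits for that work to complete.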

This is the most common first mistake in Go concurrency, and it's instructive precisely because the code looks so reasonable. There's no compiler error. No panic. No stack trace pointing at the problem. The output simply doesn't match expectations, and you're left wondering where your work went.


Telling Go to Wait

What we need is a way to say: launch all this work, then don't let main() exit until every goroutine reports back that it's done. Go's standard library has exactly this — sync.WaitGroup.

A WaitGroup is a counter. You increment it once for every goroutine you're about to launch, each goroutine decrements it when it finishes, and you can block until the counter hits zero.

var wg sync.WaitGroup

for _, article := range articles {
    wg.Add(1)

    go func(a model.Article) {
        defer wg.Done()
        result := p.processArticle(a)
        results = append(results, result)
    }(article)
}

wg.Wait()

Three additions, each doing a specific job:

wg.Add(1) happens before the goroutine launches — we tell the WaitGroup to expect one more completion.

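defer wg.Done() is the first line inside the goroutine. The defer matters here: it guarantees Done() runs on every exit path, including an early return or a panic in processArticle that gets recovered inside the goroutine. Without defer, any of those paths would skip Done() and leave the WaitGroup waiting forever for a completion that's never coming. (A panic that nothing recovers crashes the whole program, so a stuck WaitGroup is the least of your problems at that point.)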

wg.Wait() blocks main() until every goroutine has called Done(). Now the process can't exit early — it physically cannot reach the code after wg.Wait() until the counter returns to zero.
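Here is a small, self-contained sketch of those three calls working together, including one case where the defer earns its keep: a panic that is recovered inside the goroutine. The runAll function and the "even-numbered tasks panic" rule are invented for this illustration. Counters are atomic so the example itself stays race-free.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runAll (hypothetical helper) launches n goroutines and blocks until every
// one of them has called Done. Even-numbered tasks panic; the panic is
// recovered inside the goroutine so the program keeps running.
func runAll(n int) (ok, recovered int64) {
	var wg sync.WaitGroup
	var okCount, recCount atomic.Int64

	for i := 0; i < n; i++ {
		wg.Add(1) // register the goroutine before launching it
		go func(id int) {
			defer wg.Done() // deferred first, so it runs last, after the recover below
			defer func() {
				if r := recover(); r != nil {
					recCount.Add(1)
				}
			}()
			if id%2 == 0 {
				panic(fmt.Sprintf("task %d failed", id))
			}
			okCount.Add(1)
		}(i)
	}

	wg.Wait() // returns only once the counter is back to zero
	return okCount.Load(), recCount.Load()
}

func main() {
	ok, rec := runAll(5)
	fmt.Printf("ok=%d recovered=%d\n", ok, rec) // ok=2 recovered=3
}
```

If wg.Done() were a plain call at the end of the goroutine instead of a deferred one, the three panicking tasks would never reach it and wg.Wait() would block forever.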

Run it again:

Starting concurrent pipeline
Processing article 5...
  [5] Summarization started (768ms)
Processing article 1...
  [1] Summarization started (1.031s)
Processing article 2...
  [2] Summarization started (1.26s)
Processing article 3...
  [3] Summarization started (1.382s)
Processing article 4...
  [4] Summarization started (635ms)
  [4] Summarization completed
  [4] Sentiment Analysis started (617ms)
  [5] Summarization completed
  [5] Sentiment Analysis started (582ms)
  ...
Processed 5 articles in 3.763s

Now look at what's happening. Article 5 starts before article 1. Article 4 finishes its summarization before article 1 even gets there. Nothing is in order anymore — and that's exactly right. Five articles are genuinely running at the same time, each progressing at its own pace based on its own simulated latency.

For ten articles, this version finishes in roughly 3.6 to 4.2 seconds. Compare that to Part 1's ~25–30 seconds for the same ten articles. We didn't make any single AI call faster. We just stopped waiting for them one at a time.
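The arithmetic behind that speedup is simple enough to sketch. In an idealized model (no limit on parallel calls, no scheduling overhead), the sequential run costs the sum of the per-article latencies, while the concurrent run costs roughly the slowest single article. The numbers below are hypothetical per-article totals, not measurements from the repo, and estimate is a name made up for this example.

```go
package main

import "fmt"

// estimate (hypothetical helper) returns the idealized wall-clock time, in
// milliseconds, for processing every article one after another (the sum)
// versus all at once (the slowest single article).
func estimate(perArticleMs []int) (sequential, concurrent int) {
	for _, ms := range perArticleMs {
		sequential += ms
		if ms > concurrent {
			concurrent = ms
		}
	}
	return sequential, concurrent
}

func main() {
	// Hypothetical per-article totals, chosen only for illustration.
	latencies := []int{2400, 3100, 2800, 3700, 2600}
	seq, conc := estimate(latencies)
	fmt.Printf("sequential: %dms, concurrent: %dms\n", seq, conc)
	// sequential: 14600ms, concurrent: 3700ms
}
```

Adding more articles grows the sum but barely moves the max, which is why doubling the article count hardly changes the concurrent run time.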


A Bug That Doesn't Announce Itself

There's a second goroutine mistake that's even sneakier than the WaitGroup issue, because the code runs, finishes, and produces output that looks completely plausible.

for _, article := range articles {
    go func() {
        processArticle(article) // looks fine — isn't
    }()
}

Spot the difference from the version above. This closure doesn't take article as a parameter — it reaches out and captures the loop variable directly.

In older Go (before 1.22), article is a single variable that gets reused and reassigned on every iteration of the loop. The goroutines don't each get their own copy — they all share a reference to the same variable. By the time the scheduler actually runs any of these goroutines, the loop has likely already moved on, and article is sitting at whatever its last value was. You can end up with several goroutines all processing the final article, and earlier articles silently skipped.

Go 1.22 changed the loop variable semantics specifically to fix this class of bug: each iteration now gets its own copy. The new behavior applies when your module's go.mod declares go 1.22 or later, and in that case the closure-capture version above works correctly. But understanding why it used to break is more valuable than the fix itself, because the same shared-reference trap shows up anywhere a closure captures a variable that changes after the closure is created, not just in for loops.

The fix, and the pattern we use throughout this series regardless of Go version, is to pass the value explicitly as a function argument:

go func(a model.Article) {
    defer wg.Done()
    processArticle(a)
}(article)

Passing article as an argument evaluates it immediately, at the moment the goroutine is launched, and gives that goroutine its own copy. There's no ambiguity about which value it sees.
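Both the trap and the fix can be reproduced without goroutines, and on any Go version, by declaring the shared variable outside the loop. This is a sketch with made-up names (sharedCapture, copiedArgument, callAll), not code from the series repo.

```go
package main

import "fmt"

// sharedCapture builds three closures that all read the same variable.
func sharedCapture() []int {
	var readers []func() int
	current := 0 // declared once, outside the loop, so every closure shares it
	for i := 1; i <= 3; i++ {
		current = i
		readers = append(readers, func() int { return current })
	}
	return callAll(readers)
}

// copiedArgument passes the value in as an argument, so each closure keeps
// the copy it was given at creation time.
func copiedArgument() []int {
	var readers []func() int
	current := 0
	for i := 1; i <= 3; i++ {
		current = i
		readers = append(readers, func(v int) func() int {
			return func() int { return v }
		}(current))
	}
	return callAll(readers)
}

// callAll runs every closure after the loop has finished.
func callAll(fs []func() int) []int {
	out := make([]int, 0, len(fs))
	for _, f := range fs {
		out = append(out, f())
	}
	return out
}

func main() {
	fmt.Println(sharedCapture())  // [3 3 3]
	fmt.Println(copiedArgument()) // [1 2 3]
}
```

By the time any closure in sharedCapture runs, current has already reached its final value, which is the same thing that happened to article in pre-1.22 loops.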


We Fixed One Problem and Created Another

Run the working version with the race detector turned on:

go run -race ./cmd/news-processor
WARNING: DATA RACE
Read at 0x00c000012030 by goroutine 7:
  pipeline.(*ConcurrentProcessor).ProcessAll.func1()
      processor.go:30

Previous write at 0x00c000012030 by goroutine 8:
  pipeline.(*ConcurrentProcessor).ProcessAll.func1()
      processor.go:30

The race detector is flagging this line:

results = append(results, result)

Every goroutine we launched is appending to the same results slice. append isn't safe to call from multiple goroutines at once: it reads the slice's length, decides whether to grow the backing array, and writes the new element, and none of that is atomic. It also returns a new slice header that we assign back to the shared results variable, which is one more unsynchronized write. Two goroutines doing this simultaneously can step on each other: one goroutine's write gets silently overwritten by another's, or the slice's internal bookkeeping gets corrupted.
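The lost-write case is easier to picture with a deterministic, single-goroutine simulation. This is not a real race, only a sketch of one unlucky interleaving: two appends that both start from the same stale view of the slice, as two goroutines would if they read results at the same moment. The staleAppends name is made up for this example.

```go
package main

import "fmt"

// staleAppends (hypothetical helper) simulates two writers that both read the
// slice before either one has stored its result back.
func staleAppends() (first, second []int) {
	results := make([]int, 0, 4) // length 0, spare capacity for 4 elements
	stale := results             // the view both writers start from

	first = append(stale, 1)  // sees len 0, writes slot 0
	second = append(stale, 2) // also sees len 0, overwrites slot 0

	return first, second
}

func main() {
	first, second := staleAppends()
	fmt.Println(first, second) // [2] [2]
}
```

Two values went in and only one survived. Whichever header gets assigned back to results, the slice has length 1 and the 1 is gone, with no error to show for it.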

The unsettling part is that this often won't show up as an obvious crash. Sometimes you'll get all five results back correctly. Sometimes you'll get four. The behavior depends on the exact timing of the goroutine scheduler, which is different every single run. That non-determinism — passing most of the time, quietly dropping data the rest of the time — is the signature of a race condition, and it's exactly the kind of bug that slips past local testing and shows up only under production load.

We didn't introduce this race condition by accident. We're going to leave it in the code for this part, because seeing it — actually watching the detector flag it on working code — is a better teacher than being told about it in the abstract.


What Concurrency Actually Cost Us

It's worth being honest about the trade we just made. Sequential code in Part 1 had a property we got for free: predictability. One thing happened, then the next thing happened. If something went wrong, the stack trace told you exactly where, and you could trust that nothing else was running at the same time to confuse the picture.

The moment we introduced goroutines, we gave up some of that. Results come back in a different order every run. A subtle bug like the loop variable capture issue can silently corrupt data without any error message. And now we have a genuine data race sitting in our code, found only because we happened to run with -race — a flag that's easy to forget.

This is the tradeoff concurrency always asks for. You get speed. You give up some of the predictability that made sequential code easy to reason about. Neither side of that trade is free, and pretending otherwise is how production incidents happen.


What's Next

In Part 3, we fix the race condition properly using sync.Mutex — and we'll look closely at where exactly to put the lock. Lock too little and the race condition stays. Lock too much — say, around the entire processArticle call instead of just the append — and you've accidentally rebuilt sequential processing with extra steps, losing every bit of the speed we just gained.

Getting that boundary right is the actual skill here. The mutex itself is two lines of code.

See you in Part 3.


This is Part 2 of the series "Production-Grade Concurrent AI Systems in Go." Read Part 1 — Why Concurrency Matters or continue to Part 3 — Race Conditions and Mutexes.