%{
  title: "Rebuilding Our Data Pipeline with Broadway",
  author: "Jane Doe",
  tags: ~w(elixir broadway data-engineering),
  description: "How we replaced our Kafka consumer with Broadway for 10x throughput"
}
---
Last quarter we hit a wall with our homegrown Kafka consumer. Message lag was
growing, backpressure was non-existent, and our on-call engineers were losing
sleep. We decided to rebuild on [Broadway](https://github.com/dashbitco/broadway).

## Why Broadway?
Broadway gives us three things our old consumer lacked:

- **Batching** — messages are grouped before hitting the database, cutting our
  write volume by 90%
- **Backpressure** — producers only send what consumers can handle
- **Fault tolerance** — failed messages are retried automatically with
  configurable strategies
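
Batching isn't limited to a single group, either. A sketch of per-message routing using `Broadway.Message.put_batcher/2` (the `:inserts` and `:deletes` batcher names here are hypothetical, not from our pipeline):

```elixir
# Hypothetical: route messages to different batchers by event type.
# Assumes batchers named :inserts and :deletes are declared in start_link/1.
def handle_message(_processor, message, _context) do
  case message.data["type"] do
    "delete" -> Broadway.Message.put_batcher(message, :deletes)
    _other -> Broadway.Message.put_batcher(message, :inserts)
  end
end
```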
## The migration

We ran both pipelines in parallel for two weeks, comparing output row-by-row.
Once we confirmed parity, we cut over with zero downtime.
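
A rough sketch of what such a parity check can look like (the `events_legacy` table name is illustrative, not from our actual setup):

```elixir
# Hypothetical parity check: compare row counts between the old and new
# pipelines' output tables. Strings are valid schemaless queryables in Ecto.
legacy_count = MyApp.Repo.aggregate("events_legacy", :count)
new_count = MyApp.Repo.aggregate("events", :count)

if legacy_count != new_count do
  IO.puts("row count mismatch: legacy=#{legacy_count} new=#{new_count}")
end
```

In practice you would compare actual rows, not just counts, but the count check is a cheap first gate.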
```elixir
defmodule MyApp.EventPipeline do
  use Broadway

  def start_link(_opts) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      producer: [
        module: {BroadwayKafka.Producer, [
          hosts: [localhost: 9092],
          group_id: "my_app_events",
          topics: ["events"]
        ]}
      ],
      # Ten concurrent processors decode messages in parallel.
      processors: [default: [concurrency: 10]],
      # Flush a batch at 100 messages or after 500 ms, whichever comes first.
      batchers: [default: [batch_size: 100, batch_timeout: 500]]
    )
  end

  @impl true
  def handle_message(_processor, message, _context) do
    # Decode the JSON payload. `keys: :atoms!` gives us atom keys (which
    # `Repo.insert_all/2` requires) without creating new atoms at runtime.
    message
    |> Broadway.Message.update_data(&Jason.decode!(&1, keys: :atoms!))
  end

  @impl true
  def handle_batch(_batcher, messages, _batch_info, _context) do
    rows = Enum.map(messages, & &1.data)
    MyApp.Repo.insert_all("events", rows)
    messages
  end
end
```
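
One detail worth spelling out: the pipeline is a supervised process, so it needs to be started under the application's supervision tree. A minimal sketch (the `MyApp.Application` module and sibling children are assumed):

```elixir
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      MyApp.Repo,
      # Start the Broadway pipeline alongside the Repo so it restarts with the app.
      MyApp.EventPipeline
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```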
## Results

After the migration, our p99 processing latency dropped from 12s to 180ms and
we haven't had a single page about consumer lag since.