%{
title: "Rebuilding Our Data Pipeline with Broadway",
author: "Jane Doe",
tags: ~w(elixir broadway data-engineering),
description: "How we replaced our Kafka consumer with Broadway for 10x throughput"
}
---
*This is a sample blog post, generated to show what blogex can do.*

Last quarter we hit a wall with our homegrown Kafka consumer. Message lag was
growing, backpressure was non-existent, and our on-call engineers were losing
sleep. We decided to rebuild on [Broadway](https://github.com/dashbitco/broadway).

## Why Broadway?

Broadway gives us three things our old consumer lacked:
- **Batching** — messages are grouped before hitting the database, cutting our
  write volume by 90%
- **Backpressure** — producers only send what consumers can handle
- **Fault tolerance** — failed messages are retried automatically with
  configurable strategies
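
As an illustration of the fault-tolerance hooks (a sketch, not our production
code): a processor callback can mark a message as failed instead of raising,
and Broadway routes it through its error handling rather than crashing the
pipeline.

```elixir
@impl true
def handle_message(_processor, message, _context) do
  case Jason.decode(message.data) do
    {:ok, decoded} ->
      Broadway.Message.update_data(message, fn _ -> decoded end)

    {:error, reason} ->
      # Failed messages skip the batcher and go to the optional
      # handle_failed/2 callback instead of taking down the processor.
      Broadway.Message.failed(message, reason)
  end
end
```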

## The migration

We ran both pipelines in parallel for two weeks, comparing output row-by-row.
Once we confirmed parity, we cut over with zero downtime.
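
The row-by-row comparison boils down to SQL over the two output tables; a
minimal sketch with Ecto (the `events_legacy`/`events_broadway` table names
and `event_id` key are hypothetical):

```elixir
import Ecto.Query

# IDs written by the old consumer but missing from the Broadway output.
# An empty result in both directions means the pipelines agree.
missing_ids =
  from(old in "events_legacy",
    left_join: new in "events_broadway",
    on: old.event_id == new.event_id,
    where: is_nil(new.event_id),
    select: old.event_id
  )
  |> MyApp.Repo.all()
```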
```elixir
defmodule MyApp.EventPipeline do
  use Broadway

  def start_link(_opts) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      producer: [
        module: {BroadwayKafka.Producer, [
          hosts: [localhost: 9092],
          group_id: "my_app_events",
          topics: ["events"]
        ]}
      ],
      processors: [default: [concurrency: 10]],
      batchers: [default: [batch_size: 100, batch_timeout: 500]]
    )
  end

  @impl true
  def handle_message(_, message, _) do
    Broadway.Message.update_data(message, &Jason.decode!/1)
  end

  @impl true
  def handle_batch(_, messages, _, _) do
    # insert_all/2 expects atom keys, so convert the decoded JSON maps
    rows =
      Enum.map(messages, fn msg ->
        Map.new(msg.data, fn {k, v} -> {String.to_existing_atom(k), v} end)
      end)

    MyApp.Repo.insert_all("events", rows)
    messages
  end
end
```
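
The pipeline starts like any other process: add it to the application's
supervision tree (assuming a conventional `MyApp.Application`):

```elixir
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      MyApp.Repo,
      # Broadway pipelines are plain supervised children
      MyApp.EventPipeline
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```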

## Results

After the migration, our p99 processing latency dropped from 12s to 180ms, and
we haven't had a single page about consumer lag since.