%{
  title: "Rebuilding Our Data Pipeline with Broadway",
  author: "Jane Doe",
  tags: ~w(elixir broadway data-engineering),
  description: "How we replaced our Kafka consumer with Broadway for 10x throughput"
}
---
Last quarter we hit a wall with our homegrown Kafka consumer. Message lag was
growing, backpressure was non-existent, and our on-call engineers were losing
sleep. We decided to rebuild on [Broadway](https://github.com/dashbitco/broadway).

## Why Broadway?
Broadway gives us three things our old consumer lacked:

- **Batching** — messages are grouped before hitting the database, cutting our
  write volume by 90%
- **Backpressure** — producers only send what consumers can handle
- **Fault tolerance** — failed messages are retried automatically with
  configurable strategies
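
Batching isn't limited to a single group, either. A sketch of per-message routing using `Broadway.Message.put_batcher/2` (the `:inserts` and `:deletes` batcher names here are hypothetical, not from our pipeline):

```elixir
# Hypothetical: route messages to different batchers by event type.
# Assumes batchers named :inserts and :deletes are declared in start_link/1.
def handle_message(_processor, message, _context) do
  case message.data["type"] do
    "delete" -> Broadway.Message.put_batcher(message, :deletes)
    _other -> Broadway.Message.put_batcher(message, :inserts)
  end
end
```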
## The migration

We ran both pipelines in parallel for two weeks, comparing output row-by-row.
Once we confirmed parity, we cut over with zero downtime.
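
A rough sketch of what such a parity check can look like (the `events_legacy` table name is illustrative, not from our actual setup):

```elixir
# Hypothetical parity check: compare row counts between the old and new
# pipelines' output tables. Strings are valid schemaless queryables in Ecto.
legacy_count = MyApp.Repo.aggregate("events_legacy", :count)
new_count = MyApp.Repo.aggregate("events", :count)

if legacy_count != new_count do
  IO.puts("row count mismatch: legacy=#{legacy_count} new=#{new_count}")
end
```

In practice you would compare actual rows, not just counts, but the count check is a cheap first gate.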
```elixir
defmodule MyApp.EventPipeline do
  use Broadway

  def start_link(_opts) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      producer: [
        module: {BroadwayKafka.Producer, [
          hosts: [localhost: 9092],
          group_id: "my_app_events",
          topics: ["events"]
        ]}
      ],
      # Ten concurrent processors decode messages in parallel.
      processors: [default: [concurrency: 10]],
      # Flush a batch at 100 messages or after 500 ms, whichever comes first.
      batchers: [default: [batch_size: 100, batch_timeout: 500]]
    )
  end

  @impl true
  def handle_message(_processor, message, _context) do
    # Decode the JSON payload. `keys: :atoms!` gives us atom keys (which
    # `Repo.insert_all/2` requires) without creating new atoms at runtime.
    message
    |> Broadway.Message.update_data(&Jason.decode!(&1, keys: :atoms!))
  end

  @impl true
  def handle_batch(_batcher, messages, _batch_info, _context) do
    rows = Enum.map(messages, & &1.data)
    MyApp.Repo.insert_all("events", rows)
    messages
  end
end
```
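
One detail worth spelling out: the pipeline is a supervised process, so it needs to be started under the application's supervision tree. A minimal sketch (the `MyApp.Application` module and sibling children are assumed):

```elixir
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      MyApp.Repo,
      # Start the Broadway pipeline alongside the Repo so it restarts with the app.
      MyApp.EventPipeline
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```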
## Results

After the migration, our p99 processing latency dropped from 12s to 180ms and
we haven't had a single page about consumer lag since.