Continuous OSINT is a pipeline, not a cron job. And pipelines without backpressure explode the first time one stage slows down. Broadway β built on GenStage β is how Prismaticβs monitoring apps absorb spiky rate-limited upstreams without dropping messages or melting the database.
#The spiky reality
A realistic monitoring load:
- Certificate transparency firehose: ~1k events/sec peak, 50/sec off-peak
- Shodan polled queries: 100 req/min hard ceiling
- Czech ARES: 30 req/min polite ceiling
- Entity enrichment (Postgres + graph write): ~500 writes/sec sustained
Without backpressure, the CT firehose drowns Shodan, Shodan gets 429βd, retries storm, and the ARES ceiling gets blown. Every stage fails for a reason caused by some other stage.
#The Broadway shape
defmodule PrismaticOsint.CtPipeline do
use Broadway
def start_link(_opts) do
Broadway.start_link(__MODULE__,
name: __MODULE__,
producer: [
module: {PrismaticOsint.CtProducer, []},
concurrency: 1,
rate_limiting: [allowed_messages: 1000, interval: 1_000]
],
processors: [
default: [concurrency: 20, min_demand: 5, max_demand: 20]
],
batchers: [
enrich: [concurrency: 4, batch_size: 50, batch_timeout: 500],
index: [concurrency: 2, batch_size: 100, batch_timeout: 1_000]
]
)
end
def handle_message(_, msg, _), do: route(msg)
def handle_batch(:enrich, msgs, _, _), do: enrich_batch(msgs)
def handle_batch(:index, msgs, _, _), do: index_batch(msgs)
endFour knobs matter:
rate_limitingat the producer β hard ceiling on incoming messages.max_demandon processors β how far the pipeline pulls ahead before waiting.batch_sizeon batchers β amortize Postgres + graph writes.batch_timeoutβ cap tail latency when volume drops.
Get those four right and the pipeline pulls work at a rate the slowest stage can handle. That is the whole point of backpressure: the slow stage sets the tempo.
#Rate-limited adapters get their own pipeline
Shodan and ARES do not belong in the CT firehose pipeline. They get their own Broadway with its own producer-level rate limiting. Mixing them into a high-throughput pipeline means the high-throughput stages starve while the slow stages crawl.
#Dead letters are evidence
Every message that fails after retries ends up in a dead-letter Postgres table β with the original payload, the failing stage, and the last error. Dead letters are not βthings to fix someday.β They are the ground truth about which part of the real world your pipeline does not model. Review them weekly or they compound.
#Where to go next
- Academy: Storage Patterns β the write side of the pipeline
- Academy: OTP Fundamentals β GenStage / Broadway foundations
- Glossary: Broadway, GenStage, Backpressure, Pipeline, Rate Limiting
The slowest stage sets the tempo. Plan for it. The pipeline will thank you at 3am.