Prismatic · Enterprise AI Orchestration

Circuit Breakers for Flaky OSINT Sources: Fail Fast, Recover Quietly

Some OSINT sources are up 99.9% of the time. Some are up 73%. Retrying a down source synchronously is how pipelines die. Circuit breakers turn a 30-second timeout into a 1-millisecond :circuit_open.

Apr 09, 2026 · 6 min read · Tomáš Korcak (korczis)

A polite scraper hits a flaky source and waits 30 seconds for a TCP timeout. It does this 100 times an hour. That is 50 minutes of wall-clock time per hour spent waiting on one dead source. Meanwhile every other adapter is healthy and the dashboard looks fine. The fix is a circuit breaker, and in Elixir it is about 80 lines of GenServer wrapped around a few counters.

#The three states

A circuit breaker is a tiny state machine:

closed — traffic flows, failures are counted.
open — traffic is rejected immediately with {:error, :circuit_open}.
half-open — one probe request is allowed; success closes, failure re-opens.

The trick is the transition rules. Too sensitive and you flap. Too tolerant and you are back to 50 minutes per hour of timeouts.
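
The state machine can be written down as a pure function, which makes the legal transitions easy to review. This is a minimal sketch rather than the Prismatic implementation: the module name `BreakerSketch.Transitions` and the event names (`:tripped`, `:cooldown_elapsed`, `:probe_ok`, `:probe_failed`) are assumptions made for illustration.

```elixir
defmodule BreakerSketch.Transitions do
  @moduledoc "Hypothetical transition table for the three breaker states."

  @type state :: :closed | :open | :half_open
  @type event :: :tripped | :cooldown_elapsed | :probe_ok | :probe_failed

  @spec next(state(), event()) :: state()
  # Too many failures while closed: stop sending traffic.
  def next(:closed, :tripped), do: :open
  # Cool-down is over: allow a probe.
  def next(:open, :cooldown_elapsed), do: :half_open
  # The probe decides which way the breaker goes.
  def next(:half_open, :probe_ok), do: :closed
  def next(:half_open, :probe_failed), do: :open
  # Any other pairing leaves the state unchanged.
  def next(state, _event), do: state
end
```

Keeping the table separate from the process that holds the counters means the rules about when to fire `:tripped` can change without touching the transitions.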

#The rules that actually work

Open on: 5+ failures in a 30-second rolling window OR 3 consecutive failures.
Cool-down: exponential, starting at 10s and doubling each time the breaker re-opens, capped at 5 minutes.
Half-open probe: exactly one in-flight probe. Concurrent probes defeat the purpose.
Close on: probe success.
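
The opening rule and the cool-down schedule above can be sketched as two pure functions. The module name `BreakerSketch.Rules`, the function names, and the argument shapes are hypothetical; the sketch also assumes the cool-down doubles once per consecutive open, which the rules imply but do not spell out.

```elixir
defmodule BreakerSketch.Rules do
  @moduledoc "Hypothetical helpers for the open rule and the cool-down schedule."

  @window_ms 30_000
  @window_limit 5
  @consecutive_limit 3
  @base_cooldown_ms 10_000
  @max_cooldown_ms 300_000

  # failure_times: millisecond timestamps of recorded failures (any order).
  # consecutive: number of failures since the last success.
  # Returns true when either opening rule is met.
  def trip?(failure_times, consecutive, now_ms) do
    recent = Enum.count(failure_times, fn t -> now_ms - t < @window_ms end)
    recent >= @window_limit or consecutive >= @consecutive_limit
  end

  # opens: how many times in a row the breaker has opened without closing
  # (1 for the first open). Doubles from 10s and is capped at 5 minutes.
  def cooldown_ms(opens) when is_integer(opens) and opens >= 1 do
    min(@base_cooldown_ms * Integer.pow(2, opens - 1), @max_cooldown_ms)
  end
end
```

With these numbers the schedule runs 10s, 20s, 40s, 80s, 160s, and then stays at 300s.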

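defmodule PrismaticOsint.Breaker do
  use GenServer

  # Excerpt: the GenServer callbacks, record/2 and probe/2 are not shown.

  # Runs `fun` through the breaker registered under `name`.
  def call(name, fun) do
    case GenServer.call(name, :state) do
      # Reject immediately without touching the source.
      :open      -> {:error, :circuit_open}
      # Run the request and report its outcome to the breaker.
      :closed    -> record(name, safely(fun))
      # Let a probe through to test the source.
      :half_open -> probe(name, fun)
    end
  end

  # Turns raises, throws and exits into {:error, {kind, reason}}.
  defp safely(fun) do
    try do
      {:ok, fun.()}
    catch
      kind, reason -> {:error, {kind, reason}}
    end
  end
end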

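The try/catch wraps the user function so that a raise, throw, or exit comes back as {:error, {kind, reason}} instead of unwinding past the bookkeeping, and a crash never leaves the breaker in an inconsistent state. The :error path feeds the counter; the :ok path resets it.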
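
The excerpt above leaves out the server side. One possible shape for it is sketched here under loud assumptions: `BreakerSketch.Server`, `state/1`, `report/2`, and the `:cooldown_ms` option are hypothetical names, the sketch counts consecutive failures only (no rolling window), and the cool-down is fixed rather than exponential. What it does show is the counter bookkeeping and one way to guarantee a single in-flight probe: only the first caller in half-open is told `:half_open`.

```elixir
defmodule BreakerSketch.Server do
  @moduledoc "Hypothetical, simplified server side of a circuit breaker."
  use GenServer

  @consecutive_limit 3

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: Keyword.fetch!(opts, :name))
  end

  # While half-open, only the first caller gets :half_open (the probe slot).
  # Everyone else sees :open until the probe reports back.
  def state(name), do: GenServer.call(name, :state)

  # Report the outcome of a request or probe: :ok or :error.
  def report(name, outcome) when outcome in [:ok, :error] do
    GenServer.cast(name, {:report, outcome})
  end

  @impl true
  def init(opts) do
    cooldown_ms = Keyword.get(opts, :cooldown_ms, 10_000)
    {:ok, %{state: :closed, failures: 0, probing?: false, cooldown_ms: cooldown_ms}}
  end

  @impl true
  def handle_call(:state, _from, %{state: :half_open, probing?: false} = s) do
    {:reply, :half_open, %{s | probing?: true}}
  end

  def handle_call(:state, _from, %{state: :half_open} = s), do: {:reply, :open, s}
  def handle_call(:state, _from, s), do: {:reply, s.state, s}

  @impl true
  # Results that arrive while open are stale, so they are ignored.
  def handle_cast({:report, _outcome}, %{state: :open} = s), do: {:noreply, s}

  # Any success closes the breaker and resets the failure count.
  def handle_cast({:report, :ok}, s) do
    {:noreply, %{s | state: :closed, failures: 0, probing?: false}}
  end

  # A failed probe re-opens immediately.
  def handle_cast({:report, :error}, %{state: :half_open} = s), do: {:noreply, open(s)}

  # The failure that reaches the limit opens the breaker.
  def handle_cast({:report, :error}, %{failures: n} = s) when n + 1 >= @consecutive_limit do
    {:noreply, open(s)}
  end

  def handle_cast({:report, :error}, s), do: {:noreply, %{s | failures: s.failures + 1}}

  @impl true
  def handle_info(:cooldown_elapsed, %{state: :open} = s) do
    {:noreply, %{s | state: :half_open}}
  end

  def handle_info(_msg, s), do: {:noreply, s}

  # Enter :open and schedule the move to :half_open.
  defp open(s) do
    Process.send_after(self(), :cooldown_elapsed, s.cooldown_ms)
    %{s | state: :open, failures: 0, probing?: false}
  end
end
```

Handing out the probe slot inside `handle_call/3` keeps the decision in one process, so two callers cannot both believe they are the probe.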

#Wire it into the pipeline

Every OSINT adapter gets wrapped at the boundary:

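def search(query, _opts) do
  Breaker.call(breaker_for(__MODULE__), fn ->
    # get!/3 raises on transport errors and timeouts, so those reach the breaker as failures.
    # HTTP error statuses (for example 503) still return a response and arrive as {:ok, response}.
    HTTPoison.get!(url(query), [], recv_timeout: 5_000)
  end)
end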

That is it. The adapter does not know about the breaker’s internal state. The pipeline does not know about the adapter’s flakiness. Both stay boring, which is the goal.
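
On the pipeline side, the only thing to handle is the shape of the result. The sketch below assumes that `Breaker.call/2` hands back either `{:ok, value}` or `{:error, reason}`, with `:circuit_open` as one possible reason; the module `PipelineSketch` and the bucket names are hypothetical.

```elixir
defmodule PipelineSketch do
  @moduledoc "Hypothetical consumer of breaker-wrapped adapter results."

  # results: list of {adapter, result} tuples, where result is what the
  # breaker-wrapped adapter returned. Input order is preserved per bucket.
  def partition(results) do
    grouped =
      Enum.group_by(results, fn
        {_adapter, {:ok, _value}} -> :hits
        # Rejected by an open breaker: the source was never contacted.
        {_adapter, {:error, :circuit_open}} -> :skipped
        # The source was contacted and the request failed.
        {_adapter, {:error, _reason}} -> :failed
      end)

    # Make sure every bucket exists even when it is empty.
    Map.merge(%{hits: [], skipped: [], failed: []}, grouped)
  end
end
```

Separating `:skipped` from `:failed` lets a report say which sources were not consulted at all, as opposed to consulted without success.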

#Telemetry is non-negotiable

Every state transition emits telemetry:

:telemetry.execute([:osint, :breaker, :open], %{count: 1}, %{adapter: name})
:telemetry.execute([:osint, :breaker, :half_open], %{count: 1}, %{adapter: name})
:telemetry.execute([:osint, :breaker, :closed], %{count: 1}, %{adapter: name})

A breaker that opens and never closes is a source that died. A breaker that flaps is a source with intermittent issues (and possibly a wrong cool-down). Without observability you cannot tell which is which.
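
To make those transitions visible, something has to listen. The sketch below attaches one handler to the three events with `:telemetry.attach_many/4` and logs each transition. The module name `BreakerSketch.TelemetryLogger`, the handler id, and the log format are assumptions, and `attach/0` needs the `:telemetry` dependency at runtime.

```elixir
defmodule BreakerSketch.TelemetryLogger do
  @moduledoc "Hypothetical log handler for breaker state transitions."
  require Logger

  @events [
    [:osint, :breaker, :open],
    [:osint, :breaker, :half_open],
    [:osint, :breaker, :closed]
  ]

  # Call once at application start. Requires the :telemetry dependency.
  def attach do
    :telemetry.attach_many("breaker-logger", @events, &__MODULE__.handle_event/4, nil)
  end

  # Invoked by :telemetry with the event name, measurements, metadata and config.
  def handle_event([:osint, :breaker, transition], _measurements, %{adapter: adapter}, _config) do
    Logger.warning(format(adapter, transition))
  end

  # Builds the log line; kept separate so it can be tested without a logger.
  def format(adapter, transition) do
    "breaker #{inspect(adapter)} -> #{transition}"
  end
end
```

A run of `open` lines with no `closed` for one adapter points at a dead source, while alternating `open` and `closed` lines point at flapping.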

#Where to go next

Academy: OTP Fundamentals — the GenServer under the breaker
Glossary: Fault Tolerance, Retry, Rate Limiting, Telemetry, Observability

Fail fast. Recover quietly. Let the healthy sources do the work.
