A `rescue _ -> :error` is the programming equivalent of closing your eyes during a car crash. The exception happens; the context that would have told you what happened is discarded; all you know is “something.” When Prismatic audited the codebase under the zero-tolerance doctrine, it found 6,000+ bare rescues. Every single one was a place where a future outage would be harder to diagnose than it needed to be.
#What is wrong with a bare rescue
Three things:
- It catches everything, including `ArgumentError` from a developer bug, `DBConnection.ConnectionError` from a real outage, and `Protocol.UndefinedError` from a schema mismatch. These three need very different responses. A bare rescue gives them all the same response.
- It discards the stacktrace unless you bind the exception and log it together with `__STACKTRACE__`. Almost nobody does.
- It lies about the failure mode: the caller sees `:error` and assumes a well-known failure. It may actually have been an `ErlangError` raised from inside a NIF.
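To make the first point concrete, here is a minimal sketch of three unrelated failures passing through one bare rescue. `BareRescueDemo` and `swallow/1` are hypothetical names invented for this illustration, and the `RuntimeError` is only a stand-in for a real connection failure.

```elixir
defmodule BareRescueDemo do
  # Hypothetical helper: runs `fun` behind the bare rescue pattern.
  def swallow(fun) when is_function(fun, 0) do
    try do
      fun.()
    rescue
      _ -> :error
    end
  end
end

# Three unrelated failures: a caller bug, a stand-in for an outage,
# and a protocol that is not implemented for the given value.
failures = [
  fn -> raise ArgumentError, "bad option" end,
  fn -> raise RuntimeError, "connection refused" end,
  fn -> Enum.count(:not_enumerable) end
]

results = Enum.map(failures, &BareRescueDemo.swallow/1)
IO.inspect(results)
# prints [:error, :error, :error]
```

All three come back as the same atom, so the caller has nothing left to tell a bug from an outage.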
#The pattern that replaces it
    # ❌ Bare rescue: banned by ZERO
    try do
      HTTPClient.get(url)
    rescue
      _ -> :error
    end

    # ✅ Specific rescues + typed error + structured log
    try do
      HTTPClient.get(url)
    rescue
      e in [HTTPoison.Error, Mint.TransportError] ->
        Logger.warning("http transport error",
          url: url, reason: Exception.message(e))
        {:error, {:transport, Exception.message(e)}}

      e in [Jason.DecodeError] ->
        Logger.warning("http payload decode error",
          url: url, reason: Exception.message(e))
        {:error, {:decode, Exception.message(e)}}
    end

Two improvements, both important:
- Specific exception types. Bugs that the rescue is not meant to catch (like `FunctionClauseError` from a code change) propagate to the supervisor, where they belong.
- Typed error tuples. Callers get `{:error, {:transport, msg}}` instead of `:error`. Pattern-matching on the reason is the difference between a retry loop that helps and one that makes things worse.
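As a rough sketch of what matching on the reason buys the caller, the helper below retries only transport failures. `RetryPolicy.with_retry/2` is a hypothetical name, and the sketch assumes the fetch function returns `{:ok, body}` or an `{:error, {kind, message}}` tuple in the shape described above. It leaves out backoff and jitter, which a real retry loop would likely want.

```elixir
defmodule RetryPolicy do
  # Hypothetical helper: `fetch` is any zero-arity function returning
  # {:ok, body} or {:error, {kind, message}}.
  def with_retry(fetch, attempts_left \\ 3) when is_function(fetch, 0) do
    case fetch.() do
      {:ok, _body} = ok ->
        ok

      # Transport failures are often transient, so another attempt may help.
      {:error, {:transport, _msg}} when attempts_left > 1 ->
        with_retry(fetch, attempts_left - 1)

      # A payload that fails to decode will fail the same way on every
      # retry, and running out of attempts is final too. In both cases
      # the typed error is returned unchanged.
      {:error, {_kind, _msg}} = error ->
        error
    end
  end
end
```

With a bare `:error`, the same loop would have to retry everything or nothing. Retrying a decode failure only adds load to a service that is already returning bad payloads.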
#Let it crash, for real this time
The Elixir slogan is “let it crash.” A bare rescue is the opposite of that philosophy. It catches the crash, hides it, and makes the supervisor think everything is fine. Removing the rescue, so the adapter genuinely crashes and the supervisor genuinely restarts it, is usually the right move.
The rule: only rescue what you can do something about. Otherwise let the process die and the supervisor recover.
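A minimal sketch of that rule in action, assuming a hypothetical `FlakyAdapter` GenServer (not a module from the audited codebase): the worker has no rescue, so an unexpected payload takes the process down and a `:one_for_one` supervisor starts a fresh one.

```elixir
defmodule FlakyAdapter do
  use GenServer

  def start_link(_opts) do
    GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  @impl true
  def init(:ok), do: {:ok, %{}}

  # No rescue here: String.to_integer/1 raises ArgumentError on bad input,
  # which terminates this process and leaves recovery to the supervisor.
  @impl true
  def handle_call({:parse, payload}, _from, state) do
    {:reply, String.to_integer(payload), state}
  end
end

{:ok, _sup} = Supervisor.start_link([FlakyAdapter], strategy: :one_for_one)

# The happy path works as usual.
42 = GenServer.call(FlakyAdapter, {:parse, "42"})
```

Calling `GenServer.call(FlakyAdapter, {:parse, "oops"})` exits the caller with the crash reason, the crash is logged with its stacktrace, and the supervisor registers a new process under the same name. Nothing is hidden, and nothing keeps running in a half-broken state.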
#Regression tests
Every removed bare rescue got a regression test that asserts the replacement behavior:
    test "adapter returns typed transport error on network failure" do
      assert {:error, {:transport, _}} = Adapter.fetch("http://127.0.0.1:1")
    end

Without the regression test, the next refactor reintroduces a bare rescue because “it was simpler that way.” With the test, it fails CI.
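A complementary test can pin down the other half of the contract: exceptions the rescue was never meant to handle still propagate. The sketch below is an assumption-laden illustration, not the audit's actual test suite. `NarrowRescue.endpoint/1` is a hypothetical function that rescues only `KeyError`.

```elixir
ExUnit.start()

defmodule NarrowRescue do
  # Hypothetical stand-in for an adapter function: it rescues only the
  # one exception it knows how to report and lets everything else raise.
  def endpoint(config) do
    {:ok, Map.fetch!(config, :url)}
  rescue
    e in [KeyError] -> {:error, {:missing_key, Exception.message(e)}}
  end
end

defmodule NarrowRescueTest do
  use ExUnit.Case

  test "a missing key becomes a typed error" do
    assert {:error, {:missing_key, _}} = NarrowRescue.endpoint(%{})
  end

  test "unrelated bugs still propagate" do
    # Passing a non-map is a caller bug. It must raise, not return :error.
    assert_raise BadMapError, fn -> NarrowRescue.endpoint(:not_a_map) end
  end
end
```

If a later refactor widens the rescue back to `_ ->`, the second test fails, because the `BadMapError` no longer reaches the caller.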
#Where to go next
- Academy: OTP Fundamentals (supervisors and crash recovery)
- Glossary: Error Handling, Zero Tolerance, ExUnit, Observability, Structured Logging
6,000 silent failures caught nothing and explained nothing. Typed errors catch the right things and explain the rest. Pick the second one.