1 Comment
Comment removed
Dec 30
Arun Chearie

Good one! Thanks for asking.

There will be overhead if one isn't careful.

The key is that the process of immunity doesn’t all have to run synchronously or at full strength all the time.

In practice, one must

(1) Differentiate critical vs. non-critical paths.

-- High-impact checks (schema, nulls on primary keys, volume anomalies) run at ingestion, while deeper semantic checks run async.

(2) Use sampling and adaptive rules.

-- One doesn’t need to inspect 100% of records forever. Once a source proves stable, you can dial checks back and spike them again only when drift is detected.

(3) Fail fast where cleanup is most expensive.

-- A few milliseconds of latency at ingest is often cheaper than hours of engineer time and broken dashboards later. Where to draw that line isn’t (and shouldn't be) a purely technical decision; it reflects the organisation's tolerance for downstream impact radius.
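
To make the three points concrete, here is a rough Python sketch. It is only an illustration under assumptions of my own: records are plain dicts, `id` is the primary key, and the field names, rates and the `deferred_queue` list are hypothetical placeholders, not taken from any particular tool.

```python
import random
from dataclasses import dataclass

# Hypothetical schema: every record is a dict that must carry these fields.
REQUIRED_FIELDS = {"id", "amount"}


class CriticalCheckError(Exception):
    """Raised at ingestion when a high-impact check fails."""


def critical_checks(records, expected_min_rows=1):
    """Cheap, synchronous checks: volume, schema, nulls on the primary key."""
    if len(records) < expected_min_rows:
        raise CriticalCheckError("volume anomaly: too few rows")
    for record in records:
        missing = REQUIRED_FIELDS - set(record)
        if missing:
            raise CriticalCheckError(f"schema: missing {sorted(missing)}")
        if record["id"] is None:
            raise CriticalCheckError("null primary key")


@dataclass
class AdaptiveSampler:
    """Dial the sampling rate down while a source is stable, spike it on drift."""
    rate: float = 1.0       # start by inspecting everything
    min_rate: float = 0.05  # assumed floor: never stop sampling entirely
    decay: float = 0.5      # assumed back-off factor per stable batch

    def update(self, drift_detected: bool) -> None:
        if drift_detected:
            self.rate = 1.0
        else:
            self.rate = max(self.min_rate, self.rate * self.decay)

    def sample(self, records, rng=random):
        return [r for r in records if rng.random() < self.rate]


def ingest(records, sampler, deferred_queue):
    """Fail fast on critical checks; park a sample for deeper async checks."""
    critical_checks(records)                        # blocks the batch on failure
    deferred_queue.append(sampler.sample(records))  # semantic checks run later
    return records
```

A worker would drain `deferred_queue` off the hot path, run the deeper semantic checks, and call `sampler.update(drift_detected=...)` with the result, so the sync cost at ingest stays small and roughly constant.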

So yeah, the cure can become more expensive than cleanup if everything is treated as immune-response-level critical.

But when planned and scoped correctly, the overhead is predictable and bounded; the cost of cleanup, on the other hand, is spiky, unpredictable and unbounded, which is why I would argue for shifting left despite the latency tradeoff.

Makes sense? What do you think?