If it hurts, do it often
“In software, when something is painful, the way to reduce the pain is to do it more frequently, not less.” - David Farley
Imagine for a moment that you're training for a marathon. The first day you run, your muscles ache, and you feel exhausted. If you stop because of that pain and never run again, you'll never be prepared for the marathon. But if you persist, gradually increasing your distance and frequency, your body adapts, becomes stronger, and the process gets smoother.
In the context of reliability, the "pain" could be the challenges of deploying a new large-scale system, running disaster recovery exercises, or changing existing infrastructure. If we shy away from these challenges because they are hard or might cause disruptions, we'll never make meaningful progress.
In practice, the activities that demand the most effort upfront tend to pay the greatest dividends for site reliability in the long run:
- Continuous deployment can be a headache at first. But frequent deployments in small batches reduce risk, because each change is small enough to review, debug, and roll back easily, and they provide rapid feedback. What seems extraordinarily painful when done monthly becomes straightforward and low-stress when done daily.
- Comprehensive disaster recovery testing simulates worst-case scenarios and reveals weaknesses in your systems or processes. Though disruptive and toil-heavy in the short term, frequent testing helps you sleep better at night and ensures that teams and systems are ready when disaster does strike.
- Refactoring large legacy codebases or parts of your infrastructure feels risky. But improving them incrementally, in small pieces, reduces that risk while accelerating the gains. Small refactoring wins compound over time.
When you deploy changes or test new systems more frequently, the work becomes routine. The fear and uncertainty that once surrounded the process fade. Over time, with repetition, teams become better at managing and mitigating risks.