The Dark Side of SRE
Site Reliability Engineering has emerged as one of the hottest career paths in tech in recent years. SREs get to tackle technical challenges on complex systems at scale, and are well compensated for their specialized skillset.
From the outside, the life of an SRE might seem prestigious and full of opportunity. But behind the curtain you can often find a reality of chronic stress, career stagnation, and occupational hazards.
By exploring the flip side of SRE, we can make more informed decisions about our engineering careers and set realistic expectations. Whether you're an aspiring or current SRE, let's discuss the darker aspects of the role.
The High-Stress Life of an SRE
Like firefighters constantly on call, SREs live a life of high-stakes pressure and urgency. They maintain constant readiness to quickly resolve production incidents. While tackling system outages can provide adrenaline-fueled troubleshooting challenges, the unpredictable on-call schedules and urgency of incidents can lead to chronic stress over time.
Below is a stress graph shared by an SRE on r/sre.
The always-on nature of SRE work makes it difficult to ever fully decompress. Prolonged exposure to stress and lack of sleep have well-documented consequences, including increased risk of anxiety, depression, and cardiovascular disease. While incident response provides short-term excitement, coming down from that high can lead to emotional exhaustion.
The 24/7 on-call expectation at many companies can extend SRE work hours well beyond a typical business day. Being woken up by pager alerts in the middle of the night, handling issues early in the morning before arriving at the office, and working overtime during evenings and weekends all stretch SREs thin and strain work-life balance.
SREs risk the effects of chronic stress from being constantly on call, even as their expertise allows them to repeatedly resolve critical incidents. This stress residue can accumulate silently over time before reaching unhealthy levels. Left unchecked, it can reduce quality of life and even shorten life expectancy.
Jack of All Trades, Master of None
SREs pride themselves on having an adaptable generalist skillset. But constant context switching between technologies prevents deep expertise in any one area. SREs dip into a dizzying array of domains: networks, databases, security, cloud platforms, and more. While this provides operational agility, it comes at the cost of advanced proficiency. The result is knowledge that is a mile wide but an inch deep.
The typical SRE workday involves continual task switching between incidents, projects, technologies, and teams. This fragments focus. Interruptions are frequent, and few tasks can be driven to completion before the next pager alert demands a context swap. Such a hectic environment makes it very hard to reach the flow states in which skills really develop.
Of course, modern infrastructure requires broadly skilled engineers to integrate its many interdependent parts. But SREs seeking long-term career growth may find their knowledge plateauing. At a certain point, professional advancement requires deep specialized expertise. Successfully navigating the generalist-specialist tradeoff remains an ongoing balancing act for ambitious SREs.
Operational Treadmill
SREs spend most of their time reacting to operational issues rather than focusing on strategic initiatives. Putting out fires day-to-day leaves little room for long-term projects. This constant reactive work can lead to skill stagnation and career flattening over time.
Incident response draws on SRE expertise, but repetitive firefighting leaves little room for creativity and intellectual stimulation. Without opportunities to architect systems or pursue multi-quarter roadmaps, some SREs may feel unfulfilled.
Of course, SREs understand that infrastructure reliability remains the top priority. But many grow weary of endless reactive cycles without opportunities to build transformative capabilities that demonstrate their full talents. A career fueled solely by adrenaline is likely to end in burnout. For SRE leaders, providing resources for strategic initiatives separates those who energize their teams from those who exhaust them.
The Limited Career Progression
SREs specialize in a niche skillset that is not easily transferred elsewhere. This operational expertise brings prestige but narrowly constrains mobility, especially in the current market. While their technical knowledge runs wide, as mentioned above, it lacks the depth required to progress into specialized software engineering roles. And with little opportunity to gain experience leading complex development projects, advancement into higher-level positions proves difficult. Here are some reasons why the career ladder for an SRE often tops out surprisingly low:
- SREs who want to move into engineering management often have to switch to a non-SRE software engineering role first in order to gain management experience. The skills gained as an SRE, while valuable from an engineering perspective, are less directly applicable to people management.
- Because the SRE discipline is still relatively new, most companies have small SRE teams. This means there are fewer intermediate and senior-level positions between entry-level SRE and the SRE manager role. Early to mid-career SREs may feel stalled if they want to be promoted but the next-level role just isn't available due to team size.
- While larger tech companies that pioneered the SRE concept have better-defined job ladders, smaller companies that adopted SRE more recently struggle to provide advancement paths for those employees.
- Engineers who want to continue specializing in SRE but also want career growth may find the options lacking at some organizations.
Navigating the Inconsistent SRE Role
You cannot take all of your habits of work and expect to successfully transplant them to another company unchanged.
You’d expect the skills of an SRE to transfer directly between departments and companies. But in reality, each company defines and implements the role differently. An expert SRE at one organization may find much of their knowledge irrelevant at another. This document is a great example that illustrates the point: it prepares ex-Google SREs for what they should expect in the outside world.
Some key areas where the SRE role varies between companies:
- Scope of responsibility - At some companies, SREs are responsible for operational work only. At others, they share ownership of product code as well.
- Level of software engineering work - The ratio of software development to operations work fluctuates. At some firms SRE is more like a specialized sysadmin role while at others it is much closer to a regular software engineer.
- Team structure - SRE can either be its own independent team or be paired with product engineering teams (a.k.a. embedded SREs). Both models have tradeoffs.
- On-call expectations - The frequency and intensity of being on-call as an SRE ranges widely. At some companies it is relatively light and infrequent, while at others it is a heavy burden.
- Technical vs soft skills - While strong technical skills are always needed, some SRE roles emphasize soft skills like influencing product teams more than others.
For the good of the profession, the SRE community still needs to coalesce around more consistent job ladders, expectations, and competencies. Only then can top talent build their careers across organizations rather than starting from scratch in each new environment.
Conclusion
Like any profession, SRE comes with tradeoffs. Its good pay and technical complexity bring great reward but also chronic stress, uneven career progression, and unclear skill development paths. By understanding these issues, both aspiring and current SREs can make more informed decisions and proactively mitigate the risks.
SRE will likely continue advancing infrastructure frontiers, although sometimes through immense human effort. By openly discussing the role's dark sides, we want to empower SREs to craft intentional careers.