This free Site Reliability Engineer job description template is ready to use — copy it, replace the {{placeholders}}, and post your role in minutes. It includes a company intro, a role summary, responsibilities, requirements, nice-to-haves, and compensation, with writing tips and FAQs below to help you tailor it to your team.
When to use this template
Use this when you're hiring someone to own reliability — defining SLOs, automating operations, and leading incident response so your product stays up as it scales. It's a software engineering role applied to operations, distinct from a broader DevOps role.
SRE candidates want to understand your scale, your reliability maturity, and the on-call reality. Be honest about on-call early — it's the detail that most determines fit.
If the role is more about CI/CD and developer tooling, use the DevOps Engineer template; if it's security-focused, use the Security Engineer template.
Writing tips
- State the on-call expectation plainly — rotation, frequency, and compensation.
- Describe your scale and current reliability maturity (SLOs, error budgets, etc.).
- Frame the role around reliability outcomes, not just a tool list.
- Clarify the relationship between SRE, DevOps, and the product teams.
- Include the salary range and whether the role is remote.
The job description
Copy the template below and replace the {{placeholders}} and [bracketed notes] with your specifics.
About {{company}}
{{company}} runs [scale of system]. Reliability is a feature for us, and we're hiring a Site Reliability Engineer to keep us fast and dependable as we grow.
The role
As a Site Reliability Engineer, you'll own the reliability of our systems. You'll define SLOs, automate away toil, lead incident response, and partner with engineers to build systems that stay up under load. This role reports to {{hiring_manager}} and is based {{work_type}} in {{location}}.
What you'll do
- Define and track SLOs and error budgets with the teams you support.
- Automate recurring operational work to eliminate toil.
- Lead incident response and run blameless post-incident reviews.
- Improve observability — metrics, logging, tracing, and alerting.
- Partner with engineers to design for reliability and scale.
What we're looking for
- 3+ years in an SRE, DevOps, or systems engineering role.
- Strong programming skills ([Python, Go]) used to automate operations.
- Deep experience with cloud infrastructure, containers, and orchestration.
- A calm, methodical approach to incidents and a bias toward durable fixes.
- A track record of measurably improving reliability.
Nice to have
- Experience with [your observability stack, e.g. Prometheus, Grafana, Datadog].
- Background scaling systems through rapid growth.
- Familiarity with chaos engineering or capacity planning.
What we offer
- Salary range: {{salary_range}}, plus equity.
- [Comprehensive benefits].
- Flexible {{work_type}} work, [PTO policy], and on-call compensation of [on-call pay or time off in lieu].
- A mandate to make reliability a first-class part of how we build.
How to personalize
Replace these placeholders before posting:
- {{company}}
- {{location}}
- {{work_type}}
- {{salary_range}}
- {{hiring_manager}}
The bracketed notes, like [Comprehensive benefits] or [Python, Go], are prompts to swap in your own details. The more specific you are about the actual work and stack, the stronger your applicant pool will be.
Frequently asked questions
- What does a Site Reliability Engineer do?
- A Site Reliability Engineer keeps production systems reliable. They define service level objectives (SLOs), automate operations to remove toil, lead incident response, improve observability, and partner with engineers to build systems that stay up as they scale.
- What's the difference between SRE and DevOps?
- DevOps is a broad practice for improving the path from code to production. SRE is a specific discipline that originated at Google and applies software engineering to operations, with formal reliability targets like SLOs and error budgets. For example, a 99.9% availability SLO over 30 days leaves an error budget of roughly 43 minutes of downtime. SREs tend to focus more narrowly on reliability, though the two roles overlap heavily.
- What skills should a Site Reliability Engineer have?
- Strong programming skills used to automate operations, deep experience with cloud infrastructure and orchestration, fluency with observability tooling, and a calm, methodical approach to incidents. A track record of measurably improving reliability is the strongest signal.