Site Reliability Engineering Job Description

Site reliability engineering provides operational and engineering support for Oracle RAC, Oracle ASM, Oracle ADG, Data Encryption (Oracle TDE), Oracle GoldenGate, Oracle GoldenGate with Big Data, Heterogeneous Replication, Partitioning, Oracle VPD, MSSQL Always-On, MySQL Replication, and Redis clustering.

Site Reliability Engineering Duties & Responsibilities

To write an effective site reliability engineering job description, begin by listing detailed duties, responsibilities and expectations. We have included site reliability engineering job description templates that you can modify and use.

Sample responsibilities for this position include:

Define and verify standards for configuration, monitoring, reliability and performance
Participate in projects and influence project directions in compliance with PDIT technology standards
Create and adjust PEO standards related to various automation and orchestration technologies
Lead and grow the team of top-talent senior SREs responsible for front-door incident management
Continually drive down time-to-detect and time-to-resolve through improved outlier detection and real time root cause analysis
Drive operational best practice adoption across critical services, continually looking to lower operational barriers to achieving improved reliability
Drive organizational awareness of the importance of availability, applying a creative lens to the many facets of availability
Participate in the management of full life cycle product development, including analysis and planning related to product development, launch and deployment
Document and detail areas of improvement to bolster architecture, design, technical requirements and service specifications
Demonstrate technical leadership and mentoring on the application of new technologies and systems management methodologies

Site Reliability Engineering Qualifications

Qualifications for a job description may include education, certification, and experience.

Licensing or Certifications for Site Reliability Engineering

List any licenses or certifications required by the position: ITIL, AWS, DNS, CCNA, RHCE, SSL, HTTP, TCP, TLS, SQL

Education for Site Reliability Engineering

Typically a job would require a certain level of education.

Employers hiring for a site reliability engineering role most commonly prefer candidates with a relevant degree, such as a Bachelor's or Master's degree in Computer Science, Technical, Engineering, Mathematics, Software Engineering, Science, Systems Engineering, Management, Information Systems, or Physics.

Skills for Site Reliability Engineering

Desired skills for site reliability engineering include:

Architectures
Firewalls and TCP/IP networks
Networking systems and protocols
Relational database systems – MySQL
SQL Server 2005/2008/2012
Standards
Web technologies
Oracle 11 and Oracle RAC
Storage Management Solutions
Implementation and networking

Desired experience for site reliability engineering includes:

Supervise a team of SREs, ensuring that production applications your team supports are stable, reliable, and well-documented
Work closely with engineering managers, technical leads, architects and development teams to ensure that platforms are designed with scale and operability in mind
Troubleshoot and debug complex issues in production applications
Communicate effectively with people at all levels of the organization
Manage a team of 8-10 high-performing OpsAutomation/DevOps/SRE individuals
Previous experience with ESX/vSphere, NSX, vSAN is highly desirable

Site Reliability Engineering Examples

1

Site Reliability Engineering Job Description

Job Description Example
Our innovative and growing company is searching for experienced candidates for the position of site reliability engineering. Thank you in advance for taking a look at the list of responsibilities and qualifications. We look forward to reviewing your resume.
Responsibilities for site reliability engineering
  • Ensure adherence to SLAs and quality standards
  • Help design and improve data pipelines with the goal of making them easily monitored and cost effective
  • Commit code for instrumenting new and existing data pipelines with stats and monitoring hooks
  • Install, upgrade, and maintain our production Splunk infrastructure
  • Own, scale, and improve our in-house stats infrastructure, supporting over 600,000 individual metrics per minute
  • Evaluate next-generation monitoring and metrics collection tools and utilities
  • Help guide our Data Engineering team towards SRE best practices
  • Own and operate the architecture and systems that collect data in real-time from over 120 million unique users per month
  • Select and develop automation tools and scripts to improve the availability, manageability, scalability and operability of services
  • Solve performance and stability issues and prevent their recurrence
Qualifications for site reliability engineering
  • Objectionable
  • A passion for automating things
  • An understanding of the 12 Factor App
  • A high degree of interest in Linux containers and smart clustering solutions like Kubernetes/Mesos/fleet
  • Strong experience in at least one infrastructure component (operating systems, compute, storage, networking, distributed systems, big data, cloud, containers) and a solid understanding of the rest and how they impact services
  • Bachelor's degree in Computer Science or equivalent qualification/experience
2

Site Reliability Engineering Job Description

Job Description Example
Our growing company is hiring for a site reliability engineer. Please review the list of responsibilities and qualifications. While this is our ideal list, we will consider candidates who do not necessarily have all of the qualifications but have sufficient experience and talent.
Responsibilities for site reliability engineering
  • Implement comprehensive service monitoring to ensure uptime and performance, including synthetic, real-user, system, and application performance monitoring and dashboards
  • Define, measure, and meet key Service Level Objectives including availability, performance, incidents and chronic problems
  • Partner with application and business stakeholders to ensure a high-quality product is developed and released into production
  • Partner with application owners to ensure adequate performance, scalability, and reliability of the underlying infrastructure
  • Establish the annual release calendar in partnership with application owners and monitor adherence to the Release Management processes, policies and procedures
  • Roll up your sleeves and debug/tune/code/fix alongside your team
  • Coach and mentor junior engineers and new college graduates
  • Evaluate, innovate, develop, and support any variety of internal PE&O automation systems geared to produce efficiency at scale
  • Differentiate between good and bad design at numerous levels and articulate the difference
  • Provide internal production system support
Qualifications for site reliability engineering
  • Significant experience in designing, delivering and managing data infrastructure at scale
  • A deep technical understanding of modern batch and real-time data technologies
  • A proven track record of managing large volumes of data in cloud services while controlling costs
  • Advanced knowledge of Unix/Linux systems
  • Ability to write code
  • Ability to learn rapidly and communicate value of new technologies to technical and non-technical audiences
3

Site Reliability Engineering Job Description

Job Description Example
Our company is growing rapidly and is hiring for a site reliability engineer. Thank you in advance for taking a look at the list of responsibilities and qualifications. We look forward to reviewing your resume.
Responsibilities for site reliability engineering
  • Provide standardized offerings to facilitate secure access to stacks and the cloud environment overall
  • Manage engineers working with the engineering teams on our back-end services, such as Hadoop, HDFS, Memcached, Redis, Kubernetes, AWS, Java, Golang, and Linux
  • Directly lead and train a team of Site Reliability Engineers focused on high availability
  • Coach and train engineers on actively diagnosing the production environment in real time by analyzing code, log files, network traces and request/response pairs
  • Ensure team is working efficiently and effectively to identify root cause of failures, determine quickest path to resolution, and take actions to prevent similar issues from occurring in the future
  • Build and maintain relationships with product managers, support teams and leadership
  • Interface with front-end and back-end developers providing performance data and guidance on areas for improvement
  • Work with vendor contacts to manage business relationship and support needs
  • Participate in shared on-call support phone rotation and handle escalations
  • Define and execute on a roadmap evolving our monitoring and reliability capabilities
Qualifications for site reliability engineering
  • Meticulous and careful
  • Experience with web-based tool development (Python/Django, Java, Ruby/RoR), and building infrastructure tooling and reporting
  • Automation mindset - if you can automate it, do it
  • Expert-level skills in Linux/Windows system and network administration and agile implementation of production systems
  • 10+ years of hands-on technical experience combined with strong management and communication skills
  • Solid understanding of Windows, Linux, Networking, TCP/IP, Routing, Switching, Firewalls, Load balancers and other infrastructure components
4

Site Reliability Engineering Job Description

Job Description Example
Our company is looking for a site reliability engineer. Thank you in advance for taking a look at the list of responsibilities and qualifications. We look forward to reviewing your resume.
Responsibilities for site reliability engineering
  • Lead lifecycle management process to ensure clearly defined roadmaps for new technology solutions ensuring seamless transformation and adoption by the business
  • Define and document technical requirements for new capabilities, working with key suppliers to design, build and lab-certify solutions, ensuring compliance with all functional, operational and business objectives
  • Serve as the highest escalation point for critical and/or chronic incidents, providing subject matter leadership to Operations and helping to restore service
  • Lead or contribute to key network projects, representing the organization and working closely with Project Managers, Operations, Business Units, Suppliers, Peer Organizations and IT stakeholders
  • Develop automation functions to set up, configure, and upgrade various network technologies, improving quality and reducing manual effort
  • Lead, coach and develop engineers across 3 shifts, including remote employees
  • Manage shift leads who each have direct reports
  • Oversee 24/7/365 coverage in support of our domestic and international businesses
  • Run major incidents
  • Work with the product, infrastructure, and engineering teams daily
Qualifications for site reliability engineering
  • Strong troubleshooting experience and skillset to resolve incidents across multiple domains
  • Demonstrated ability to establish and maintain metrics-based process improvement
  • Interest or experience in cloud technologies (AWS, Docker, Kubernetes)
  • Practical expertise in managing and leading application reliability practices for consumer facing web and mobile experiences
  • Ability to work across teams to continuously analyze system performance in production, troubleshoot consumer reported issues, and proactively identify areas in need of optimization
  • Previous experience with developing and driving real time monitoring solutions that provide visibility into site health and key performance indicators
5

Site Reliability Engineering Job Description

Job Description Example
Our growing company is searching for experienced candidates for the position of site reliability engineering. If you are looking for an exciting place to work, please take a look at the list of qualifications below.
Responsibilities for site reliability engineering
  • Lead and support a highly experienced PaaS/SaaS product deployment and maintenance team
  • Cloud, IT service and support vendor management experience
  • Maintain SLAs of Data Enabled Business' cloud service and application offerings
  • Designing and developing tools and processes to maintain large applications and services at scale
  • Helping our engineers and data scientists build software that scales in terms of performance and stability
  • Ruthlessly identifying and removing system bottlenecks before they ever impact performance
  • Working side by side with on-call engineers to handle emergencies and then running postmortems to ensure they don’t happen in the future
  • Establishing best practices inside the organization, proving that they work and then bringing them to other DEB teams and JCI
  • Where you can provide the most value
  • Promptly respond to incoming communications (telephone calls, emails, instant messaging) and direct reports and information requests appropriately
Qualifications for site reliability engineering
  • 3-7 years’ technical experience working with consumer facing (e-commerce) software applications
  • Experience with service discovery tools such as
  • Can read and write in programming languages
  • A strong desire to dig deep into a wide range of technologies, and a relentless drive to make the customer experience better through investments in automation and infrastructure improvements
  • Proven experience working with infrastructure components (operating systems, compute, storage, networking, distributed systems, big data, cloud, containers) and a solid understanding of how they impact services
  • Experience applying SRE skills both to drive quality improvements of already deployed services and, more importantly, to cross-train colleagues to build site and service quality into new development and integration efforts
