Site Reliability Engineering Job Description

Site reliability engineering provides operational and engineering support for Oracle RAC, Oracle ASM, Oracle ADG, Data Encryption (Oracle TDE), Oracle GoldenGate, Oracle GoldenGate with Bigdata, Heterogeneous Replication, Partitioning, Oracle VPD, MSSQL Always-On, MySQL Replication, and Redis clustering.

Site Reliability Engineering Duties & Responsibilities

To write an effective site reliability engineering job description, begin by listing detailed duties, responsibilities and expectations. We have included site reliability engineering job description templates that you can modify and use.

Sample responsibilities for this position include:

Define and verify standards for configuration, monitoring, reliability and performance

Participate in projects and influence project directions in compliance with PDIT technology standards

Create and adjust PEO standards related to various automation and orchestration technologies

Lead and grow the team of top-talent senior SREs responsible for front-door incident management

Continually drive down time-to-detect and time-to-resolve through improved outlier detection and real time root cause analysis

Drive operational best practice adoption across critical services, continually looking to lower operational barriers to achieving improved reliability

Drive organizational awareness of the importance of availability, applying a creative lens to the many facets of availability

Participate in the management of full life cycle product development, including analysis and planning related to product development, launch and deployment

Document and detail areas of improvement to bolster architecture, design, technical requirements and service specifications

Demonstrate technical leadership and mentoring on the application of new technologies and systems management methodologies

Site Reliability Engineering Qualifications

Qualifications for a job description may include education, certification, and experience.

Licensing or Certifications for Site Reliability Engineering

List any licenses or certifications required by the position: ITIL, AWS, DNS, CCNA, RHCE, SSL, HTTP, TCP, TLS, SQL

Education for Site Reliability Engineering

Typically a job would require a certain level of education.

Employers hiring for a site reliability engineering role most commonly prefer candidates with a relevant degree, such as a Bachelor's or Master's degree in Computer Science, Technical fields, Engineering, Mathematics, Software Engineering, Science, Systems Engineering, Management, Information Systems, or Physics.

Skills for Site Reliability Engineering

Desired skills for site reliability engineering include:

Architectures

Firewalls and TCP/IP networks

Networking systems and protocols

Relational database systems – MySQL

SQL Server 2005/2008/2012

Standards

Web technologies

Oracle 11 and Oracle RAC

Storage Management Solutions

Implementation and networking

Desired experience for site reliability engineering includes:

Supervise a team of SREs, ensuring that production applications your team supports are stable, reliable, and well-documented

Work closely with engineering managers, technical leads, architects and development teams to ensure that platforms are designed with scale and operability in mind

Troubleshoot and debug complex issues in production applications

Communicate effectively with people at all levels of the organization

Manage a team of 8-10 high-performing OpsAutomation/DevOps/SRE individuals

Previous experience with ESX/vSphere, NSX, vSAN is highly desirable

Site Reliability Engineering Examples

Site Reliability Engineering Job Description

Job Description Example

Our innovative and growing company is searching for experienced candidates for a site reliability engineering position. Thank you in advance for taking a look at the list of responsibilities and qualifications. We look forward to reviewing your resume.

Responsibilities for site reliability engineering

Ensure adherence to SLAs and quality standards
Help design and improve data pipelines with the goal of making them easily monitored and cost effective
Commit code for instrumenting new and existing data pipelines with stats and monitoring hooks
Install, upgrade, and maintain our production Splunk infrastructure
Own, scale, and improve our in-house stats infrastructure, supporting over 600,000 individual metrics per minute
Evaluate next-generation monitoring and metrics collection tools and utilities
Help guide our Data Engineering team towards SRE best practices
Own and operate the architecture and systems that collect data in real-time from over 120 million unique users per month
Select and develop automation tools and scripts to improve the availability, manageability, scalability and operability of services
Solve performance and stability issues and prevent their recurrence

Qualifications for site reliability engineering

Objectionable
A passion for automating things
An understanding of the 12 Factor App
A high degree of interest in Linux containers and smart clustering solutions like Kubernetes/Mesos/fleet
Strong experience in at least one infrastructure component (operating systems, compute, storage, networking, distributed systems, big data, cloud, containers) and a solid understanding of the rest and how they impact services
Bachelor's degree in Computer Science or equivalent qualification/experience

Site Reliability Engineering Job Description

Job Description Example

Our growing company is hiring for a site reliability engineering role. Please review the list of responsibilities and qualifications. While this is our ideal list, we will consider candidates who do not have all of the qualifications but have sufficient experience and talent.

Responsibilities for site reliability engineering

Implement comprehensive service monitoring to ensure uptime and performance, including synthetic, real user, system, application performance, dashboards
Define, measure, and meet key Service Level Objectives including availability, performance, incidents and chronic problems
Partner with application and business stakeholders to ensure high quality product is developed and released into production
Partner with application owners to ensure adequate performance, scalability and reliability of the underlying infrastructure
Establish the annual release calendar in partnership with application owners and monitor adherence to the Release Management processes, policies and procedures
Roll up your sleeves and debug/tune/code/fix alongside your team
Coach and mentor junior engineers and new college graduates
Evaluate, innovate, develop, and support any variety of internal PE&O automation systems geared to produce efficiency at scale
Differentiate between good and bad design at numerous levels and articulate the difference
Provide internal production system support

Qualifications for site reliability engineering

Significant experience in designing, delivering and managing data infrastructure at scale
A deep technical understanding of modern batch and real-time data technologies
A proven track record of managing large volumes of data in cloud services while controlling costs
Advanced knowledge of Unix/Linux systems
Ability to write code
Ability to learn rapidly and communicate value of new technologies to technical and non-technical audiences

Site Reliability Engineering Job Description

Job Description Example

Our company is growing rapidly and is hiring for a site reliability engineering role. Thank you in advance for taking a look at the list of responsibilities and qualifications. We look forward to reviewing your resume.

Responsibilities for site reliability engineering

Provide standardized offerings to facilitate secure access to stacks and the overall cloud environment
Manage engineers working with the engineering teams on our back-end services, such as Hadoop, HDFS, Memcached, Redis, Kubernetes, AWS, Java, Golang, and Linux
Directly lead and train a team of Site Reliability Engineers focused on high availability
Coach and train engineers on actively diagnosing the real-time production environment by analyzing code, log files, network traces and request/response pairs
Ensure team is working efficiently and effectively to identify root cause of failures, determine quickest path to resolution, and take actions to prevent similar issues from occurring in the future
Build and maintain relationships with product managers, support teams and leadership
Interface with front-end and back-end developers providing performance data and guidance on areas for improvement
Work with vendor contacts to manage business relationship and support needs
Participate in shared on-call support phone rotation and handle escalations
Define and execute on a roadmap evolving our monitoring and reliability capabilities

Qualifications for site reliability engineering

Meticulous and careful
Experience with web-based tool development (Python/Django, Java, Ruby/RoR), and building infrastructure tooling and reporting
Automation mindset - if you can automate it, do it
Expert-level skills in Linux/Windows system and network administration and agile implementation of production systems
10+ years of hands-on technical experience combined with strong management and communication skills
Solid understanding of Windows, Linux, Networking, TCP/IP, Routing, Switching, Firewalls, Load balancers and other infrastructure components

Site Reliability Engineering Job Description

Job Description Example

Our company is looking for candidates for a site reliability engineering role. Thank you in advance for taking a look at the list of responsibilities and qualifications. We look forward to reviewing your resume.

Responsibilities for site reliability engineering

Lead lifecycle management process to ensure clearly defined roadmaps for new technology solutions ensuring seamless transformation and adoption by the business
Define and document technical requirements for new capabilities, working with key suppliers to design, build and lab-certify solutions that comply with all functional, operational and business objectives
Serve as the highest escalation point for critical and/or chronic incidents, providing subject matter leadership to Operations to help restore service
Lead or contribute to key network projects, representing the organization and working closely with Project Managers, Operations, Business Units, Suppliers, Peer Organizations and IT stakeholders
Develop automation functions to set up, configure, and upgrade various network technologies, improving quality and reducing manual effort
Lead, coach and develop engineers across 3 shifts, including remote employees
Manage shift leads who each have direct reports
Oversee 24/7/365 coverage in support of our domestic and international businesses
Run major incidents
Work with the product, infrastructure, and engineering teams daily

Qualifications for site reliability engineering

Strong troubleshooting experience and skillset to resolve incidents across multiple domains
Demonstrated ability to establish and maintain metrics-based process improvement
Interest or experience in cloud technologies (AWS, Docker, Kubernetes)
Practical expertise in managing and leading application reliability practices for consumer facing web and mobile experiences
Ability to work across teams to continuously analyze system performance in production, troubleshoot consumer reported issues, and proactively identify areas in need of optimization
Previous experience with developing and driving real time monitoring solutions that provide visibility into site health and key performance indicators

Site Reliability Engineering Job Description

Job Description Example

Our growing company is searching for experienced candidates for a site reliability engineering position. If you are looking for an exciting place to work, please take a look at the list of qualifications below.

Responsibilities for site reliability engineering

Lead and support a highly experienced PaaS/SaaS product deployment and maintenance team
Manage cloud, IT service and support vendors
Maintain SLAs of Data Enabled Business' cloud service and application offerings
Designing and developing tools and processes to maintain large applications and services at scale
Helping our engineers and data scientists build software that scales in terms of performance and stability
Ruthlessly identifying and removing system bottlenecks before they ever impact performance
Working side by side with on-call engineers to handle emergencies and then running postmortems to ensure they don’t happen in the future
Establishing best practices inside the organization, proving that they work and then bringing them to other DEB teams and JCI
Where you can provide the most value
Promptly respond to incoming communications (telephone calls, emails, instant messaging) and direct reports and information requests appropriately

Qualifications for site reliability engineering

3-7 years’ technical experience working with consumer facing (e-commerce) software applications
Experience with service discovery tools such as
Can read and write in programming languages
A strong desire to dig deep into a wide range of technologies, and a relentless drive to make the customer experience better through investments in automation and infrastructure improvements
Proven experience working with infrastructure components (operating systems, compute, storage, networking, distributed systems, big data, cloud, containers) and a solid understanding of how they impact services
Experience applying SRE skills both to drive quality improvements of already deployed services and, more importantly, to cross-train colleagues to build site and service quality into new development and integration efforts
