Senior Reliability Engineer Job Description
Senior Reliability Engineer Duties & Responsibilities
To write an effective senior reliability engineer job description, begin by listing detailed duties, responsibilities and expectations. We have included senior reliability engineer job description templates that you can modify and use.
Sample responsibilities for this position include:
Senior Reliability Engineer Qualifications
Qualifications for a job description may include education, certification, and experience.
Licensing or Certifications for Senior Reliability Engineer
List any licenses or certifications required by the position: AWS, CCNP, CCNA, ITIL, CQE, ASQ, TCP/IP, PCI, CRE, DFSS
Education for Senior Reliability Engineer
Typically a job would require a certain level of education.
Employers hiring for the senior reliability engineer job most commonly prefer candidates with a relevant degree, such as a Bachelor's or Master's degree in Engineering, Computer Science, Science, Technical, Electrical Engineering, Physics, Mechanical Engineering, Mathematics, Manufacturing, or Education.
Skills for Senior Reliability Engineer
Desired skills for senior reliability engineer include:
Desired experience for senior reliability engineer includes:
Senior Reliability Engineer Examples
Senior Reliability Engineer Job Description
- Monitor and analyse reliability performance across the product life cycle
- Perform code reviews, evaluate implementations, and provide feedback about potential tool improvements
- Participate in an on-call rotation alongside the engineers
- Fine-tune web servers and make use of caching layers
- Apply predictive maintenance (PdM) methods such as oil analysis, thermography, and vibration analysis
- Promote DevOps culture across the organization
- Define new monitoring strategy
- Develop and enhance monitoring automation
- Support the Maintenance Manager in leadership and management of the Mechanical Technicians to encourage and control an effective and disciplined workforce through energetic and driven leadership
- Take ownership of the mechanical integrity standards
- Solid networking experience: TCP/IP, administration of networking hardware (Cisco, Foundry), load balancing (considered a plus)
- Familiarity with open-source and third-party monitoring systems (Nagios, Kafka, SMARTS)
- Hands-on experience in reliability experiments/modeling, reliability program development and implementation, and reliability failure analysis preferred
- Minimum 4 years' experience in production service troubleshooting that spans applications, systems and networks
- Experience building systems on cloud technology (AWS, GCE, Rackspace, OpenStack)
- Experience with queuing/data-pipelining solutions (Kafka, Storm, Flink, Spark, Amazon Kinesis, etc.)
Senior Reliability Engineer Job Description
- Develop engineering solutions to repetitive failures and other problems that adversely affect plant operations
- Perform an independent audit of the manufacturing line and product post-SOP
- Liaise with Territory Teams and Current Support Teams to monitor customer claims and arrange engineering sample collections of products/assemblies of interest
- Liaise with Current Support Teams on design mitigations for market and production issues
- Manage deployment of different features
- Work with Dev teams to create more robust solutions
- Work closely with Dev teams on code improvement and stability
- Work with QA teams for reproduction and setup of complex production issues
- Provide patches as and when needed for issues found in production
- Automate and develop operations tools and application dashboards
- Experience with configuration management tools such as Puppet, Chef, or Ansible
- Act as operations representative for capital projects involving new rotating machinery and non-static equipment
- Submit project charter and budget requests for equipment upgrades to existing machinery
- Work with Mechanical Technical Advisor to review and update existing job plans and maintenance procedures
- Monitor overall PM program and critical spare parts information, and propose changes to PM intervals and parts stocking levels
- Provide feedback to Facilities Engineering regarding engineering design standards for rotating machinery, to drive improvement with equipment designs
Senior Reliability Engineer Job Description
- Design, implement, deploy and maintain site reliability processes and systems
- Provide service outage escalation response and guidance alongside software engineers
- Review, correlate and assess impact of monitoring metrics in relation to current system behavior
- Research new tools and technologies, and ways to solve problems more elegantly/efficiently
- Create and maintain documentation on installations, tools and procedures
- Conduct root cause analysis of production issues including troubleshooting and debugging through very complex backend pipelines
- Mentor and guide SREs and Systems Administrators on effective methods to deliver enterprise-class services
- Collaborate deeply with a cross-functional team of Software Architects and Engineers, Quality Engineers, Engineering management and other SREs
- Support capital projects by participating in setting equipment "fitness for use" criteria and specifications to optimize equipment/system reliability and life cycle cost
- Work closely with the maintenance and operations management team in a partnership to achieve reliability excellence
- Assist in the training/coaching of field personnel in start-up/operations, troubleshooting, repairs and preventative maintenance for rotating machinery
- Support site Condition Based Monitoring team to develop, monitor, and communicate mean-time-between-failures data for all critical rotating equipment within the BU
- Master's or Doctoral degree in Electrical, Mechanical or Materials Engineering or a related field
- 4 years' experience with UNIX and TCP/IP network fundamentals
- 1 year's experience developing software implementing API functionality using REST, Thrift, JSON or similar
- 2+ years of experience using configuration management software like Chef, Puppet, or Ansible
Senior Reliability Engineer Job Description
- Be a key member of the SRE team that owns all of the client-facing APIs and recognition infrastructure
- Participate in the development lifecycle at a very deep level, from the early stages of feature design all the way to seeing it released into production for our users to enjoy
- Write and review code
- Help us continue to define and implement good Site Reliability Engineering practices within our SRE teams and the broader Engineering organisation
- Identify and implement new technologies that can help us do our jobs better or identify new and better uses for existing technologies that we already use
- Be perceived as a thought-leader for SRE both internally and externally
- Review equipment strategy and assess risk vs
- Maintain the electrical power distribution system
- Contribute to the team responsible for delivering package/module functional SRAM and logic stress qualification (HTOL, ELFR, SER, Environmental stresses) on advanced technology nodes
- Plan and execute work to meet functional/quality targets in line with project qualification schedules
- 2+ years working with basic large-scale internet service architectures (such as load balancing, LAMP, CDNs), even if you haven’t worked on one
- 2+ years’ experience handling configuration and maintenance of common applications such as Apache, memcached, MySQL/MariaDB, Couchbase, and RabbitMQ
- Experience with Logstash/Graphite/InfluxDB/Grafana/Cabot and other diagnostic and alerting tools
- Bachelor’s Degree in Chemical, Mechanical, Biosystems Engineering or other engineering discipline that has a strong foundation in thermodynamics, fluid mechanics, mass/material balance and process design OR minimum of 10 years’ experience in production, maintenance, or project management in lieu of a degree
- Bachelor’s degree in Electrical Engineering, Physics or a related field, or equivalent experience
- Experience in performing trend analysis, mathematical modeling, and results validation
Senior Reliability Engineer Job Description
- Implement changes in collaboration with asset owner and maintenance personnel to optimize PMs
- Apply appropriate reliability tools to drive improvements, optimize costs, and improve Asset Effectiveness (e.g., Bad Actor Program, FMEA, RBI, RBM, RCM, Reliability Roadmap)
- Oversee the Bad Actor Program along with the Operational Excellence Engineer and create action plans for improving reliability
- Lead Asset Effectiveness improvements through the identification of lost availability and poor reliability
- Oversee site reliability craft resources and provide them with technical direction
- Evaluate and execute improvements to plant physical assets to minimize total cost of ownership considering asset replacement, business priorities, and capital expenditures
- Work with Reliability Experts from other sites and within the industry to continue development of a multi-year reliability improvement plan
- Maintain all reliability maintenance records and files for the analysis and reporting of all reliability maintenance-related business matters
- Support all EH&S policies and procedures
- Hands-on application management and support for both on-prem and cloud production environments, including full-stack diagnosis, fault resolution and root cause analysis
- Background with a statistical or reliability software package
- Experience working independently and providing mentoring and guidance for junior analysts
- BS in an Engineering or Science field and 4 years’ experience in research, development and engineering
- At a minimum, a BA/BS degree (BS Engineering preferred) or equivalent in a Life Science, Engineering, or Physical Science is required
- A Certified Quality Engineer/Certified Quality Auditor certification (ASQ or equivalent) is preferred
- A strong background in Process Excellence, Six Sigma, Lean methodologies or operational excellence preferred