Cloud Computing Site Reliability Engineer
8 - 10 years experience • Life Sciences & Research
The Data Science and Innovation Group in the Strategic Energy Analysis Center is in need of a talented DevOps engineer with systems administration skills, specifically expertise in cloud configuration management and site reliability. The candidate must have an interest in making a difference in the world by helping to build, maintain, monitor, automate, and enhance cloud-hosted systems that support a variety of web applications and data-sharing platforms related to sustainable energy technologies and markets.
This position will predominantly involve systems architecture, server provisioning, and cloud systems administration. The candidate will work directly with project leaders to help design, implement, and support cloud hosting solutions for both external and internal applications. Some systems architecture work on the internal NREL network may also be required.
The ideal candidate must be able to work across organizations and have strong communication skills in order to work with project clients as well as internal staff in the Office of the Chief Information Officer (OCIO). The ability to think creatively and strategically is a plus; being comfortable writing appropriate systems documentation is required.
We have a team of web developers and administrators who are creating leading-edge web applications and data-driven web platforms to advance sustainable energy research and deployment. Working with them, you will:
- Be a key systems administrator, engineer, and architect for NREL’s cloud platform.
- Provide DevOps support to, and work closely with, project owners and developers to design server solutions that meet evolving needs and monitor/enhance performance.
- Configure and maintain our production, staging, testing, and continuous integration environments.
- Partner with our database administrators to ensure data systems are robust.
- Automate provisioning using a variety of tools, including Docker containers and Chef. Maintain an advanced automated configuration system for deployment, monitoring, testing, load balancing, backup/recovery, etc.
- Manage hundreds of Linux servers using Amazon Web Services (AWS).
- Measure, analyze and monitor the infrastructure for availability, throughput, and performance.
- Help development teams right-size their platforms, ensuring that systems are sized correctly from the start and can scale appropriately over time based on performance and usage needs.
- Design and implement processes to maintain overall system security and reliability.
- Represent the needs of project owners to OCIO and cybersecurity professionals, and work with software engineers to address escalated issues.
- Manage user accounts, authorizations and roles.
- Be responsible for site uptime, which may include working outside regular 8:00 am - 5:00 pm hours.
- Write appropriate documentation.
The ideal candidate keeps current in the field through a variety of learning avenues: attending relevant classes and webinars, working with a mentor or seeking advice from subject matter experts, reading professional publications, joining relevant industry organizations, attending conferences, etc.
Required Education, Experience, and Skills
Relevant PhD and 4 or more years of experience or equivalent relevant education/experience.
Or, relevant Master's Degree and 7 or more years of experience or equivalent relevant education/experience.
Or, relevant Bachelor's Degree and 9 or more years of experience or equivalent relevant education/experience.
Applies extensive technical engineering expertise and has full knowledge of other related disciplines. Considered a technical resource.
Demonstrates leadership in several areas of team, task, or project lead responsibilities. Demonstrated experience in project management. Excellent writing, interpersonal, and communication skills.