R/O Resource Solutions  Providing the Right Talent for Success

Job Opportunities

 
Job Title: Site Reliability Engineer - SRT
Location: Arlington, VA
Job Type: Full-Time Regular
Job Description

Position Title: Site Reliability Engineer

Hiring Manager: 

Conversion Salary: Open-ended; however, please keep in mind that salary is based on experience.

Location: Washington, D.C.

Desired Start: ASAP

Ideally looking for a direct hire.

 




The company is building teams of Site Reliability Engineers (SREs) to support internal and external engineering and operations for a large-scale, worldwide enterprise IT environment that covers application hosting and support, enterprise services, and infrastructure services.

The SRE role is highly technical and requires a systematic understanding of all components of a modern web application stack, including front-end, networking, and systems-level knowledge. Ideally, you are energized by learning current cloud technologies and are as eager to map out proposals on a whiteboard as you are to jump into day-to-day monitoring and technical work. We aren’t risk-averse, and we maintain a healthy dialogue in which we pick each other’s brains and challenge each other. At the end of the day, we take care to learn from our mistakes and to make sure that we understand needs and address them in a sensible, holistic fashion. Together, we’ll drive toward the most efficient, modern, and smartly built systems that maximize automation and the power of new technologies.

The ideal candidates for this team will possess a systems engineering background with a strong Linux skill set. Additionally, candidates will have experience with one or more of the following:

  • Puppet infrastructure and module writing
  • Software development, preferably in Python or Ruby
  • Public Key Infrastructure (PKI)
  • VMware virtualization and automation
  • System provisioning and lifecycle management; experience with Red Hat Satellite
  • Networking
  • Container technologies such as Docker, and PaaS products such as OpenShift
  • Monitoring, with experience in both white-box and black-box approaches

Responsibilities

  • Work with internal teams and clients to ensure systems are effectively integrated, configured, managed, and supported in pre-production and production
  • Implement system and application monitoring for custom requirements and application uptime with the intent of maximizing platform reliability
  • Troubleshoot and analyze system issues, delving into hardware, networks, application, and storage/DB layers as needed
  • Participate in lifecycle management of the Linux OS platform and applications, including Puppet, Red Hat Satellite, Red Hat CloudForms, and OpenShift, as well as future applications
  • Install and support in-house, open-source, and third-party applications throughout the technology stack
  • Treat configuration as code - manage, design, deploy, and test system operations
  • Continuously identify and develop automation tools that eliminate manual tasks and reduce errors
  • Deploy software in a repeatable and documented way; capture and maintain documentation of specifications, processes, systems, and procedures
  • Practice continuous improvement by identifying and removing single points of failure or unnecessary redundancies
  • Install and configure system services, with a focus on automation and repeatability
  • Communicate with clients proactively, professionally, and collaboratively
  • Provide peer support to other SREs and the client

Qualifications

  • BS in Computer Science or related experience
  • 5 or more years’ experience working in Linux environments
  • Experience with large distributed environments
  • 3 or more years supporting production web sites or online applications
  • Excellent troubleshooting skills with the ability to dive deep into all aspects of the stack to identify and fix problems