Site Reliability Engineer

POST DATE 8/16/2016
END DATE 12/19/2016

Confidential Company Sterling, VA

Job Type: Full Time


As part of a small but highly capable team, you will tackle all aspects of Application Uptime. You'll build and maintain systems in Scala, and you are expected to use Ruby or whatever other language you prefer to build and iterate on new tools as needed. Your responsibilities include:

* Take a leadership role in all Site Uptime projects, from problem recognition and prioritization of work to design and implementation of solutions.
* Focus specifically on all externally visible issues.
* Focus on our Outage Response and Recovery processes and tooling.
* Focus on Application Error Monitoring and Reporting.
* Focus on Deployment and Rollback success and speed.
* Work in and contribute to a shared codebase.
* Rapidly debug, fix and solve problems.
* Integrate with existing systems and tools, and rip and replace them where needed.
* Identify and address application and system performance bottlenecks in a high-throughput production environment.


Requirements:

* A solid understanding of all server infrastructure technologies, with production operations experience.
* Experience with scaling issues and large infrastructures.
* 6+ years of industry experience.