Senior Site Reliability Engineer
Palo Alto, CA
Post Date: 9/1/2016
SalesforceIQ Senior Site Reliability Engineer
SalesforceIQ is built to power the world's customer relationships with products they love. Our relationship intelligence platform reimagines how sales organizations can automate their businesses by capturing and processing the millions of digital signals sent every day. Constructing delightful solutions from a sea of data and complexity is what we do. Our team moves with the speed, independence, and culture of a startup, but with the full support of a rapidly growing Fortune 500 business. We believe strongly in empowering engineers to create elegant and beautiful solutions for our customers: we push code daily, prize unique ideas, and take the time to enjoy the moments along the way.
Senior Site Reliability Engineers at SalesforceIQ are hybrid software/systems engineers who ensure that SalesforceIQ's services run smoothly and have the capacity for future growth. You will be responsible for managing our production services and will work closely with developers and other Ops teams to ensure the reliability, scalability, and performance of our cloud infrastructure.
* Develop and deliver configuration and deployment automation required for improving the functionality, availability, and manageability of our microservices using Python or Ruby and configuration automation tools such as Puppet, Chef, or Ansible.
* Build infrastructure and application monitoring by gathering application and system metrics, and implement tools for recovery.
* Troubleshoot availability/performance problems and build software-based solutions to prevent recurrences.
* Define and evangelize cloud-related optimizations and best practices to improve reliability and performance.
* Perform code reviews, evaluate implementations, and provide feedback about potential tool improvements.
* Participate in an on-call rotation alongside the engineers.
* BS in Computer Science or equivalent practical experience.
* Minimum 4 years of experience in production service troubleshooting that spans applications, systems, and networks.
* Experience building systems on cloud technology (AWS, GCE, Rackspace, OpenStack).
* Experience with queuing/data-pipelining solutions (Kafka, Storm, Flink, Spark, Amazon Kinesis, etc.).
* Experience with one or more configuration management tools such as Puppet, Chef, or Ansible.
* Experience with container technologies and orchestration layers (Docker, Vagrant, Mesos, Marathon, etc.).
* Demonstrated coding skills, preferably in Python, Ruby, or Java.
* Demonstrable knowledge of UDP, TCP/IP, HTTP, and distributed systems.
* Solid understanding of application design, including the operational trade-offs of various designs.
* Experience working in Unix/Linux operating systems and shell scripting.
* Excellent analytical skills, coupled with a strong sense of ownership, urgency, and drive.
* Ability to work independently and collaboratively with multiple partners.
* Comfortable with Agile methodologies and working within small teams.
Salesforce.com is an Equal Employment Opportunity and Affirmative Action Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, or disability status. Headhunters and recruitment agencies may not submit resumes/CVs through this Web site or directly to managers. Salesforce.com does not accept unsolicited headhunter and agency resumes. Salesforce.com will not pay fees to any third-party agency or company that does not have a signed agreement with Salesforce.com.