Infra Reliability Engineer / Service Availability Manager

POST DATE 9/14/2016
END DATE 2/25/2017

Location: San Francisco, CA
Job Classification: Full Time
Experience: Mid-Career (2 - 15 years)
Education: Bachelor's Degree


Salesforce will consider for employment qualified applicants with criminal histories in a manner consistent with the requirements of the San Francisco Fair Chance Ordinance.

JOB TITLE: Infra Reliability Engineer/Service Availability Manager

LOCATION: San Francisco

The Corporate IT Infrastructure Department at Salesforce is responsible for support of our most critical Corp IT services and the reliability of these services. We are responsible for the Corporate IT Disaster Recovery program for all internal IT infrastructure.

We are looking for an experienced self-starter to lead our Service Availability and Reliability function.
In this role, you will lead efforts to analyze and evaluate the technologies that support our most critical Corp IT services and, in the process, identify opportunities to improve the reliability of these services. You will then collaborate with key technical stakeholders across domains (Systems, Storage and Network) to develop and implement plans to improve service availability and reliability. A key responsibility of this role is to provide monthly, executive-level reporting on the availability of our services. Finally, the successful candidate will lead and coordinate the Corporate IT Disaster Recovery program.

The ideal candidate will have experience working in infrastructure, a good understanding of service availability concepts and ITIL, and excellent analytical and presentation skills. The Service Reliability Engineer will be comfortable facilitating and working with teams from all disciplines in a fast-paced, solutions-oriented environment.

RESPONSIBILITIES:
* Lead reliability and maintainability reviews of various Infra technologies and services. Identify areas that are most likely to cause service outages.
* Recommend and develop cost estimates for solutions that will improve our infrastructure reliability
* Drive our Disaster Recovery Program, including working with our BCP team to identify and maintain the Tier categorization of our applications and technologies.
* Support the coordination of Disaster Recovery exercises
* Develop and maintain service availability reports that are easy to understand. Present them clearly and concisely to executives.
* Accomplish complex tasks through influence
* Identify and remove impediments that block the team's ability to achieve their goals
* Update our Business Continuity Plan quarterly
* Orchestrate the efforts of cross-functional teams to execute on strategic availability and reliability efforts

REQUIREMENTS:
* BS in Computer Science
* Minimum of 5 years of hands-on experience supporting or troubleshooting one of the following: Network Administration, Systems Administration or Infrastructure Applications, with working knowledge of the other two areas.
* Demonstrable knowledge of the backup and replication technologies used in Disaster Recovery and Business Continuity.
* 2+ years of configuration management experience with one or more configuration management tools such as Puppet, Chef or Ansible.
* 2+ years of monitoring experience with tools such as Zabbix, Nagios, OpenView or Splunk.
* Solid and demonstrable understanding of application design, including the operational trade-offs of various designs.
* Ideally 2+ years of program management experience, especially in the DR/BCP space; understanding of how to break down work, coordinate across multiple teams and move projects forward despite not having direct reports.
* You have excellent analytical and critical thinking skills.
* You have superb communication skills and are able to take complex technical problems and explain them to non-technical people.
* You have the ability to influence and collaborate.
* Experience with container technologies and orchestration layers (Docker, Vagrant, Mesos, Marathon, etc.).
* You have good working knowledge of ITIL and Service Availability concepts.
* Good organizational, communication, and interpersonal skills (verbal/presentation/written).
* You have proven experience working with teams across multiple departments (cross-functional groups a plus).
* You can effectively use common office productivity tools (Google Sheets, Google Slides, Google Docs).

Salesforce, the Customer Success Platform and world's #1 CRM, empowers companies to connect with their customers in a whole new way. The company was founded on three disruptive ideas: a new technology model in cloud computing, a pay-as-you-go business model, and a new integrated corporate philanthropy model. These founding principles have taken our company to great heights, including being named one of Forbes's "World's Most Innovative Company" five years in a row and one of Fortune's "100 Best Companies to Work For" eight years in a row. We are the fastest growing of the top 10 enterprise software companies, and this level of growth equals incredible opportunities to grow a career at Salesforce. Together, with our whole Ohana (Hawaiian for "family") made up of our employees, customers, partners and communities, we are working to improve the state of the world.

Salesforce is an Equal Employment Opportunity and Affirmative Action Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, or disability status.

Headhunters and recruitment agencies may not submit resumes/CVs through this Web site or directly to managers. Salesforce does not accept unsolicited headhunter and agency resumes. Salesforce will not pay fees to any third-party agency or company that does not have a signed agreement with Salesforce.