Date Posted: September 11, 2019
Kforce has a client that is seeking an SRE in Pleasanton, CA.
- You will employ deep troubleshooting and scripting skills to improve the availability, performance, and security of Ellie Mae Services, ensuring that services are designed for 24/7 availability, operational readiness, and rigor.
- Use your experience coding and automating applications on the Cloud Platform to implement automated tests, automated deployments, and operational tools.
- You will collaborate with Product and Support teams to plan and deploy product releases.
- You will work with Cloud Platform and Operations leaders to develop narratives and the backlog grooming, epic planning, and overall sprint planning processes
- Partner with Engineering leadership to build shared services that meet the requirements and needs of the platform and application teams
- Set strategic and operational goals and work with the team to deliver on them
- Implement proactive monitoring, alerting, trend analysis, and self-healing systems
- Participate in on-call rotations, driving restoration and repair of service-impacting issues
- Define non-functional requirements as part of the product lifecycle to influence new designs, standards, and methods for scalable, highly available distributed systems
- Contribute to product development and engineering as needed to ensure quality of service for highly available services
- Take a command-and-control role as Incident Manager during critical incidents, focusing on minimizing MTTR and MTTD