Date Posted: June 7, 2019
Job Title: Cloud Site Reliability Engineer
Location: Nashville, TN
Duration: 1 year + possible extension
Your role will require you to:
- ensure production systems run reliably at all times and that availability, performance and business process SLAs are met or exceeded
- manage Cloud services that span storage, security, networking, and compute cloud capabilities
- spend 50% of your time on operational activities and 50% on engineering solutions that improve instrumentation, ease of deployment, service orchestration and other aspects of production support, reducing the burden of manual work as systems and user volumes scale
- take responsibility for all aspects of application production support, deployment and monitoring and develop tools to support these activities
- support mission-critical applications and associated platforms, ensuring the highest levels of availability, security, performance and stability are maintained at all times
- design and build tools and solutions with a strong bias towards automating as many aspects of support as possible to reduce or eliminate trivial support activities
- ensure newly deployed systems and services can be integrated into the existing monitoring and management tools, so that service performance is instrumented and deviations are easy to anticipate
Cloud technologies bring considerable technical agility, which *** are actively embracing. As such, you'll be working as a Cloud Site Reliability Engineer in the newly formed Cloud Operations Support team in one of our global h***. In partnership with your global colleagues, you'll provide follow-the-sun Cloud reliability support to enhance our customers' experiences. Microsoft is the primary technology used within this role.
Your experience and skills
- strong expertise in Microsoft Azure; your addition is expected to "raise the bar" for our team's Azure capabilities
- significant development and operations / engineering experience with the ability to apply that knowledge in order to solve complex problems
- subject matter expertise in Azure Resource Manager, Monitor, Alerts, Security Center, DevOps, Azure Policy, RBAC and application source code in languages such as Java, C++ and C#
- a blend of skills, including sysadmin, security, automation and the ability to code, with strong knowledge of operating systems and application source code, and expert-level ARM templating, container fabrics, networking, alerting and monitoring
- a comprehensive understanding of each service across the full IT lifecycle, and readiness to take requests for infrastructure services, applications and environments
- experience designing solutions (monitoring / process orchestration / capacity management / deployment) that can scale and potentially be leveraged by other parts of the organisation
- hands-on experience working with both Agile and DevOps methodologies
- Expect to demonstrate these capabilities through our selection process, which will include technical tests, peer interviews and client interviews.
- confident interacting with developers and diving deep into both application and infrastructure code
- willing to challenge the status quo and introduce new ideas that will remove or reduce manual effort in relation to operating large production systems at scale
- resolute, pragmatic, articulate and determined
- able to work to tight deadlines in high-pressure environments
- a skilled communicator, able to explain complex technical issues and resolutions