Site Reliability Engineer Jobs - Site Reliability Engineer, 15686

at Apex Systems
Location Rosemont, IL
Date Posted July 4, 2019
Category Default
Job Type Contractor

Description

SOFTWARE RELIABILITY ENGINEER (SRE)
Join our growing Software Engineering department as a Software Reliability Engineer (SRE) to help design, develop, and deploy high-quality solutions that support 80% of US clinicians. We are looking for an experienced candidate with proven problem-solving and troubleshooting skills who can influence and lead teams to successfully execute the DevOps tenets that the SRE team strives to implement. Our SRE team takes the lead on operational availability of our internally developed cloud-hosted applications throughout the Software Engineering organization. SRE team members are partnered with our application development teams, working together to build, deploy, and support a reliable, resilient, and secure cloud-first platform. The SRE role is a software engineering role that requires not only an understanding of how our systems are meant to work, but also the skills needed to adjust when they don’t. Additionally, our SREs take the lead on proactively implementing non-functional requirements which enable us to meet the objectives set out in our Service Level Agreements (SLAs). We leverage Amazon AWS as our cloud provider and embrace tools for automation and continuous delivery, including TFS, Octopus Deploy, Terraform, AWS CodeBuild/CodeDeploy, and more.

Overview of Duties and Responsibilities
• Work closely with software development teams to ensure operational success of production applications
• Identify and develop the automation and software changes needed to address operational issues
• Focus on automation and work on development tasks to reduce manual toil, reduce outages, and enhance scalability, security, and resiliency
• Work with other SREs to drive standards and consistency around best practices
• Resolve production issues, identify root causes, and iterate on improving both production and preproduction environments
• Coordinate and troubleshoot complex technical issues with follow-through to resolution
• Practice sustainable incident response and blameless postmortems
• Drive teams to make cost-efficient use of cloud resources via capacity planning and automated scaling
• Create and maintain operational documentation and runbooks
• Develop, debug, maintain, and apply infrastructure-as-code to deploy cloud resources
• Function as a subject matter expert on infrastructure-as-code (e.g., Terraform)
• Develop fluency with existing systems and infrastructure
• Design, automate, and implement monitoring and metric collection
• Identify opportunities and influence the roadmap of deployment infrastructure standards and tooling
• Follow and advance quality standards
• Influence and participate in design and architecture decisions
• Actively practice Agile behaviors and participate in Scrum team ceremonies
• Work, lead, and learn on a cross-functional Agile team
• Be passionate and continue to advance your craft

Knowledge, Skills, and Attributes
MUST HAVE:
• 5+ years of experience in SRE, DevOps, systems administration, or software engineering with a track record of hands-on experience automating application deployment to Amazon AWS
• Hands-on experience writing and applying infrastructure-as-code, such as Terraform or CloudFormation
• Hands-on experience in planning, designing, deploying, and supporting cloud-deployed applications
• Experience working with and configuring monitoring and alert tooling such as New Relic
• Experience with operating systems and a strong understanding of network architectures
• Experience supporting and deploying applications to Windows-based servers
• Experience writing and maintaining scripts
• Experience with source control tooling, such as TFS or Git, in a team environment
• A strong understanding of Internet standards and protocols such as HTTP, DNS, TCP, and UDP
• The ability to develop and deploy concise solutions to complex operational challenges
• An enjoyment of technical challenges and eagerness to explore new approaches
• A willingness to ask for help and the ability to communicate what you need to do your best work

NICE TO HAVE:
• Experience supporting and deploying applications to Linux-based servers
• Experience with general-purpose programming languages such as C#, JavaScript, or Python

Our Culture Supports You To...
• Take initiative by bringing and developing new ideas
• Embrace failure as a crucial step toward success and an opportunity to learn
• Pair program with fellow team members to develop shared patterns and receive/give mentoring in various languages and tools
• Develop personally and professionally with regular training, workshops, conferences, collaborative initiatives, Hackathons, and R&D projects
• Be valued for your new perspective or depth of experience

Note: The employee may come in contact with ePHI as part of their job responsibilities and would need to follow appropriate policies and procedures applicable under HIPAA regulations.

EEO Employer

Apex Systems is an equal opportunity employer. We do not discriminate or allow discrimination on the basis of race, color, religion, creed, sex (including pregnancy, childbirth, breastfeeding, or related medical conditions), age, sexual orientation, gender identity, national origin, ancestry, citizenship, genetic information, registered domestic partner status, marital status, disability, status as a crime victim, protected veteran status, political affiliation, union membership, or any other characteristic protected by law. Apex will consider qualified applicants with criminal histories in a manner consistent with the requirements of applicable law. If you have visited our website in search of information on employment opportunities or to apply for a position, and you require an accommodation in using our website for a search or application, please contact our Employee Services Department at 844-463-6178.
