Site Reliability Engineer Jobs - SRE / DevOps Engineer, 15742

at TEKsystems, Inc
Location Ann Arbor, MI
Date Posted July 11, 2019
Job Type Full-time


We are looking for a Site Reliability/DevOps Engineer who will be responsible for designing and developing fully automated deployment solutions as we head down the path to continuous deployment. This position will serve as an infrastructure and operations engineer within the Next Generation Store Systems Department. The role requires a mix of development, networking, security, and system administration skills, as the Site Reliability Engineer provides developer support, application systems administration, and production support.

In addition to building full-scale environments and deploying full application solutions on demand, you will implement creative monitoring solutions and provide full visibility into all areas of our system. Continuous innovation and continuous improvement are key to succeeding in this role. You will play an integral part in helping the Next Generation Pulse become scalable to address our future growth.

The specific focus for the Engineer is on establishing best practices around configuration, automation, and optimization of the development, test, and release processes for the Next Generation Pulse Platform. This role works collaboratively with the Agile Delivery Teams to deploy and operate our systems; automate and streamline our operations and processes; build and maintain tools for deployment, monitoring, and operations; and troubleshoot and resolve issues in our production and non-production environments.

(40%) System Administration
Engineer extensive scripting and automation to enable applications to install and run in all environments with minimal manual intervention
Evaluate, test, deploy and maintain both custom developed and third-party software upgrades
Maintain SDLC systems such as test environments, source control and automated build/test/deploy systems
Provide developer support on an ongoing basis, frequently embedded in development teams to facilitate collaboration
Create & maintain application architecture and troubleshooting documentation

(30%) Web Production Support
Provide 24x7 production support as part of a team rotation, resolving or escalating issues as appropriate
Maintain production services to highly demanding SLAs
Take ownership of production issues, working closely with the infrastructure and development teams on issue resolution
Support releases on a regularly scheduled basis, as well as emergency releases as needed
Deploy application and data changes to all environments as needed
Provide Level 2 technical support.

(30%) Planning, Design and Implementation
Design and implement new environments, services and application architecture modifications.
Experience with Infrastructure as Code and/or Configuration Management tools (Puppet, Ansible, Terraform).
Design and implement build, deployment and configuration management.
Manage CI and CD tools.
Handle code deployments.
Monitor metrics and develop ways to improve them
Brainstorm for new ideas and ways to improve development delivery.
Research, evaluate and implement operational improvements, application packages and architectural modifications
Participate in change control, release planning, and other operational planning
Remain current on industry-leading solutions in both private and public cloud hosting (VMware, Xen, KVM, Amazon Web Services (AWS), Azure, Google App Engine, etc.)
Remain current on modern open-source persistence technologies (Hazelcast, BDB, Project Voldemort, Cassandra, Memcached, etc.)
Remain current on modern containerization technologies (Docker, vSphere Integrated Containers, Kubernetes)

Bachelor's degree in computer science or equivalent experience
Release automation (e.g. Jenkins), system administration, system configuration, and system debugging experience.
Experience with configuring and maintaining Jenkins and Jenkins Pipelines.
Experience with Linux and Windows Administration.
5+ years production application support experience in a high uptime environment
5+ years UNIX/Linux administration experience including diagnosis of performance issues, package management, load estimation, kernel tuning, networking configuration, etc.
5+ years hosting experience in a large heavy-traffic environment
4+ years software engineering experience (Java, C, C++)
Strong scripting skills and experience in CI/CD and automation environments
Understanding of, or experience working with, container technologies such as PKS, OpenShift, Mesosphere, Docker, and Kubernetes
Understanding of networking principles, esp. TCP/IP
Excellent troubleshooting and analytic skills
Excellent Soft Skills (understanding people, culture; communication skills)
Broad understanding of modern containerization technologies (Docker, VMware PKS, Kubernetes).
Ability to work independently on large, complex projects with minimal guidance
Knowledge of cloud infrastructure environments (e.g. AWS, Azure).

About TEKsystems:

We're partners in transformation. We help clients activate ideas and solutions to take advantage of a new world of opportunity. We are a team of 80,000 strong, working with over 6,000 clients, including 80% of the Fortune 500, across North America, Europe and Asia. As an industry leader in Full-Stack Technology Services, Talent Services, and real-world application, we work with progressive leaders to drive change. That's the power of true partnership. TEKsystems is an Allegis Group company.

The company is an equal opportunity employer and will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information or any characteristic protected by law.

If you would like to request a reasonable accommodation, such as the modification or adjustment of the job application process or interviewing process due to a disability, please call 888 472-3411 or email accommodation@teksystems.com for other accommodation options.