Site Reliability Engineer Jobs - Site Reliability Engineer, 15099

at Agile
Location Atlanta, GA
Date Posted April 4, 2019
Category Default
Job Type Contract to Hire

Description

Site Reliability Engineer

Our client is seeking a Site Reliability Engineer to join their team in Atlanta, GA!

Here's what you'll be doing:

  • Engaging in and improving the whole lifecycle of software development services, from inception and design through deployment, operation, and refinement
  • Supporting services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews
  • Maintaining services once they are live by measuring and monitoring availability, latency, and overall system health in a 24x7 environment
  • Scaling systems sustainably through mechanisms like automation, and evolving systems by pushing for changes that improve reliability and velocity
  • Practicing sustainable incident response and blameless postmortems
  • Influencing and creating new designs, architecture, standards, and methods for large-scale systems
  • Binding and orchestrating the system infrastructure with the application layer to enable high availability, clustering, load balancing, and integration
  • Providing technical guidance or support for the development or troubleshooting of systems
  • Establishing end-to-end monitoring and alerting on all critical aspects of every system to track SLIs, meet SLOs and SLAs, and provide proactive notification of possible issues
  • Developing automated solutions to address potential problems before they result in a service interruption and demonstrating a passion for automation, including CI/CD automation
  • Establishing performance baselines and capacity thresholds, correlating events, and defining monitoring/alerting criteria

Here's what our ideal candidate has:

  • Bachelor of Science degree in Computer Science, Engineering, or equivalent relevant experience
  • Expertise in designing, analyzing and troubleshooting large-scale distributed systems
  • Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
  • Ability to debug and optimize code and automate routine tasks
  • 6+ years of overall experience in one or more of the following:
  • Experience building Java EE applications using tools such as Maven/Ant, Subversion, JIRA, Jenkins, Bitbucket, and Chef
  • Experience with continuous integration tools (Jenkins, SonarQube, JIRA, Nexus, Confluence, Git/Bitbucket, Maven, Gradle, Rundeck) is a plus
  • Experience creating automation using Chef, Puppet, or another configuration management tool; experience with Docker and container schedulers such as ECS or Kubernetes is desirable
  • Experience working with Nginx, Tomcat, HAProxy, Redis, Elasticsearch, MongoDB, RabbitMQ, Kafka, and ZooKeeper
  • Experience as an SCM/release engineer, or in a position with similar skill sets and responsibilities (Software Engineer, Systems Engineer, Systems Administrator)
  • Experience with source control management (Subversion/Git), including branching, merging, and tagging
  • Experience configuring and administering Java EE application servers (Tomcat, WebSphere, WebLogic, etc.)
  • Experience with scripting languages such as Python, Perl, and Unix shells (bash, ksh)
  • Experience in configuring, building, and supporting apps and operations in a public cloud environment (AWS, Azure, GCP)
  • Experience with monitoring and logging tools (Elasticsearch/ELK, AppDynamics, Splunk, etc.)
  • Knowledge of Agile / Scrum methodologies and principles

Benefits: Our IT consultants enjoy a wide array of benefits including: medical, dental, 401K, life insurance, and much more.