Hi,
Hope you are doing well.
This is a contract position. Please reply with your updated resume if this job description matches your profile and you are interested.
Title: Site Reliability Engineer/Production Support (Hybrid)
Location: Atlanta, GA (Only Local Candidates)
Duration: 6+ Months
Job Description:
The Production Support Engineer II is responsible for providing day-to-day support for business-critical systems, ensuring operational stability, and quickly resolving incidents. This role focuses on resolving low- to medium-priority incidents, maintaining system health, and supporting the improvement of production environments through collaboration with senior engineers and cross-functional teams.
What You’ll Do (Responsibilities)
- Identify, troubleshoot, and resolve low- to medium-priority technical issues with guidance from senior engineers, ensuring minimal disruption to business operations.
- Support day-to-day monitoring of system performance and use monitoring tools (e.g., Splunk, Dynatrace, CloudWatch) to detect anomalies and take corrective actions.
- Collaborate with cross-functional teams to resolve technical incidents and escalate higher-complexity issues to senior engineers as needed.
- Assist in automating routine production support tasks by developing or modifying scripts and tools.
- Maintain documentation for production issues, troubleshooting steps, and system configurations, contributing to the shared knowledge base.
- Participate in incident, problem, and change management processes, following ITIL best practices.
- Perform root cause analysis for recurring issues and assist senior engineers in implementing permanent fixes to improve system stability.
- Support the implementation of process improvements to enhance system performance and minimize downtime.
- Assist with mentoring and supporting junior-level engineers, providing guidance as needed.
Other duties, both major and minor, that are not mentioned above may also be performed. Specific activities may change from time to time.
Qualifications:
Necessary Qualifications:
The requirements listed below are representative of the knowledge, skill and/or ability required. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.
- Bachelor’s degree and 4-7 years of experience, or an equivalent combination of education and software engineering training or experience
- Proficiency in using monitoring tools like Splunk, Dynatrace, or CloudWatch to detect and resolve system performance issues.
- Site Reliability Engineering (SRE) skills
- In-depth knowledge of information systems and ability to identify, apply, and implement IT best practices
- Understanding of key business processes and competitive strategies related to the IT function
- Ability to plan and manage projects and solve complex problems by applying best practices
- Ability to provide direction and mentor less experienced teammates. Ability to interpret and convey complex, difficult, or sensitive information
Preferred Qualifications:
- 4-8 years of experience in production support, systems administration, or related technical roles.
- Experience with IT Service Management (ITSM) tools such as ServiceNow with solid understanding of incident, problem, and change management processes.
- Familiarity with supporting Agile teams and processes.
- Experience with automation tools and scripting for production support tasks.
- Banking or financial services experience
- Experience with cloud technologies such as configuration management (e.g., Terraform), CI/CD (e.g., GitLab), and containerization (e.g., Kubernetes)
- AWS Certified Solutions Architect - Associate certification is a plus