Job Title:
System Reliability Engineer - Application Support

Company: Fulcrum Digital Inc

Location: Saint Louis, MO

Created: 2024-04-24

Job Type: Full Time

Job Description:

Who are we?
Fulcrum Digital is an agile, next-generation digital acceleration company providing digital transformation and technology services from ideation to implementation. These services apply across a variety of industries, including banking and financial services, insurance, retail, higher education, food, health care, and manufacturing.

Is this the next step in your career? Find out if you are the right candidate by reading through the complete overview below.

The Role:
- Provide L2 support for production systems such as applications, databases, middleware components, infrastructure, and network components.
- Manage production incidents end-to-end within defined SLAs, focusing on resolution rather than on who caused the incident.
- Interact with stakeholders such as release managers, program leads, service managers, and development and test leads.
- Review operational readiness requirements, such as monitoring and alerting, log rotation, and component resilience, and report any gaps.
- Provide pre-implementation support through activities such as release notes reviews and implementation dry runs.
- Protect production components by running health checks and monitoring latency and memory utilization.
- Automate day-to-day activities and propose changes that improve reliability.
- Participate in CAB (Change Advisory Board) meetings and provide feedback on change requests.
- Support the DevOps team in testing promoted pipelines and suggest automation of configuration items.
- Follow incident management best practices and perform RCA (root cause analysis).
- Participate in disaster recovery tests and operational acceptance tests.
- Analyze the technology stack that makes up the product and optimize the recovery time objective (RTO).
- Work with team members spread across time zones.
- Share knowledge, document improvements, and mentor junior team members.

Requirements:
- Deployments
- MTF/Prod maintenance items (including stop/start, disaster recovery-related activities, etc.)
- Monitoring support
- TRTs
- Incident creation
- CRs for changes in MTF/Prod

Skills:
- Linux and shell scripting
- ITIL / ITSM
- PL/SQL, SQL
- Application troubleshooting
- Ticketing (incident/problem management) tool: Remedy
- Monitoring tool: Splunk (preferred), Dynatrace (preferred), or any other monitoring tool
- Jenkins CI/CD (good to have)
- Groovy (good to have)
- Any cloud: AWS / Azure / PCF (good to have)
- Git basics / Bitbucket (good to have)
- Even Framework architecture (good to have)
- Ansible / Chef (good to have)
- DevOps basics: CI/CD basics, overview of Git, Bitbucket, SonarQube, Fortify, CI (Jenkins), ARA, SaltStack, Chef, Artifactory, MC DevOps Toolchain