Job Title:
Site Reliability Engineer

Company: Azimuth Corporation

Location: Springfield, VA

Created: 2024-05-12

Job Type: Full Time

Job Description:

Azimuth Corporation is seeking a qualified Site Reliability Engineer to support an NGA Research customer on-site in the Springfield, VA area. The ideal candidate will provide expert support to the Labs and Data pod. This team works to maximize the mission utility of increasingly diverse visual data sources with automated and scalable computational methods. It employs rigorous testing and evaluation for optimal alignment of data, technology and tradecraft by monitoring and instrumentation: implementing metrics in Prometheus, Grafana, log management and related systems, and Slack/PagerDuty integrations. Engineering practices: availability, reliability and scalability, as well as disaster recovery. Use and contribute code to GitLab.

Execution:

- Identifies significant projects that result in substantial improvements in reliability, cost savings and/or revenue.
- Identifies changes for the product architecture from the reliability, performance and availability perspectives with a data driven approach.
- Influences the product roadmap and works with engineering and product counterparts to influence improved resiliency and reliability of the GitLab product.
- Proactively works on efficiency and capacity planning to set clear requirements and reduce system resource usage, making GitLab cheaper to run for all our customers.
- Identifies parts of the system that do not scale, provides immediate palliative measures and drives long term resolution of these incidents.
- Identifies Service Level Indicators (SLIs) that will align the team to meet the availability and latency objectives.

Requirements for this role include, but are not limited to, the following:

Role: Owns RLEWAN & HARLEM availability, feature performance, and their deployments

Technical:

- General knowledge of most technical expertise areas, with deep knowledge in 2.
- Advanced Chef (syntax, recipes, cookbooks) and Ansible (syntax, tasks, playbooks)
- Advanced Terraform syntax and GitLab CI/CD configuration, pipelines, jobs
- Advanced knowledge of cloud services
- Kubernetes: cluster provisioning and new services
- Prometheus, Thanos, and Grafana: service catalog metrics and recording rules for alerts
- Log shipping pipelines and incident debugging visualizations
- Operating system (Linux) configuration, package management, startup and troubleshooting
- Block and object storage configuration and debugging
- Working knowledge of overall GitLab Product, including deep knowledge of groups which may be part of stable counterpart assignments.
- Contributes improvements to the GitLab codebase to resolve issues.

Must haves:

- submitted 1 approved swap
- submitted 1 approved SIDR
- submitted 2 service+ tickets
- Minimally worked 1 UC2S/SC2S and 1 TC2S FCR from beginning to end
- Minimally worked 1 ATO process with RLE Security, DAOR, and SCA
- Minimally 3 years Linux (CentOS/Red Hat 7) O.S. administration
- Minimally 2 years bare-metal admin
- Minimally 1 year git CLI
- Experience with GitLab, and can use GitLab web UI for issues, code review
- Experience with Ansible, ThreadFix, Terraform and Docker
- can use ThreadFix web UI for vuln finding workoff
- can use GitLab CI pipelines
- 1 yr Ansible
- 1 yr Terraform
- 1 yr Docker engine install/admin
- 1 yr RLE HPC CLI tools
- Good to know one of the referenced languages: Python, Java, C/C++, Ruby, Shell and JavaScript.
- Experience with distributed storage technologies such as NFS, HDFS, Ceph, and Amazon S3, as well as dynamic resource management frameworks (Apache Mesos, Kubernetes, Yarn)
- Previous success in technical engineering
- Coding experience beyond simple scripts

Access/privs:

- SCI clearance
- PRGMADMIN to U|S|T|C2S rle accounts
- sbusecnetcoe access
- rle-u|sc|tc admin access
- rle-u|sc|tc HPC admin access
- NCE data center access

Responsibilities for this role include, but are not limited to, the following:

- Implement and maintain RLEWAN Monitoring App plugins monitoring HPC and other system on-prem resources
- Coordinate and integrate with broader NGA Enterprise continual monitoring services
- Implement and maintain HARLEM, MIQ, RLEWAN Monitoring App on-prem components as designated by the Product Owner
- Work with Security Engineer to provide and resolve on-premises related security relevant technical details during ATO of HARLEM, RLEWAN Monitoring Application, Kubernetes and their sub components
- Interact with team using GEOINT Services tooling as designated by the Product Owner
- Record status updates daily in GEOINT Services tooling as designated by the Product Owner
- Sustain/improve existing HARLEM, MIQ, RLEWAN Monitoring App CI/CD pipelines
- Use existing UC2S CI/CD pipelines as a guide to implementing like configuration on other domains/environments as designated by the Product Owner
- Attend weekly team standup and monthly briefing with team leadership
- Ensure all work follows; with and provide requested status to Task Coordinator

Requirements:

Must be able to work in an office environment at a desk and [...]

Company Overview:

Azimuth is an award-winning Woman Owned Small Business specializing in providing research and development and professional services
support to the federal government. Azimuth's agility, customer-driven approach and commitment to our employees allow us to meet and exceed our client goals. Excellence, Integrity, Accountability, Community and Humility are the core values of Azimuth as we continue to strive as a recognized leader in the management consulting community that both federal agencies and industry partners value doing business with. We are an organization that offers both our employees and clients an exceptional experience; our culture will be contagious, while always maintaining a genuine reputation.

Disclaimer:

The above information on this description has been designed to indicate the general nature and level of work performed by employees within this classification. It is not designed to contain or be interpreted as a comprehensive inventory of all duties, responsibilities, and qualifications required of employees assigned to this job. Azimuth Corporation does not discriminate in employment on the basis of race, color, religion, sex (including pregnancy and gender identity), national origin, political affiliation, sexual orientation, marital status, disability, genetic information, age, membership in an employee organization, retaliation, parental status, military service, or other non-merit factor.