Job Description:

Job Title: Senior Site Reliability Engineer I
Reports to: Senior Manager of Site Reliability Engineering
Job Location: San Diego, CA, USA
Job Status: Exempt, FT

About SHEIN
SHEIN is a global fashion and lifestyle e-retailer committed to making the beauty of fashion accessible to all. We use on-demand manufacturing technology to connect suppliers to our agile supply chain, reducing inventory waste and enabling us to deliver a variety of affordable products to customers around the world. From our global offices, we reach customers in more than 150 countries. Founded in 2012, SHEIN has nearly 10,000 employees operating from offices around the world, with U.S. Headquarters located in Los Angeles and Global Headquarters located in Singapore. At SHEIN, we work with outstanding, creative, and capable peers. We share an energetic and open culture in which capable people can discern, work, and ignite as a team.

Position Summary
We are looking for a Senior Site Reliability Engineer - Big Data (official title: Senior Site Reliability Engineer I) for our San Diego, CA-based office hub. Site Reliability Engineers (SREs) work with the Technical Operations team at SHEIN and are hybrid software/systems engineers whose overarching goal is to ensure that production services are "Always On." They strive to build the most reliable and performant systems on the planet.

SREs work closely with cross-functional teams to ensure we have the right set of tools to generate, collect, analyze, visualize, and alert on operational data. This lets us know exactly what happens across the ecosystem, see problems before they occur, and address them as quickly as possible.

They are also responsible for improving the operational efficiency, utilization, and system resiliency of the platform.
They own critical open-source software that our platform relies on and are core participants in every significant engineering effort underway in the platform. They are also tasked with driving forward the operability of the platform to reduce the number of incidents while also reducing MTTR. To accomplish this, the team combines software development, networking, and systems engineering expertise with a strong desire to be challenged by problems of scale and complexity, all to make our service better for our customers.

Job Responsibilities
- Participate in an on-call rotation to ensure 24/7/365 availability of SHEIN's production systems
- Supervise capacity and utilization, and work closely with cross-functional teams to orchestrate scaling services up and down
- Own and operate critical open-source services such as Elasticsearch, Kafka, RabbitMQ, and Redis
- Build tools and design processes that improve the observability and system resiliency of the platform
- Triage site availability incidents and proactively work toward reducing MTTR for customer-impacting incidents
- Partner with service owners to implement Service Level Metrics and Service Level Objectives that act as service health indicators
- Establish design patterns for monitoring, benchmarking, and deploying new features for backend services
- Develop and maintain technical documentation, network diagrams, runbooks, and procedures
- Drive initiatives to evolve our current platform to increase efficiency and keep it in line with current standards and best practices
- Respond to production incidents, and use your experience in software development, systems engineering, and networking to proactively prevent recurring issues
- Provide relief and sustainable resolution to issues within our infrastructure
- Drive initiatives with partner teams to improve the reliability and performance of the infrastructure through improved system design
- Join a culture that does not tolerate manual activity, resulting in a highly automated environment that delivers scalable solutions
- Drive efficiencies through software improvements and root cause analysis, improving service delivery, maturity, and scalability

Job Requirements
- Bachelor's degree in Computer Science, Information Systems, or an equivalent technical discipline is preferred
- Experience operating and maintaining Big Data components, including Hadoop, YARN, HBase, Hive, Spark, etc., is highly preferred
- Experience with OSS technologies such as Elasticsearch, Kafka, and Redis is highly preferred
- Solid understanding of Linux systems is preferred
- A minimum of 3 years of experience in an enterprise 24/7 production environment supporting mission-critical, real-time, high-traffic applications, especially in cloud environments, is preferred
- Systematic problem-solving approach, combined with a sense of ownership and drive
- Full-stack debugging and performance optimization ability, including knowledge of cloud systems (load balancing, caching, content distribution, etc.), continuous integration/build systems, Java, and SQL and NoSQL databases
- Track record of monitoring and analyzing system performance, and of isolating issues or bottlenecks that could impact reliability, performance, and scalability
- Strong experience with observability tools such as Grafana, Prometheus, Zabbix, etc.
- Good experience in at least one scripting/programming language, such as Python or Go
- Familiarity with container technologies such as Docker, Kubernetes, Mesos, etc.
- Understanding of and experience with SRE concepts and practices, including advocating for the elimination of toil and driving simple solutions
- Good verbal and written communication skills, and the ability to work effectively with geographically remote teams

Pay
$107,600.00 min - $180,200.00 max annually; bonus and RSUs offered.

Benefits and Culture
- Healthcare (medical, dental, vision, prescription drugs)
- Health Savings Account with employer funding
- Flexible Spending Accounts (healthcare and dependent care)
- Company-Paid Basic Life/AD&D Insurance
- Company-Paid Short-Term and Long-Term Disability
- Voluntary Benefit Offerings (Voluntary Life/AD&D, Hospital Indemnity, Critical Illness, and Accident)
- Employee Assistance Program
- Business Travel Accident Insurance
- 401(k) savings plan with discretionary company match and access to a financial advisor
- Vacation, paid holidays, and sick days
- Employee discounts

Perks (HQ Location)
- Free weekly catered lunch at HQ
- Dog-friendly office
- Free gym access at HQ
- Free swag giveaways
- Annual holiday party
- Invitations to pop-ups and other company events
- Complimentary daily office snacks and beverages
- Free shuttle service from HQ to LA Union Station

SHEIN Distribution is an equal opportunity employer committed to a diverse workplace environment.