Job Title: Data Engineer

Company: Brooksource

Location: Philadelphia, PA

Created: 2024-04-24

Job Type: Full Time

Job Description:

Data Engineer
Philadelphia, PA (Hybrid)
Contract to Hire

All the relevant skills, qualifications and experience that a successful applicant will need are listed in the following description.

A healthcare client of ours in the Philadelphia Center City area is looking for a Data Engineer to join their Data Science team. The Data Engineer will play a crucial role preparing both unstructured and structured data, ensuring it is ready for our cutting-edge Gen AI and ML projects.

Responsibilities:

- Design, implement, and maintain scalable data ingestion processes and pipelines from various sources, including health care databases and APIs for Gen AI and ML applications, focusing on efficient data tokenization, vectorization, and preprocessing.
- Utilize GCP technologies such as BigQuery for data warehousing, Cloud Storage for data lake solutions, Dataflow for stream and batch data processing, and Vertex AI Pipelines for orchestrating and automating ML workflows.
- Develop and maintain Jupyter notebooks for exploratory data analysis, prototyping ML models, and visualizing data and model outputs, ensuring seamless integration with GCP services.
- Prepare data for model training, validation, and testing using data cleaning, transformation, augmentation, and visualization techniques to improve data quality and utility for AI and ML models.
- Collaborate with AI/ML engineers and data scientists to understand data requirements and implement solutions that support model development and deployment, using Vertex AI to train, host, and manage ML models at scale.
- Monitor, troubleshoot, and optimize data pipelines for performance and efficiency, using GCP's monitoring tools to ensure high availability and reliability of data processing and ML workflows.
- Stay current with the latest technologies and trends in data engineering, GCP, AI, and ML, applying best practices to our data infrastructure.

Qualifications:

- Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or a related field.
- Minimum of 3 years of experience as a Data Engineer or a similar role, with a strong background in data processing and data pipelines.
- Familiarity with GCP technologies, including BigQuery, Cloud Storage, Dataflow, Vertex AI Pipelines, and the use of Jupyter notebooks for data analysis and ML model development.
- Strong programming skills in Python, with a deep understanding of libraries and tools for data manipulation, analysis, and ML (e.g., Pandas, NumPy, Scikit-learn, TensorFlow, PyTorch).
- Experience with data processing and ETL tools such as Apache Beam, Spark, or similar technologies, and familiarity with workflow orchestration tools like Airflow.
- Knowledge of health care data and data standards (e.g., HL7, FHIR) and experience dealing with sensitive data and compliance frameworks (HIPAA) is highly desirable.
- Excellent problem-solving, analytical, and communication skills, with the ability to work collaboratively in a team environment.
- Self-motivated with a keen interest in staying up to date with technology trends and advancements in the field of data engineering and AI/ML.

Eight Eleven Group provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, national origin, age, sex, citizenship, disability, genetic information, sexual orientation, gender identity, marital status, amnesty or status as a covered veteran in accordance with applicable federal, state, and local laws.