The Role
We're looking for an AI Data Engineer to build and maintain the data infrastructure powering our AI-driven healthcare platform. This role focuses on implementing robust data pipelines, managing our data lakehouse architecture, and ensuring high-quality data processing for our AI systems.
Responsibilities:
- Design and implement scalable data pipelines for diverse healthcare data sources
- Build and maintain data lakehouse architecture on AWS for storing structured and unstructured medical data
Ideal Candidate:
- Strong experience with AWS data services (S3, RDS, Glue, EMR Serverless, Athena, DataZone, Lake Formation, DynamoDB)
- Expertise in data orchestration tools (Dagster, Apache Airflow, AWS MWAA, Step Functions)
- Proficiency in Python, SQL, and PySpark with experience in data processing frameworks
- Experience with data lakehouse architectures, ETL pipeline development, and SageMaker Feature Store
- Strong background with AWS analytics services (Glue Catalog, Glue ETL/EMR Serverless, Athena)
- Experience with the Apache Iceberg table format for organizing data in a data lakehouse architecture, including time travel, ACID transactions, and schema evolution
- Experience with PostgreSQL and vector databases (pgvector, OpenSearch, etc.)
- Proficiency in data transformation tools like dbt
- Experience implementing data quality frameworks (Great Expectations, Glue Data Quality, PyDeequ)
- Knowledge of healthcare data structures and medical terminology preferred
- Experience with data preprocessing for LLM applications strongly preferred (NLP libraries like spaCy, web scraping tools, text extraction, semantic chunking, etc.)
- Understanding of data security and HIPAA compliance requirements
- Collaborative mindset and ability to work in a fast-paced startup environment
- Bachelor's degree in Computer Science, Engineering, or related field