We invite a Lead Data Scientist to join our team.
Role:
- Build analytics processes so that data is stable, reproducible and controllable;
- launch analyst training and ensure delivery of ML solutions from problem definition to production.
Responsibilities:
- Data engineering & pipelines: setting up Dagster + dbt; data tests, alerts, leakage control.
- Analytical data marts: design of consistent marts (customer/receipt/product/store/promo/channel) with correct grain and history tracking.
- ML for tabular data: building and validating models (LightGBM/XGBoost/CatBoost), regularization, CV, handling class imbalance, interpretation (SHAP).
- Model quality assessment: ROC-AUC/PR-AUC, F1, calibration, and other metrics; preparation of metrics and reports for the business.
- Full ML/DS cycle: problem definition, dataset preparation, modeling, interpretation, production (batch/API); Docker.
- Training/mentoring: systematic upskilling of analysts (Excel level and above), regular classes and task reviews.
- Team standards: Git, code review, notebook/report templates, documentation; implementation of the "Data Platform Playbook".
- Data mining: finding patterns and hypotheses in real data, in collaboration with the business.
- Additionally, architecture and data platform: participation in the deployment of MinIO + Apache Iceberg + Catalog + Trino; ensuring data quality and manageability.
Requirements (technical):
1. Python + SQL (strong): pandas/numpy, scikit-learn; CTE, window functions, query optimization.
2. Mathematical base (practical):
- probability and statistics: distributions, expectation/variance, confidence intervals, p-value;
- hypothesis testing, A/B tests, statistical power;
- linear algebra: matrices/vectors, basic understanding of gradients.
3. ML for tabular data: LightGBM/XGBoost/CatBoost, regularization, bias-variance, cross-validation, leakage control.
4. Model evaluation: ROC-AUC/PR-AUC, F1, calibration; handling class imbalance; interpretation (SHAP).
5. End-to-end DS: from problem setting to production (batch/API), Docker.
6. Training/mentoring: working with Excel-level analysts; systematic classes + review.
7. Upskill program: ability to design a plan for 3-6 months (practice/homework/skills matrix).
8. Team standards: Git, code review, templates, documentation.
Nice to have: experience with Lakehouse, Trino performance tuning, production ML solutions in Retail/FMCG, CI/CD experience for DS.
Tasks for the pilot (first 6 months):
- Join the “data factory” deployment project (MinIO + Iceberg + Catalog + Trino): ensure stability, reproducibility, and control.
- Build basic data marts for customer analytics (customer/receipt/product/store/promo/channel) with agreed grain and history tracking.
- Set up automatic pipelines (Dagster + dbt), data tests and alerts.
- Build processes for data processing, analysis, and data mining.
Internal training (required):
- Conduct a SQL Bootcamp for a pilot group (3-4 people): SELECT/JOIN/GROUP BY, window functions, grain logic, and rules for "how not to break metrics".
- Create a “Data Platform Playbook”: how to connect, which tables live where, what the “source of truth” is, and how to request new fields/tables (application process).
- Run office hours twice a week: working through analysts' real tasks on real data.
The company offers:
- remote or hybrid work format;
- employment under a gig contract or as a staff employee (reservation is possible);
- paid annual leave of 24 calendar days, paid sick leave;
- salary paid on time and in full as agreed, with regular salary reviews;
- opportunities for professional and career growth;
- training courses.
Contact person: Kateryna, tel. 0984567857 (t.me/KaterynaB_HR)