We are building our own self-hosted analytics platform and are looking for a Data Engineer to help us maintain it and accelerate the growth of our data department.
This role sits at the intersection of data engineering and product analytics.
You will be responsible for ensuring that data is:
- consistently collected,
- correctly modeled,
- understandable to business and product teams,
- and actually used for decision-making.
What you will be doing:
- Designing, creating, and maintaining DAGs in Apache Airflow for data collection and processing.
- Writing efficient SQL queries against our Trino and Iceberg storage and modeling data marts.
- Developing and supporting dashboards in Apache Superset that answer key business and product questions.
- Participating in query optimization and improving the architecture of our analytical platform.
We will work well together if you:
- Have 1+ year of experience as a Data Engineer or Data Analyst with an engineering focus.
- Confidently write SQL and understand how to make queries fast and maintainable.
- Use Python (Pandas and similar libraries) and have created DAGs in Apache Airflow.
- Have created dashboards in Superset, Metabase, Tableau, Power BI, or similar tools.
- Understand how data models and data marts are structured and how partitioning works.
- Enjoy diving into tasks and are not afraid to propose your own solutions.
It will be a big plus if you have:
- Experience with our stack: Trino (TrinoDB/Presto), Apache Iceberg, Apache Airflow, Apache Superset.
- Experience with self-hosted infrastructure (Linux, Docker).
- Interest in the EdTech sector and mobile products.
- Experience with product analytics (A/B tests, funnels, retention).
Our stack:
- Storage: S3, Apache Iceberg, PostgreSQL.
- Query engine: Trino.
- Orchestration: Apache Airflow.
- Visualization: Apache Superset.
- Infrastructure: Docker, Kubernetes.
What we offer:
- A distributed Russian-speaking team of 40 people. You can work from anywhere in the world.
- Work with a modern, in-demand data technology stack.
- No bureaucracy and fast decision-making.
- The opportunity to directly influence the development of a product used by millions of people worldwide.