Description
We are a team of specialists building services and components for an internal security platform and developing AI agents that identify internal threats and fraud. We need a specialist who can effectively analyze the tabular and textual data required to build these agents.
We are looking for a Data Engineer with a focus on Data Science who will be responsible for extracting, preparing, and cleaning data, as well as supporting machine learning models. We offer career growth in both Data Engineering and Data Science. You will join a team of experienced engineers and analysts working on complex information security projects.
Responsibilities
Tasks:
- analyzing data structures across various sources and formats and assessing their suitability for specific business tasks
- loading, processing, and transforming large volumes of data from heterogeneous storage systems (Oracle, Teradata, MS SQL, Greenplum) into working environments (Greenplum, Hadoop)
- designing and creating analytical data marts
- preparing and preprocessing data for training machine learning models
- monitoring and optimizing data processing and loading workflows
- monitoring input data quality and automating data quality checks
- developing infrastructure and internal services for efficient processing of large volumes of data
- automating repetitive data operations
- creating technical documentation and maintaining knowledge bases on data handling
- advising internal users on data usage.
Requirements
- higher education
- at least 2 years of experience as a Data Engineer, Data Analyst, or ETL developer
- advanced SQL proficiency (analytical functions, subqueries, stored procedures, query performance)
- practical experience working with large volumes of data in relational DBMS (Oracle, Teradata, MS SQL, Greenplum)
- understanding of data warehouse (DWH) design concepts and principles
- experience with the Hadoop technology stack (HDFS, YARN, Hive) and Apache Spark
- programming experience in Java/Scala
- understanding of the basic principles of building distributed data storage and processing systems.
Nice to have:
- experience in designing data marts
- experience in migrating and integrating large volumes of data between different sources
- proficiency with version control systems (e.g., Git)
- basic knowledge of Machine Learning and Data Analysis and an interest in growing in these areas
- familiarity with ETL processes and data warehouse (DWH) technologies.
Conditions
We offer:
- office-based work at Kutuzovsky Prospekt 32, Moscow
- office work during the probationary period; afterwards, a hybrid schedule is possible (no more than 1-2 remote days per week)
- annual salary review, annual bonus
- corporate gym and relaxation areas
- more than 400 educational programs from SberUniversity for professional and career development
- extended voluntary health insurance, preferential family insurance, and corporate pension program
- mortgage discount equal to 1/3 of the Central Bank's key rate
- free SberPrime+ subscription, discounts on products from partner companies
- referral bonus for recommending friends to the Sber team.