Big Data Engineer (NRT/Spark)

Fintech is one of Yandex's key and fastest-growing areas. It is an ecosystem of financial services embedded in the daily lives of millions of users and businesses. It is a young, ambitious business line that has already proven its value and continues to grow.

Key products of Fintech: Yandex Pay, Split, Saves, Card Plus, credit products, Yandex ID, Yandex Pro, and much more.

This is a complex engineering and analytical environment where high load, big data, and strict regulatory requirements intersect.

The data platform team is looking for an Apache Spark specialist with a deep understanding of the framework's internal architecture. The primary task is to develop the platform for near-real-time (NRT) data processing and to improve the fault tolerance and performance of streaming processes.

Here's what you'll have to deal with:
* High loads: millions of transactions and tables with volumes far exceeding 1 million records
* Distributed data: Greenplum, ClickHouse, Hadoop, and Spark are used to process very large datasets
* DWH architecture: building complex data warehouses and data marts for regulatory and management reporting, and configuring ETL processes
* Security and integration: developing integrations with external sources (BKI, SMEV), automating manual back-office processes, and working with automated banking systems

Tech stack for daily tasks:
* Language: Python (the main language for pipeline and script development)
* Orchestration: Apache Airflow (pipeline development)
* Data processing: Apache Spark (Spark SQL, DataFrame)
* Queries: Trino (querying data through the engine)

Learn more about us on the Fintech page.

What tasks await you

Designing and developing a cloud data platform

We are preparing for a manifold increase in the volume of processed data. You will optimize infrastructure and design new platform components using Yandex Cloud technologies.

Building a unified observability platform for DWH tables

We ensure that processes are observable and controllable. You will create a single entry point for monitoring build statuses, data quality, and dependency analysis, including building data lineage and logging processes.

Accelerating current data delivery processes

Our task is to speed up the processing of growing data flows. You will optimize existing data delivery processes and build new, reliable ones, including snapshot-taking processes, direct incremental loads from audit tables and CDC, and implementing Spark Streaming or similar solutions to ensure data updates with minimal latency.

Building test environments

We are building a safe development environment and increasing the reliability of developed solutions. Your tasks will include creating isolated development, testing, and pre-production environments with automated verification processes.

Developing data processing frameworks

To accelerate development, we are creating low-code solutions. You will develop internal frameworks for data loading, transformation, and quality control.

We expect you to

Have at least four years of commercial development experience
Have a deep understanding of database internals: query optimization, execution plans, partitioning, sharding, and indexes
Be proficient with various DBMS types: PostgreSQL, Oracle, MongoDB, Greenplum
Have experience with CDC (Change Data Capture): understand the principles and be able to implement pipelines
Confidently work with S3-compatible storage
