Big Data Engineer (NRT/Spark)
Fintech at Yandex is one of the company's key and fastest-growing areas. It is an ecosystem of financial services embedded in the daily lives of millions of users and businesses. It is a young, ambitious business line that has already proven its value and keeps growing.
Key Fintech products: Yandex Pay, Split, Saves, Card Plus, credit products, Yandex ID, Yandex Pro, and much more.
This is a complex engineering and analytical environment where high load, big data, and strict regulatory requirements intersect.
The data platform team is looking for an Apache Spark specialist with a deep understanding of the framework's internal architecture. The primary task is to develop the platform for processing near real-time (NRT) data and to improve the fault tolerance and performance of streaming processes.
Here's what you'll be working with:
* High load: millions of transactions and tables far larger than 1 million records
* Distributed data: Greenplum, ClickHouse, Hadoop, and Spark are used to process very large datasets
* DWH architecture: building complex data warehouses and data marts for regulatory and management reporting, and configuring ETL processes
* Security and integration: developing integrations with external sources (BKI, SMEV), automating manual back-office processes, and working with automated banking systems
Tech stack for daily tasks:
* Language: Python (the primary language for pipelines and scripts)
* Orchestration: Apache Airflow (pipeline development)
* Data processing: Apache Spark (Spark SQL, DataFrame)
* Queries: Trino (querying data through the engine)
Learn more about us on the Fintech page.
Designing and developing a cloud data platform
We are preparing for a multifold increase in the volume of processed data. You will optimize infrastructure and design new platform components using Yandex Cloud technologies.
Building a unified observability platform for DWH tables
We make our processes observable and controllable. You will create a single entry point for monitoring build statuses, data quality, and dependency analysis, including building data lineage and logging processes.
Accelerating current data delivery processes
Our task is to speed up the processing of growing data flows. You will optimize existing data delivery processes and build new, reliable ones, including snapshot-taking processes and direct incremental loads from audit tables and CDC, and you will implement Spark Streaming or similar solutions so that data is updated with minimal latency.
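To give a feel for what an incremental load from CDC involves, here is a minimal, illustrative Python sketch. It is not the team's implementation: the `ChangeEvent` record, its fields, and the `apply_changes` helper are hypothetical names invented for this example, and a real pipeline would run on Spark rather than on in-memory dictionaries. The idea it shows is applying only the changes that are newer than the last processed position, so that replaying the same batch is harmless.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical CDC record (all field names are assumptions for this sketch)."""
    op: str                                # "insert", "update" or "delete"
    key: int                               # primary key of the affected row
    seq: int                               # increasing position in the change log
    row: Optional[Dict[str, Any]] = None   # full row image; None for deletes


def apply_changes(
    snapshot: Dict[int, Dict[str, Any]],
    events: Iterable[ChangeEvent],
    last_seq: int,
) -> Tuple[Dict[int, Dict[str, Any]], int]:
    """Apply events newer than last_seq to a copy of the snapshot.

    Returns the updated snapshot and the new high-water mark.
    """
    result = dict(snapshot)  # leave the caller's snapshot untouched
    high_water = last_seq
    for ev in sorted(events, key=lambda e: e.seq):
        if ev.seq <= last_seq:
            continue  # already applied earlier; skipping keeps replays idempotent
        if ev.op == "delete":
            result.pop(ev.key, None)
        elif ev.op in ("insert", "update"):
            result[ev.key] = dict(ev.row or {})
        else:
            raise ValueError(f"unknown operation: {ev.op!r}")
        high_water = ev.seq
    return result, high_water


if __name__ == "__main__":
    current = {1: {"amount": 10}}
    batch = [
        ChangeEvent("update", 1, 2, {"amount": 15}),
        ChangeEvent("insert", 2, 3, {"amount": 7}),
        ChangeEvent("delete", 1, 4),
    ]
    updated, position = apply_changes(current, batch, last_seq=1)
    print(updated, position)
```

Storing the returned position alongside the table is one possible way to resume after a failure without re-reading the whole source; other designs, such as streaming checkpoints, serve the same purpose.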
Building test environments
We are building a safe development environment and increasing the reliability of developed solutions. Your tasks will include creating isolated development, testing, and pre-production environments with automated verification processes.
Developing data processing frameworks
To accelerate development, we are creating low-code solutions. You will develop internal frameworks for data loading, transformation, and quality control.
Experience: 3-5 years
Employment: Full-time
Work Format: Hybrid, Onsite
Grade: Senior
Specialization: Data Engineering
Industry: FinTech
Company Type: Corporation