We are inviting an MLOps Engineer to work on our clients' IT projects (in an outstaff format).
You will be able to gain experience and unlock your potential by working on our clients' unique, high-tech projects.
You focus on the technical tasks, and we take care of negotiations with the client, solving bureaucratic issues, and ensuring timely payment for the project work.
WE WILL ENTRUST YOU WITH:
- Developing and supporting infrastructure for the development, training, testing, and production deployment of ML models.
- Building and automating the ML lifecycle: from environment preparation and training pipelines to deployment, monitoring, and model retraining.
- Designing and maintaining a production-ready MLOps platform for Data Science / ML Engineering teams.
- Automating the processes for building, testing, delivering, and deploying ML services and models.
- Ensuring reproducibility of ML experiments and versioning of models, artifacts, datasets, and configurations.
- Configuring orchestration for ML pipelines and model training/retraining processes.
- Implementing and maintaining monitoring for ML systems: model quality, degradation, and data drift / concept drift.
- Participating in the integration of ML solutions into the existing IT infrastructure: APIs, message queues, data storage, Kubernetes clusters.
- Optimizing the performance, scalability, and reliability of the ML platform and inference services.
TO PERFORM THE TASKS, THE FOLLOWING IS REQUIRED:
- Higher education (mathematics / physics / computer science / IT).
- At least 5 years of commercial experience as an MLOps / DevOps / Platform / Infrastructure Engineer, including experience supporting the ML domain.
- Solid knowledge of Linux at the system engineer level: processes, memory, file systems, systemd, networking, and diagnostic and debugging tools.
- Hands-on experience with containerization: Docker, plus an understanding of namespaces, cgroups, and container networking.
- Solid experience with Kubernetes in production: understanding of the cluster component architecture and how the components interact, and of cluster networking. Experience with Calico and/or Cilium is a plus.
- Experience automating infrastructure using Ansible.
- Experience building and maintaining CI/CD pipelines in Jenkins and/or GitLab CI.
- Hands-on experience with monitoring, logging, and alerting systems: Prometheus, Grafana, ELK/EFK.
- Good understanding of the TCP/IP network stack, DNS, TLS, reverse proxy, routing, and load balancing.
- Experience operating relational and non-relational databases at the infrastructure level. Experience working with several of the following systems is desirable: PostgreSQL, MySQL, MongoDB, ClickHouse, Cassandra.
- Experience with message exchange systems: Kafka, RabbitMQ, or equivalents.
- Experience supporting and deploying ML models to production.
- Understanding of the full lifecycle of ML solutions: training, validation, packaging, deployment, inference, monitoring, retraining.
- Experience with orchestration tools for ML/data pipelines: Airflow, Kubeflow, MLflow, Argo Workflows, or equivalents.
- Understanding of the following principles: versioning of models and artifacts, reproducibility of experiments, dependency management for ML environments, rollout/rollback of models, and monitoring of model quality after deployment to production.
WILL BE A PLUS:
- Experience with feature store, model registry, experiment tracking systems.
- Hands-on experience with Seldon Core, KServe, BentoML, or similar tools.
- Experience building platform solutions for multiple DS/ML teams.
- Knowledge of Python at a level sufficient for automation, pipeline integration, and supporting ML tooling.
- Experience with GPU infrastructure and running ML workloads in Kubernetes.
- Experience with LLM/GenAI infrastructure, inference services, and vector databases.
- Relevant certifications in Kubernetes, Cloud, Linux.