Experience: 3 years
Employment: Full-time
Specialization: AI Engineering
Industry: IT & Tech
Company Type: Corporation
LLM Infrastructure Developer
Large-scale LLM inference is a complex infrastructure challenge: GPUs operate at their limits, network delays occur, and hardware failures are possible. We build solutions to minimize the impact of these events on the availability and latency of our inference service.
Optimizing inference engines: You will improve the efficiency and reduce the latency of LLM inference on GPUs.
Developing diagnostic tools: You will create and enhance tools for quickly identifying and resolving infrastructure issues that affect inference stability and speed.
Research and implementation: You will work with inference optimization methods (quantization, pruning) and modern approaches to parallelization.