Company: International product IT project (VoIP / Cloud Telephony)
Employment: Full-time
Format: Hybrid (office in Tashkent for 2–3 months, then fully remote)
Salary: $3500 to $5000 (discussed individually)

We are a product team building an intelligent cloud telephony ecosystem for the US and Canadian markets. Our product is a fault-tolerant platform that handles traffic volumes in the millions. For us, ML is not a secondary feature but the foundation of the system, and it runs in real time. We are looking for an engineer who thoroughly understands the internal architecture of audio models and is ready to take responsibility for their operation in a high-load production environment.

What you will do:

Develop the AMD (Answering Machine Detection) system: fine-tuning and tuning models for real-time call classification (distinguishing humans from answering machines/IVRs).
Full-cycle development: from collecting audio data and rough ("dirty") labeling to deploying models and calibrating thresholds in production.
Integration into the core product: porting ML components to the backend infrastructure (C# / SIP / RTP stack) via ONNX Runtime.
Latency optimization: fighting for milliseconds in audio streaming conditions.
Deep analysis: hunting down errors and breaking down complex edge cases in real call scenarios.
Research (R&D): experiments with noise reduction, VAD, and new architectures for speech processing.

Our stack:

Python, C#
wav2vec 2.0, Whisper, HuggingFace Transformers
MFCC, embeddings, spectrograms
ONNX / ONNX Runtime, Quantization
SIP / RTP, Windows / Linux

We expect:

2+ years of experience with ML in production (your models have actually served real users).
Practical experience with Speech/Audio: understanding how audio features and modern audio processing architectures work.
Engineering approach (QA mindset): you are genuinely interested in digging into data anomalies and stress-testing the system.
Understanding of classic and modern approaches: Fine-tuning, Transfer Learning, and the ability to work with metrics (Precision/Recall, ROC-AUC, Calibration).
Ability to work end-to-end: from raw files to optimized inference.

What is important:

Engineering autonomy: we value people who find problems on their own and bring solutions to production.
Background: we strongly welcome candidates who moved into ML from Backend or QA; code quality and a testing culture are important to us.
Readiness for a fast pace: the project is growing, there are many tasks, and they directly impact the business.

Nice to have:

Experience in the Speech/Audio domain (ASR, VAD, Audio Classification).
Understanding of VoIP specifics and stream data processing.
Experience with MLOps and model monitoring tools.

Conditions:

Mandatory in-person onboarding in Tashkent (2–3 months) for product immersion, followed by fully remote work.
Real production tasks in an international high-load product.
Opportunity for professional growth and compensation review as tasks become more complex.
Work in a team with strong engineering expertise and no bureaucracy.
