Company: International IT product company (VoIP / Cloud Telephony)
Employment: Full-time
Format: Hybrid (office in Tashkent for 2–3 months → full remote afterwards)
Salary: $3,500 to $5,000 (discussed individually)
We are a product team building an intelligent cloud telephony ecosystem for the US and Canadian markets. Our product is a fault-tolerant platform that handles traffic in the millions. For us, ML is not a secondary feature but the foundation of the system, running in real time. We are looking for an engineer who thoroughly understands the internal architecture of audio models and is ready to take responsibility for their operation in a high-load production environment.
What you will do:
- Develop the AMD (Answering Machine Detection) system: continue training and tuning models for real-time call classification (distinguishing humans from answering machines and IVRs).
- Full-cycle development: from collecting audio data and rough ("dirty") labeling to deploying models and calibrating thresholds in production.
- Integration into the core product: porting ML components into the backend infrastructure (C# / SIP / RTP stack) via ONNX Runtime.
- Latency optimization: fighting for every millisecond in streaming audio.
- Deep analysis: hunting down errors and breaking down complex edge cases in real call scenarios.
- Research (R&D): experiments with noise reduction, VAD, and new architectures for speech processing.
Our stack:
- Python, C#
- wav2vec 2.0, Whisper, HuggingFace Transformers
- MFCC, embeddings, spectrograms
- ONNX / ONNX Runtime, quantization
- SIP / RTP, Windows / Linux
We expect:
- 2+ years of experience with ML in production (your models have actually served real users).
- Practical experience with Speech/Audio: understanding how audio features and modern audio processing architectures work.
- Engineering approach (QA-mindset): you are genuinely interested in "digging into" data anomalies and stress-testing the system.
- Understanding of classical and modern approaches: fine-tuning, transfer learning, and the ability to work with metrics (precision/recall, ROC-AUC, calibration).
- Ability to work end-to-end: from raw files to optimized inference.
What matters to us:
- Engineering autonomy: we value those who find problems themselves and bring solutions to production.
- Background: we strongly welcome candidates who moved into ML from Backend or QA; code quality and a testing culture are important to us.
- Comfort with a fast pace: the project is growing, there are many tasks, and they directly impact the business.
Nice to have:
- Experience in the Speech/Audio domain (ASR, VAD, Audio Classification).
- Understanding of VoIP specifics and streaming data processing.
- Experience with MLOps and model monitoring tools.
Conditions:
- Mandatory on-site onboarding in Tashkent (2–3 months) for product immersion, followed by fully remote work.
- Real production tasks in an international high-load product.
- Opportunity for professional growth, with compensation reviewed as your tasks become more complex.
- Work in a team with strong engineering expertise and no bureaucracy.