Staff Engineer - Model Accuracy Development and Test

Q: Is Staff Engineer - Model Accuracy Development and Test at Qualcomm India Private Limited still open?

Yes, this position is currently active and accepting applications.

Qualcomm India Private Limited

Chennai, Tamil Nadu | Salary not disclosed | 6–8 years exp | Day Shift | Posted 3d ago | 8 views

Actively Hiring

Job Description

Role Overview

Qualcomm India Private Limited is hiring a Staff Engineer specialising in Inference Accuracy for its Engineering Group. The role focuses on designing, developing, and validating the accuracy of deep learning models deployed across large-scale data-centre hardware platforms. You will work at the intersection of AI model evaluation, quantization, and inference pipeline engineering to ensure accurate and reliable model outputs at scale.

Key Responsibilities

- Define and implement accuracy KPIs across multiple precision modes including FP32, FP16, and INT8
- Develop scalable Python-based accuracy evaluation tools and fully automated testing pipelines
- Implement accuracy-preserving optimizations across inference frameworks such as TensorRT, ONNX Runtime, AITemplate, and Triton
- Build and maintain automated accuracy evaluation pipelines across ONNX, TensorFlow, and PyTorch frameworks
- Develop reusable plugins for preprocessing, post-processing, and metric evaluation workflows
- Execute comprehensive accuracy tests for large-scale models including LLMs, vision models, and diffusion models
- Validate model accuracy under various quantization and mixed-precision settings
- Perform deep architectural accuracy analysis covering layers, attention mechanisms, and parameter configurations
- Identify and debug issues related to preprocessing drift, tokenization mismatches, operator fallbacks, and quantization effects
- Analyze accuracy differences across hardware targets, firmware versions, and runtime backends
- Perform slice-based accuracy analysis across batch sizes, concurrency levels, sequence lengths, and domain shifts
- Design and run accuracy recovery experiments including fine-tuning, calibration, and hyperparameter adjustments
- Document workflows, maintain dashboards, and publish accuracy results for cross-functional stakeholders

Required Qualifications

- Bachelor's, Master's, or PhD degree in Engineering, Computer Science, Machine Learning, AI, or a related field
- 4 to 10 years of software engineering or related work experience, depending on degree level
- Minimum 2 years of experience in software or system test engineering, including test plan development and automation
- Strong background in AI and ML model evaluation, accuracy metrics, and statistical analysis
- Solid understanding of model architectures including transformers, CNNs, RNNs, and Mixture of Experts
- Hands-on experience with inference runtimes such as TensorRT, ONNX Runtime, and Triton
- Deep expertise in quantization techniques including INT8, FP8, INT4, calibration, and QAT workflows
- Proficiency in Python and familiarity with ML toolkits including TensorFlow and PyTorch
- Experience with data-centre accelerators such as NVIDIA A100, H100, B200, AI100 Ultra, Gaudi, and TPU
- Familiarity with distributed deployment systems including Kubernetes and cloud inference services
- Knowledge of LLM accuracy evaluation tools such as lm-eval and HELM is an added advantage

Why Join Us

Qualcomm is a global semiconductor and wireless technology leader offering world-class engineering challenges and an inclusive work environment committed to innovation. Joining this team means working on cutting-edge AI inference systems that power next-generation data-centre and AI platforms at a global scale.

Requirements

- Strong AI and ML model evaluation and accuracy metrics expertise
- Hands-on experience with TensorRT, ONNX Runtime, and Triton inference runtimes
- Proficiency in Python for automation and pipeline development
- Experience with LLMs, generative AI, and large-scale model accuracy validation
- Deep knowledge of quantization techniques including INT8, FP8, INT4, QAT, and mixed-precision workflows
- Experience with model graph conversion from PyTorch to ONNX to backend engines
- Understanding of transformer, CNN, RNN, and MoE model architectures
- Experience with data-centre accelerators such as NVIDIA A100, H100, B200, Gaudi, and TPU
- Statistical analysis and visualization skills for accuracy debugging
- Familiarity with Kubernetes and cloud-based distributed inference systems
- 4 to 10 years of software engineering or software test engineering experience

Benefits

- Opportunity to work on cutting-edge AI and deep learning inference systems
- Inclusive and accessible workplace with equal opportunity employment
- Global exposure working with world-class data-centre hardware platforms
- Collaborative engineering environment within a leading semiconductor company