Job details

Company
High Tech Engineering Center Kft.

Location
Budapest
Working hours and type of employment
- Salaried employment
- Standard working schedule
Required technologies
- Python, debugging, testing, C, C++, Linux, hardware
Requirements
- No language skills required
- 1-3 years of experience
- University degree
Job description
Responsibilities
Build and optimize inference pipelines for large-scale model serving (LLMs and beyond)
Work with frameworks like PyTorch, TensorRT, and vLLM to deploy models efficiently
Implement and optimize ML models using techniques such as quantization (INT8/FP8), kernel fusion, and efficient batching
Optimize and implement core ML operators (e.g., GEMMs, convolutions, activations)
Investigate and resolve issues through system-level debugging and performance analysis
Define and apply practices for testing, deployment, and scaling AI systems
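To illustrate one of the techniques named above, here is a minimal, self-contained sketch of symmetric per-tensor INT8 quantization in plain Python. It is for illustration only; production pipelines would use framework tooling (e.g., PyTorch or TensorRT), and the function names here are made up for this example.

```python
def quantize_int8(values):
    """Map floats to int8 codes using a per-tensor symmetric scale."""
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / 127.0  # symmetric int8 range: [-127, 127]
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float values from the int8 codes."""
    return [x * scale for x in q]

weights = [0.5, -1.2, 0.03, 1.2]
q, scale = quantize_int8(weights)
approx = dequantize_int8(q, scale)
# Round-trip error per element is bounded by half a quantization step (scale / 2).
```

The same idea underlies INT8/FP8 inference: store and compute in a narrow integer format, keeping only a floating-point scale per tensor (or per channel) to map back to real values.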
Requirements
BSc/MSc in Computer Science, Engineering, Mathematics, or related discipline
Strong programming skills in C/C++ or Python in Linux environments using common development tools
Solid knowledge of computer architecture, system software, data structures
Hands-on experience implementing algorithms in high-level languages (C/C++/Python)
Exposure to specialized hardware (GPUs, FPGAs, DSPs, AI accelerators) and frameworks such as OpenCL or CUDA
Experience designing or working with high-performance software systems
Solid knowledge of ML fundamentals
Motivated team player with a strong sense of responsibility
Experience in at least one of the following areas:
- Model serving frameworks (e.g., Triton Inference Server, DeepSpeed Inference, vLLM)
- ML runtimes (e.g., ONNX Runtime, TVM, IREE, XLA)
- Deploying ML workloads (LLMs, VLMs, NLP, etc.) across distributed systems
- Implementing and optimizing ML operators and kernels with a focus on vectorization and efficient execution (e.g., activation, pooling, quantization)
- Hardware-aware optimizations and performance tuning
2+ years of experience developing software targeting AI hardware
Contribution to open-source projects (e.g., LLVM/MLIR, PyTorch, TensorFlow, ONNX Runtime, xDSL, IREE) is a big plus.
How to apply
You can submit your application on the company's website, which you can access by clicking the "Apply on company page" button.
Job field(s)