AI Performance Engineer

Jelentkezés a cégnél

Cég neve

High Tech Engineering Center Kft.
Munkavégzés helye

Országos lefedettség

Munkaidő, foglalkoztatás jellege
- Teljes munkaidő
- Általános munkarend
Elvárt technológiák
- DOCKER PYTHON LINUX
Elvárások
- Angol középfok
- 1-3 év tapasztalat
- Középiskola

Állás elmentve

A hirdetést eltávolítottuk a mentett állásai közül.

Responsibilities

Optimize training and inference pipelines for large language models such as Llama 2, Llama 3, DeepSeek, and GPT-OSS
Work on MLPerf Training and/or Inference benchmarks for LLM workloads
Profile GPU workloads to identify compute, memory, and communication bottlenecks
Improve scaling efficiency across multi-GPU and multi-node setups
Tune distributed training strategies (DDP, FSDP, ZeRO, tensor/pipeline parallelism)
Build and maintain reproducible benchmark environments (Docker / Singularity)
Collaborate with engineers on performance, stability, and scalability improvements
Document findings and contribute to benchmark submissions and internal reports

Requirements

1-2 year of AI engineering knowledge / Deep Learning, GPU, or HPC-related roles
Strong Python skills and solid experience with PyTorch
Hands-on experience with LLM training or inference (Llama, GPT-style models, or similar)
Experience with distributed training (DDP, FSDP, ZeRO, DeepSpeed, or equivalent)
Good understanding of GPU performance fundamentals (compute vs memory, profiling, optimization)
Experience working in Linux-based environments
Familiarity with container technologies (Docker or similar)
Good level of spoken and written English

Nice-to-have

Experience working with MLPerf or other standardized benchmarking frameworks, Exposure to LLM optimization techniques (activation checkpointing, KV-cache optimization, sequence parallelism), Experience with GPU profiling tools (torch.profiler, Nsight, or equivalent), Knowledge of GPU kernel optimization (CUDA, HIP, Triton, or similar), Experience working with job schedulers (Slurm or equivalent), Familiarity with quantization or mixed precision (FP16, BF16, FP8)