We are now looking for Compute/DL Architecture Performance Optimization Interns in our group!

Are you passionate about exploring computer architectures for deep learning? Do you like to work at the intersection of hardware and software? NVIDIA is looking for world-class programmers and performance architects who love squeezing every performance cycle out of deep learning code, and who enjoy designing and developing scalable, modular infrastructure to ship these kernels to different production libraries for training and inference use cases. This position offers the opportunity to have real impact in a fast-moving, technology-focused company.

What you'll be doing:
- Analyze the performance of various machine learning/DL algorithms on existing and new architectures
- Identify bottlenecks and propose creative solutions to resolve them
- Develop high-performance operations for the cuDNN library
- Design and develop software for testing and analysis of our codebases
- Build scalable automation for build, test, integration, and release processes for publicly distributed deep learning libraries
- Configure, maintain, and build upon deployments of industry-standard tools (e.g., Kubernetes, Jenkins, Docker, CMake, GitLab, Jira)

What we need to see:
- Pursuing a B.S., M.S., or Ph.D. degree in computer science (or similar)
- Strong programming skills in C/C++ development
- Experience with performance modelling, profiling, debugging, and code optimization, or architectural knowledge of CPUs and GPUs
- Excellent problem-solving skills, including applications of algorithms and data structures

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you!