We are looking for a first-class Deep Learning Performance Architect to join us and drive the performance analysis, modeling, and optimization of top Datacenter, Automotive, and Client AI networks. You will also help build and enhance our performance analysis infrastructure. In this role, you will analyze top inference networks and identify, prototype, or model performance opportunities to guide the software and architecture teams for NVIDIA’s current and next-generation GPU and SoC products.

What you’ll be doing:
- Establish deep learning applications and use cases for performance analysis, modeling, and projections
- Analyze and propose both SW and HW optimizations for deep learning applications
- Specify hardware/software configurations and metrics to analyze performance, power, accuracy, and resiliency in existing and future uniprocessor and multiprocessor configurations
- Collaborate across the company to guide the direction of next-gen deep learning HW/SW by working with architecture, library, and compiler teams
- Build performance analysis infrastructure

What we need to see:
- MS or PhD in a relevant discipline (CS, EE, Math)
- Strong background in computer architecture
- Expert mathematical foundation in machine learning and deep learning
- Strong programming skills in C, C++, Perl, or Python

Ways to stand out from the crowd:
- Prior experience working on assembly-level performance optimization
- Experience working with deep learning frameworks like TensorFlow and Torch
- Familiarity with GPU computing (CUDA)
- Background in systems-level performance modeling, profiling, and analysis
- Experience in characterizing and modeling system-level performance, executing comparison studies, and documenting and publishing results