Optimizing a WebGPU Matmul Kernel for 1TFLOP+ Performance
Building Surfgrad, a high-performance, WebGPU-powered autograd library
Published on Substack