Optimizing a WebGPU Matmul Kernel for 1TFLOP+ Performance

Building Surfgrad, a high-performance, WebGPU-powered autograd library

Read on Substack


