Zach Nussbaum

Nomic AI

I’m a principal machine learning engineer at Nomic, where I lead the development of Nomic Embed, a family of state-of-the-art text and vision embedding models.

Prior to Nomic, I worked on machine learning for drug discovery at Deep Genomics. I reimplemented gene expression prediction from sequence (Enformer) and helped train BigRNA, a model that predicts tissue-specific RNA expression, splicing, microRNA sites, and RNA binding protein specificity from DNA sequence.

I have also participated in open-source/open-community research groups with ML Collective and OpenBioML.

In a past life, I was a Division 1 baseball player at Davidson College (yes, Steph Curry’s Davidson).

news

Jan 25, 2025 CoRNStack: High-Quality Contrastive Data for Better Code Ranking accepted to ICLR 2025!
Jun 01, 2024 Nomic Embed Vision technical report, weights, and blog released.
Mar 04, 2024 DNA Diffusion accepted to ICLR MLGenX and selected for an Outstanding Paper Award.
Feb 14, 2024 Nomic Embed Text v1.5 blog and weights released.
Feb 01, 2024 Nomic Embed Text v1 technical report, weights, and training code released.
Sep 26, 2023 BigRNA preprint and blog post released.
Jul 21, 2021 A Tale of Two Long Tails presented at the UDL Workshop at ICML.

selected publications

2025

  1. CoRNStack: High-Quality Contrastive Data for Better Code Ranking
    Tarun Suresh, Revanth Gangi Reddy, Yifei Xu, and 4 more authors
    In International Conference on Learning Representations (ICLR), 2025

2024

  1. Nomic Embed: Training a reproducible long context text embedder
    Zach Nussbaum, John X Morris, Brandon Duderstadt, and 1 more author
    arXiv preprint arXiv:2402.01613, 2024
  2. DNA-Diffusion: Leveraging Generative Models for Controlling Chromatin Accessibility and Gene Expression via Synthetic Regulatory Elements
    Simon Senan, Aniketh Janardhan Reddy, Zach Nussbaum, and 5 more authors
    In ICLR 2024 Workshop on Machine Learning for Genomics Explorations, 2024

2023

  1. GPT4All: Training an assistant-style chatbot with large scale data distillation from GPT-3.5-Turbo
    Yuvanesh Anand, Zach Nussbaum, Brandon Duderstadt, and 2 more authors
    GitHub, 2023
  2. An RNA foundation model enables discovery of disease mechanisms and candidate therapeutics
    Albi Celaj, Alice Jiexin Gao, Tammy TY Lau, and 8 more authors
    bioRxiv, 2023

2021

  1. A tale of two long tails
    Daniel D’souza, Zach Nussbaum, Chirag Agarwal, and 1 more author
    arXiv preprint arXiv:2107.13098, 2021