I'm a principal research engineer at the Allen Institute for AI (Ai2), where I work on the OLMo family of language models. My interests are in large-scale pretraining data curation, model evaluation, and building the infrastructure that ties them together. I'm driven by questions about how data shapes what models learn and how we can reliably measure the result.
Open Source Contributions
- Training recipes and data intervention tools for the OLMo family of models
- Data curation toolkit for generating and inspecting OLMo pretraining data
- Next generation of the Dolma dataset and curation pipeline
- Code for Bolmo: Byteifying the Next Generation of Language Models
- PyTorch building blocks for training and inference across the OLMo ecosystem
- Toolkit for optimizing pretraining data mixtures using small-scale proxy experiments
Publications
- Mayee F. Chen, Tyler Murray, David Heineman, Matt Jordan, Hannaneh Hajishirzi, Christopher Ré, Luca Soldaini, Kyle Lo. Preprint 2026
- Benjamin Minixhofer, Tyler Murray, Tomasz Limisiewicz, Anna Korhonen, Luke Zettlemoyer, Noah A. Smith, Edoardo M. Ponti, Luca Soldaini, Valentin Hofmann. Preprint 2025
- Team OLMo: Allyson Ettinger, Amanda Bertsch, ..., Tyler Murray, et al. Preprint 2025
- Rulin Shao, Akari Asai, Shannon Zejiang Shen, ..., Tyler Murray, et al. Preprint 2025
- Nikhil Kandpal, Brian Lester, Colin Raffel, ..., Tyler Murray, et al. NeurIPS 2025
- Team OLMo, Pete Walsh, Luca Soldaini, ..., Tyler Murray, et al. COLM 2025
- Kyle Lo, Joseph Chee Chang, Andrew Head, Jonathan Bragg, Amy X. Zhang, ..., Tyler Murray, et al. CACM 2024
- Rodney Kinney, Chloe Anastasiades, ..., Tyler Murray, et al. Preprint 2023
- Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, ..., Tyler Murray, et al. NAACL 2018
Personal
I live in Bend, Oregon. When I'm not wrangling data pipelines, I'm dialing in espresso, cooking, or enjoying any and all things outdoors in central Oregon with the fam. I'm a recovering engineering manager who still believes the best code is the code you delete.