I'm a principal research engineer at the Allen Institute for AI (Ai2), where I work on the OLMo family of language models. My interests are in large-scale pretraining data curation, model evaluation, and building the infrastructure that ties them together. I'm driven by questions about how data shapes what models learn and how we can reliably measure the result.
Open Source Contributions
- Training recipes and data intervention tools for the OLMo family of models
- Data curation toolkit for generating and inspecting OLMo pretraining data
- Next generation of the Dolma dataset and curation pipeline
- Code for Bolmo: Byteifying the Next Generation of Language Models
- PyTorch building blocks for training and inference across the OLMo ecosystem
- Toolkit for optimizing pretraining data mixtures using small-scale proxy experiments
Publications
- Mayee F. Chen, Tyler Murray, David Heineman, Matt Jordan, Hannaneh Hajishirzi, Christopher Ré, Luca Soldaini, Kyle Lo. Preprint 2026
- Benjamin Minixhofer, Tyler Murray, Tomasz Limisiewicz, Anna Korhonen, Luke Zettlemoyer, Noah A. Smith, Edoardo M. Ponti, Luca Soldaini, Valentin Hofmann. Preprint 2025
- Team OLMo: Allyson Ettinger, Amanda Bertsch, ..., Tyler Murray, et al. Preprint 2025
- Rulin Shao, Akari Asai, Shannon Zejiang Shen, ..., Tyler Murray, et al. Preprint 2025
- Nikhil Kandpal, Brian Lester, Colin Raffel, ..., Tyler Murray, et al. NeurIPS 2025
- Team OLMo, Pete Walsh, Luca Soldaini, ..., Tyler Murray, et al. COLM 2025
- Kyle Lo, Joseph Chee Chang, Andrew Head, Jonathan Bragg, Amy X. Zhang, ..., Tyler Murray, et al. CACM 2024
- Rodney Kinney, Chloe Anastasiades, ..., Tyler Murray, et al. Preprint 2023
- Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, ..., Tyler Murray, et al. NAACL 2018
Personal
I live in Bend, Oregon. When I'm not wrangling data pipelines, I'm dialing in espresso, cooking, or enjoying any and all things outdoors in central Oregon with the fam. I'm a recovering engineering manager who still believes the best code is the code you delete.