Bio

Jesse Dodge is a Senior Research Scientist at the Allen Institute for AI, on the AllenNLP team, working on natural language processing and machine learning. He is interested in the science of AI and AI for science, and he works on reproducibility and efficiency in AI research. He is involved in many parts of OLMo, a project to create fully open large language models, including the creation of Dolma (a web-scale training dataset) and DataDecide (an evaluation framework for language models). He is the research lead of Ai2's Playground for OLMo models, and he helped build the OLMoTrace feature in the Playground, which traces model outputs back to their training data. His research has highlighted the growing computational cost of AI systems, including the environmental impact of AI and inequality in the research community. His PhD is from the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University. He created the NLP Reproducibility Checklist and was one of the creators of the Responsible NLP Checklist, which is used for all submissions to ACL, NAACL, EACL, and EMNLP through ACL Rolling Review. His research has won awards including Best Theme Paper at ACL 2024, Best Resource Paper at ACL 2024, a 10-year Test-of-Time Paper award at ACL 2022, and Best Student Paper at NAACL 2015, and it is regularly covered by the press, including The New York Times, Nature, MIT Technology Review, and Wired.

Academic Contributions

Jesse Dodge created the NLP Reproducibility Checklist, which was filled out by all submissions to EMNLP 2020, NAACL 2021, ACL 2021, and EMNLP 2021, totaling more than 10,000 submissions. You can read more in the guest post on the EMNLP blog.

He helped create the Responsible NLP Checklist, which combines reproducibility and ethics items and is filled out by all submissions to ACL Rolling Review.

He was an organizer of the ML Reproducibility Challenge in 2020 and 2021, for which he wrote a blog post describing how to use the challenge as a course project.

Workshop Organization

He has organized a number of workshops and tutorials, including ML Retrospectives at ICML 2020; ML-Retrospectives, Surveys & Meta-Analyses at NeurIPS 2020; SMILES (Setting up MachIne Learning Evaluation Standards to accelerate progress) at ICLR 2022; and a tutorial on Reproducibility at ACL 2022.

Academic Service

He has served as area chair for several tracks, including the Green NLP track at EACL 2021, the Resources and Evaluation track at ACL-IJCNLP 2021, and the Green NLP track at NAACL 2021, and as senior area chair for the Efficient Methods for NLP track at EMNLP 2021.

He was a Reproducibility Chair at NAACL 2022.

Education & Experience

He earned his PhD in Spring 2020 from the LTI at CMU, though he spent much of his PhD as a visiting student at UW CSE in Seattle. His PhD advisor was Noah Smith. As an undergrad he worked with Luke Zettlemoyer. In 2011 he participated in the JSALT workshop, where he worked with a host of fantastic people on the vision and language team (and got a great sweatshirt). In 2015 he spent six months as a research intern at Facebook AI Research in NYC with Jason Weston and Antoine Bordes, where he built the Movie Dialog dataset and the MovieQA dataset. In the summer of 2018 he interned at Google AI with Elad Eban, where he worked with the MorphNet team (and got a shoutout in their blog post). He worked at the Allen Institute for AI on the AllenNLP team for all of 2019.