Bio

Jesse Dodge is a Senior Research Scientist at the Allen Institute for AI, on the AllenNLP team, working on natural language processing and machine learning. He is interested in the science of AI and AI for science, and he works on reproducibility and efficiency in AI research. He is involved in many parts of OLMo, a project to create fully open large language models, including the creation of Dolma (a web-scale training dataset), Paloma (an evaluation benchmark for language models), and the incorporation of ethical principles at every stage of the machine learning pipeline. His research has highlighted the growing computational cost of AI systems, including the environmental impact of AI and inequality in the research community. He has worked extensively on improving transparency in AI research, including open sourcing and documenting datasets, data governance, and measuring bias in data. He has also worked on developing efficient methods, including model compression and improving the efficiency of training large language models. His PhD is from the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University. He created the NLP Reproducibility Checklist, which has been used by five main NLP conferences, including EMNLP, NAACL, and ACL, totaling more than 10,000 submissions. He helped create the Responsible NLP Checklist, which is used for submissions to ARR (replacing the Reproducibility Checklist), and he was an organizer for the ML Reproducibility Challenge 2020-2022. His research has won awards including a Best Student Paper at NAACL 2015 and a ten-year Test of Time award at ACL 2022, and it is regularly covered by the press, including by outlets such as The New York Times, Nature, MIT Tech Review, and Wired.

Details
Jesse Dodge created the NLP Reproducibility Checklist that has been filled out by all submissions to EMNLP 2020, NAACL 2021, ACL 2021, and EMNLP 2021, totaling more than 10,000 submissions. You can read more in the guest post on the EMNLP blog.
He helped create the Responsible NLP Checklist, which combines reproducibility and ethics items and will be filled out by all submissions to ACL Rolling Review.
He has been an organizer for the ML Reproducibility Challenge in 2020 and 2021, for which he wrote a blog post describing how to use the challenge as a course project.
He has organized a number of workshops, including ML Retrospectives at ICML 2020, ML-Retrospectives, Surveys & Meta-Analyses at NeurIPS 2020, SMILES (Setting up MachIne Learning Evaluation Standards to accelerate progress) at ICLR 2022, and a tutorial on Reproducibility at ACL 2022.
He has been an area chair for various conference tracks, including the Green NLP track at EACL 2021, the Resources and Evaluation track at ACL-IJCNLP 2021, and the Green NLP track at NAACL 2021. He was a senior area chair for the Efficient Methods for NLP track at EMNLP 2021.
He was a Reproducibility Chair at NAACL 2022.

He earned his PhD in Spring 2020 from the LTI at CMU, though he spent much of his PhD as a visiting student at UW CSE in Seattle. His PhD advisor was Noah Smith. As an undergrad he worked with Luke Zettlemoyer. In 2011 he participated in the JSALT workshop, where he worked with a host of fantastic people on the vision-and-language team (and got a great sweatshirt). He was a research intern for six months in 2015 at Facebook AI Research in NYC with Jason Weston and Antoine Bordes, where he built the Movie Dialog dataset and the MovieQA dataset. In the summer of 2018 he interned at Google AI with Elad Eban, where he worked with the MorphNet team (and got a shoutout in their blog post). He worked at the Allen Institute for AI for all of 2019 on the AllenNLP team.