You will be responsible for pretraining data research. Your work may include studying pretraining data trends and scaling laws, optimizing pretraining data mixes, investigating potential new sources of data, building research tools to better understand experimental results, or determining how to process and use pretraining data most effectively.