As a Research Engineer/Scientist within the newly formed Model Welfare program, you will be among the first to work on better understanding, evaluating, and addressing concerns about the potential welfare and moral status of AI systems. You are curious about the intersection of machine learning, ethics, and safety, and you are adept at navigating technical and philosophical uncertainty. You’ll run technical research projects investigating model characteristics plausibly relevant to welfare, consciousness, or related properties, and you will design and implement low-cost interventions to mitigate the risk of welfare harms. Your work will often involve collaboration with other teams, including Interpretability, Finetuning, Alignment Science, and Safeguards.
Our announcement of the model welfare program and our welfare assessment of Claude Opus 4 (published in the System Card) give a sense of some of our early work.
Note: We expect this role to be based in the San Francisco office.