How Can We Harness Pre-Training to Develop Robust Models?

We explore a simple principle for harnessing pre-training to develop robust models.

Ask Your Distribution Shift if Pre-Training is Right for You

We study the robustness benefits of pre-training and characterize failure modes that pre-training can and cannot address.

DsDm: Model-Aware Dataset Selection with Datamodels

Selecting better data by approximating how models learn from data.

How Training Data Guides Diffusion Models

We introduce a new framework for data attribution in generative settings and propose an efficient method for attributing diffusion models.

Rethinking Backdoor Attacks

We introduce a new perspective on backdoor attacks and defenses in deep learning.

TRAK-ing Model Behavior with Data

We introduce TRAK, a new data attribution method that scales to large(r) models!

Dataset Interfaces: Diagnosing Model Failures Using Controllable Counterfactual Generation

We introduce dataset interfaces, a scalable framework for synthesizing counterfactual examples under user-specified shifts.

Tailored Data Augmentation to Mitigate Model Failures

We demonstrate how to use Stable Diffusion to target a model's failure modes with tailored data augmentation.