Twelve AI models were asked to sentence the same manslaughter case, thirty thousand times. The only thing that changed was the race of the defendant and the victim. The sentences should have stayed the same.
Each ring is one AI model. Each sliver is a defendant-victim race combination. Blue is more lenient, red is harsher. A fair model would be a uniform gray ring. Click any sliver to explore.
Change the defendant and victim race above. Watch the numbers shift.
How much does each model's sentencing change based on race? Measured as the maximum spread in standard deviations.
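One way to read "maximum spread in standard deviations" is sketched below. It is an assumption rather than the exact formula behind the chart: the gap between the highest and lowest mean sentence across defendant-victim combinations, divided by the standard deviation of all of that model's sentences. The function name, the record layout, and the toy numbers are hypothetical and are not taken from the study's data.

```python
from collections import defaultdict
from statistics import mean, pstdev


def max_spread_in_sd(records):
    """Return (spread, combo_means) for ONE model.

    records: iterable of (defendant_race, victim_race, sentence_years).
    Assumption: spread = (largest combination mean - smallest combination mean)
    divided by the population standard deviation of all the model's sentences.
    """
    by_combo = defaultdict(list)
    for defendant, victim, sentence in records:
        by_combo[(defendant, victim)].append(sentence)

    # Mean sentence for every defendant-victim combination.
    combo_means = {combo: mean(vals) for combo, vals in by_combo.items()}

    all_sentences = [s for vals in by_combo.values() for s in vals]
    sd = pstdev(all_sentences)
    if sd == 0:
        # Every sentence is identical, so there is no spread to report.
        return 0.0, combo_means

    spread = (max(combo_means.values()) - min(combo_means.values())) / sd
    return spread, combo_means


# Made-up toy data with placeholder group labels "A" and "B".
toy_records = [
    ("A", "A", 4.0), ("A", "A", 6.0),
    ("A", "B", 6.0), ("A", "B", 8.0),
    ("B", "A", 5.0), ("B", "A", 7.0),
    ("B", "B", 5.0), ("B", "B", 7.0),
]

spread, combo_means = max_spread_in_sd(toy_records)
print(f"max spread: {spread:.2f} standard deviations")
```

Under this reading, a model whose sentences do not depend on race scores close to 0, and larger values mean the combination means sit further apart relative to the model's overall variation.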
Mean sentence for every defendant-victim combination. Select a model to explore.
Each dot is one of 30,000 sentencing recommendations, flowing from left to right. Position on the vertical axis is the sentence length. Color is the defendant's race. Watch how the streams separate.
Watch what happens when only the defendant's race changes. Same crime. Same victim. Different judgment.
Select your race. See the average sentence across all 12 models, and how it compares.
Every model we tested produces different sentences for the same crime when only the defendant's or victim's race changes. Some models overcorrect. Some amplify real-world disparities. None are neutral.
The question is not whether AI has bias. It's whether we choose to measure it.