Can algorithms help judges make fair decisions?

Is taking away the human factor the key to more just rulings?

In this Aug. 30, 2017, photo, Stephanie Pope-Earley, right, sorts through defendant files scored with risk-assessment software for Jimmy Jackson Jr., a municipal court judge, on the first day of the software's use in Cleveland. In a growing number of local and state courts, including Cleveland, judges are now guided by computer algorithms before ruling whether criminal defendants can return to everyday life, or remain locked up awaiting trial. (Dake Kang/AP Photo)
When judges impose sentences, they consider the crimes and how likely the offenders are to offend again.

Lori Dumas has been a judge in family and criminal courts in Philadelphia for more than a decade.

She said she knows people end up in front of her because they did something. But she also considers that a criminal record resulting from her ruling will likely affect the rest of their lives: whether they can get a place to live, what kind of job they can hold.

“Sometimes, people think it’s easy what we do. It really isn’t,” she said. “Because we are responsible for lives.”

She thinks about someone’s childhood, work history, mental health, whether they show remorse. And she also looks ahead: What is going to happen to this person six months, a year from now, because of what she decides?

Dumas follows the law first, but within that, she has quite a bit of discretion.

If she thinks she needs more information about a case, she can ask for it: school records, child welfare records, or a psychological evaluation.

“People think that all judges should be able to look at the law and look at a fact situation and rule the same way,” Dumas said. “The reality is, is that you can have one particular set of facts heard by five different judges and get five different results. And that’s based on the fact that we are people before we come to the bench.”

For example, she is a parent. Her point of view in juvenile court will probably be different than that of a judge who does not have children.

“Our own background, I would say, is probably 95% of the reason why what you may hear in one courtroom is totally different than what you may hear in another.”

Ten years ago, Pennsylvania’s legislature decided there was a problem with how all this worked.

“It was totally arbitrary,” said Todd Stephens, a member of the state House of Representatives who chairs the Pennsylvania Commission on Sentencing.

“Judges were just on their own deciding, ‘Well, jeez, this defendant in front of me, it seems like I might need to take a closer look and get more information…’ But there was no evidence-based or data-driven objective standard for requesting additional information: It was done on a whim, every judge had their own internal set of criteria that they might use on whether or not to get more information about a particular offender.”

So in 2010, the state panel worked on an algorithm, a formula, that would allow a computer to predict how likely a person was to commit another crime and recommend when judges should get more information about a case. The goal was to make sentencing more consistent, reduce prison populations, and lead to less crime.

Mark Bergstrom, executive director of the Commission on Sentencing, said compared to judges, an algorithm can process lots of data. “When we started our project, we didn’t look at a handful of cases, we looked at over 200,000 cases to try to see what factors sort of related to positive and negative outcomes. And that’s information that judges didn’t have or didn’t have in a … structured … way.”

The formula will look for patterns based on age, gender, what crime someone is being convicted of, prior convictions and for which crimes, and whether the offender has a juvenile record. It cannot take race into account, or county, which is seen as a proxy for race.
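The commission's actual formula is not reproduced here. As a purely hypothetical illustration of the kind of points-based scoring such tools use, the sketch below weighs the factors named above (age, prior convictions, juvenile record, offense severity) while deliberately omitting race and county; every function name, weight, and threshold is invented for the example.

```python
# Hypothetical points-based risk score, loosely modeled on the factors the
# article lists. The real Pennsylvania formula, its weights, and its
# thresholds are not public in this form - this is illustration only.

def risk_score(age, prior_convictions, has_juvenile_record, offense_severity):
    """Return a toy risk score; higher suggests 'flag for more information'."""
    score = 0
    score += 3 if age < 25 else 1 if age < 40 else 0  # youth weighs heavily
    score += min(prior_convictions, 5)                # cap prior-record points
    score += 2 if has_juvenile_record else 0
    score += offense_severity                         # e.g. 1 (minor) to 4 (violent)
    return score

def recommend_more_information(score, threshold=6):
    """Mimic the tool's role: it recommends, the judge still decides."""
    return score >= threshold

print(recommend_more_information(risk_score(22, 3, True, 2)))  # True
print(recommend_more_information(risk_score(50, 0, False, 1)))  # False
```

Note that race never appears as an input, which is exactly the design choice critics like Reuben Jones argue is insufficient: the remaining inputs can still correlate with race.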

The judge will still make the ultimate decision on sentencing. The algorithm will be rolled out this year, and evaluated after 12 months.

It took 10 years to create because it was so controversial.

For one thing, critics were afraid that a tool built from criminal justice data would still discriminate against people of color. Pennsylvania is more than 80% white. Almost half the prison population is black.

Stephens, the Sentencing Commission chair, said, “We took every step that we thought we could to prevent race from being a factor. And on top of that, we’re required [to do] this review within a year to ensure that it is, in fact, not a factor.”

“That’s like saying we’re going to take the salt from the sea,” said Reuben Jones, an activist who leads campaigns to end mass incarceration in Philadelphia.

“The reality is there are a lot of proxies for race, that are utilized in these risk assessment tools, so we can’t pretend that you can’t determine a person’s race by other means other than asking what’s your address.”

He also questioned whether data from the past can say something meaningful about someone’s future.

When Jones was 22, he was convicted of robbery and aggravated assault. It was not his first arrest. He went to prison for 15 years.

Since his release, he has gotten a master’s degree, become a community leader, and received a President’s Volunteer Service Award. He said a risk assessment tool could not have predicted that.

“The risk assessment ideally … in their minds will tell them who’s going to create a crime in the future and I’m saying that that’s ridiculous,” Jones said. “That’s a ridiculous assumption to have, that a piece of technology can predict human behavior in advance.”

The risk assessment depends on one big assumption: what happened in the past 200,000 cases tells us something about what’s going to happen with the next person. Jones said that is not fair.

“Individualized justice is what we need, so when someone comes in front of a judge, forget the algorithm and all those things, talk to them like they’re a human being … and see what does the person in front of me need.”

There is research on what a risk assessment algorithm will do: Virginia started using one in the early 2000s. Megan Stevenson, assistant professor of law at George Mason University, studied the effects: The number of people in prison did not go down, recidivism did not go down, and black people were slightly more likely to be incarcerated compared to white people, all else being equal.

“The impacts of a risk assessment tool don’t just depend on the statistical properties of the algorithm,” Stevenson said. “They depend on how human beings respond to the algorithm, when they choose to follow it, when they choose to ignore it.”

For example: When young people committed a crime, the risk assessment tool flagged them as likely to commit more crime and recommended harsher sentences. But judges systematically said no.

Were the judges wrong?

On one hand, it’s well documented that criminals tend to do more crime when they’re young and less when they’re older. Statistically, young age is a strong predictor of future crime.

But Stevenson said there is more to a sentencing decision than risk of future crime.

“If the goal at sentencing was simply to lock up those that are at highest risk of reoffending, there is an argument to be made that you should just lock up all teenage boys and throw away the key and let them out when they turn 28 or 30,” Stevenson said. “Now, of course, many people would find that idea really repugnant, and that’s because we don’t just care about statistical likelihood of reoffending, we care about culpability.”

Judges in Virginia thought it wasn’t fair to punish a young person as harshly, even though the algorithm said young people are at a higher risk of doing more crime. Maybe judges thought teens are impulsive, their brains aren’t fully formed, being in prison could actually be worse for them. So they went a little easy on them.

Stevenson also agreed with Reuben Jones that no risk assessment algorithm can strip out the long history of racial inequity, or even outright racism, in the criminal justice system.

“Other people phrase it as a … ‘would you rather’ question. Would you rather be sentenced by a judge who is also potentially racist, who is being influenced by this track record that is an imperfect proxy for actual criminal behavior? Or would you rather be sentenced by an algorithm in which at least race is not an explicit input? And I don’t think there is a clear answer, and maybe the question is not the right question.”

There are arguably better ways to reduce racial inequality in the criminal justice system than using an algorithm, Stevenson said, like real accountability.

“What if you just altered the incentives of judges, or better yet altered the judges themselves?”

She suggests more transparency.

“If the racial disparities in sentencing decisions were tracked at the judge level, made public, incorporated into re-election campaigns, I think judges would really start paying attention to racial disparities, and I think we would see them shrink.”

The criminal justice example illustrates a lot of the broader questions about using algorithms, or even machine learning, in making decisions for us. For one thing, it forces us to spell out in detail what is fair and what is not, said Michael Kearns, a computer scientist at the University of Pennsylvania.

“You should never expect machine learning to do something for free that you didn’t explicitly ask it to do for you, and you should never expect it to avoid behavior that you want it to avoid that you didn’t tell it explicitly to avoid.”

He and fellow computer scientist Aaron Roth wrote a book about socially aware algorithm design, called “The Ethical Algorithm.”

If you want an algorithm to be fair to people of all races, ethnic groups, ages, disability status, gender identity … you have to state that, Kearns said. You have to define what you think is fair, then decide and code for the kind of fairness you want.
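One way to picture Kearns's point is that "fair" has to be pinned down as a concrete, checkable definition before it can be coded for. The sketch below checks one possible definition, equal rates of "high risk" labels across groups, on made-up predictions; the data, the definition chosen, and the function names are all assumptions for illustration, and other definitions (equal false-positive rates, calibration) would each need their own check.

```python
# Illustrative only: one explicit, testable fairness definition
# (demographic parity) applied to invented model outputs.

def flag_rate(predictions, groups, group):
    """Share of people in `group` labeled high risk (prediction == 1)."""
    members = [p for p, g in zip(predictions, groups) if g == group]
    return sum(members) / len(members)

def parity_gap(predictions, groups):
    """Largest difference in flag rates between any two groups."""
    rates = {g: flag_rate(predictions, groups, g) for g in set(groups)}
    return max(rates.values()) - min(rates.values())

preds  = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = flagged high risk
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(parity_gap(preds, groups))  # 0.5: group "a" flagged 75%, "b" 25%
```

The point of the exercise is Kearns's: the algorithm only enforces the definition you wrote down, and nothing you left out.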

And it doesn’t end with the design process, said Annette Zimmerman, a political and moral philosopher at Princeton University.

“It’s not like we tick off items on a checklist and then we have an ethical decision-making model and then we just deploy it.”

Zimmerman said we have to keep reflecting on how an algorithm is working or not working, and how we can fix it. That’s because algorithms, like humans, will make mistakes.

It’s common to think of data as a kind of technological and social mirror that reflects human biases — garbage in, garbage out. Zimmerman said data is actually more like a magnifying glass that could amplify inequality if left unchecked.

She and others said that an algorithm cannot tell us what’s fair — and it can only ever be as good as what we put into it.
