It's All Journalism

The broccoli of media-focused podcasts.


#234 – DIY approach to data scraping bolsters crime coverage

January 5, 2017 by ItsAllJournalism

Sometimes the only way to get the tools you need is to build them yourself.

Joshua Vaughn is a criminal justice reporter with the Sentinel in Cumberland County, Pennsylvania, and two years ago he started thinking about the information contained within the court documents created every time a person is charged with a crime. Working with a friend, he taught himself to code and built a tool to search those dockets, creating a searchable database that can provide statistics about the crime rate across the state.

Joshua Vaughn


“Where it all really started is, we’re like most newspapers, trying to do more with less people,” Vaughn said. “I thought there was a need and there was something there that could help us out. … As journalists, we have our routines, we have our ways of doing things. We have our data set that we look at, whether crime with dockets or education with reporting of Pennsylvania school profiles. There are things we look at all the time that we use in one way. I wanted to find different ways of using it. It was a matter of trying to take a step back and realize this (information) was there if we could find a way to pull it out.”

Now Vaughn has been writing one data-driven criminal justice article each month, led by or sourced with information culled from the data scraping tool he created. He wrote an article about people who were arrested on drug-related charges or found to have drug abuse issues when arrested and what happens to those people when they’re sentenced to jail time. “I was doing some keyword searches in documents and found several people in there who had died of drug overdoses” while in jail, he said.

In December, he wrote an article looking back at what appeared to be a noticeable uptick in the number of dockets opened in Pennsylvania that year, a 500-case increase in a county that normally gets 3,000 to 4,000 criminal filings each year. While it sounds like a big problem and might be cause for concern, Vaughn used the tool he created to determine things weren’t really that bad.

“DUIs are up substantially, drug crime and arrests were up substantially. The largest portion of drug crime arrests were simple possession, small amount of drugs not for delivery, and people getting charged with possession of small amounts of marijuana,” a separate crime in the state, he said. “Those two alone accounted for about 75 percent of the overall caseload increase.”
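The arithmetic behind a claim like that can be sketched in a few lines of Python. This is only an illustration, not Vaughn's tool: the function name `share_of_increase` is hypothetical, and the charge lists are invented placeholder data chosen so the math is easy to follow, not Cumberland County figures.

```python
from collections import Counter


def share_of_increase(prev_year, this_year):
    """Return each charge category's share of the total year-over-year increase.

    Both arguments are lists with one category label per docket.
    """
    prev = Counter(prev_year)
    curr = Counter(this_year)
    total_change = sum(curr.values()) - sum(prev.values())
    if total_change <= 0:
        raise ValueError("no overall increase to apportion")
    categories = set(prev) | set(curr)
    # Counter returns 0 for a category that is missing in one of the years.
    return {c: (curr[c] - prev[c]) / total_change for c in categories}


# Invented placeholder data: one label per docket, NOT real filings.
last_year = ["dui"] * 4 + ["possession"] * 3 + ["theft"] * 5
this_year = ["dui"] * 6 + ["possession"] * 4 + ["theft"] * 6

shares = share_of_increase(last_year, this_year)
combined = shares["dui"] + shares["possession"]
print(f"DUI and possession share of the increase: {combined:.0%}")
```

With real data, the two lists would come from whatever category field the scraped dockets carry. A category that shrank shows up as a negative share, which is worth checking before quoting a single combined percentage.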

The underlying reason for this increase, he determined, was a shift in enforcement. “One of the chiefs of police made the point that the county is going through a shift in work staff, with older officers retiring. Younger officers tend to spend more time on the road and they have to do more traffic enforcement. More traffic enforcement leads to more people getting pulled over for DUIs.”

— Amber Healy

On this week’s It’s All Journalism podcast, host Michael O’Connell talks to Joshua Vaughn, a criminal justice reporter at the Sentinel newspaper in Cumberland County, Pennsylvania. Vaughn taught himself coding and built tools that allowed him to scrape data from public records. He shares how this information has both supported the reporting he had already been doing and led to new stories about the criminal justice system.

