Data journalism has been generating a lot of buzz recently. However, for most journalists in Germany, data journalism is still mainly about visualizations. What many forget is that it is about access to data in the first place. At the Netzwerk Recherche conference in Hamburg, data journalist Jennifer LaFleur of the Center for Investigative Reporting shared her insights into the job of data journalists. Here are the 10 most important things every aspiring data journalist should know.
#1 Data is a powerful reporting tool. Data takes you beyond the anecdote and gives you a lot of contrasts to work with. Why? Because when you are dealing with data covering the whole population, you can find much more powerful insights and extremes. You will find the most powerful figures in the data, not in occasional anecdotes. You can also test assumptions and make connections you might not be able to make otherwise.
#2 Data comes from many places. If there’s a report or a form you have to fill in, there’s probably a database for it. Although sometimes the data is readily available for download (census data, crime statistics, etc.), in many cases you might want to scrape it. More often than not, you have to request the data.
#3 People who keep data don’t always want to give it up. That’s why it’s important to know the law, know what information you want, know what the appropriate costs should be and know who does the data entry (because they know how the data gets in and have useful insights). You should always do your homework and research in advance. It’s good to get to know the computer people dealing with the data before you actually need it. That way, you’ll have an overview even before you start working on a story.
#4 Sometimes bad data or holes in your data can be a story. A good example is a story about arson in America. In that case, journalists found out that a huge number of arsons were never reported to the federal database that tracks arson in America.
#5 Even with no data you can build a database. You can do that through sampling, physical surveys, testing, questionnaires. You can also build databases from documents.
#6 Sometimes the crowd can help you, so don’t be afraid to try out crowdsourcing or to use the Mechanical Turk service for micro data entry projects.
#7 There are many data tools: choose the right one. You will work with different tools at different stages of the workflow: spreadsheets, databases, mapping, statistics or programming. Try to choose those you know well, or find people who can help you.
#8 Sharing data is good, but give it context and be sure it is right. For example, ProPublica developed a federal database of dialysis facilities which helps patients to learn about the quality of care at individual dialysis clinics.
#9 Data intended for one purpose can be used in different ways. By reusing data, you can provide sustainability and a unique service for your audience. That’s what ProPublica has been doing with Dollars for Docs.
#10 No data is perfect. No matter which dataset you are working with, you should always be a little suspicious and look for inconsistencies and missing data. You should always check your data, read the documentation and understand the context. Know how many records you should have and check totals and counts against reports. Always ask yourself: are all possibilities included in the data set? Do you have all states, all countries, correct ranges? For example, dates of birth that make people 2000 years old obviously make no sense and are a good reason to check the whole data set. Looking at changes over time can be helpful too. If you see an amazing drop in the data, check if it’s accurate.
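The checks from #10 can be partly automated. What follows is only a minimal sketch in Python, not something shown in the talk. The field names (`state`, `date_of_birth`), the function names, the 120-year age limit and the 50 percent drop threshold are all assumptions made for illustration, so adapt them to whatever your dataset's documentation says.

```python
from datetime import date


def check_records(records, expected_count, required_states, today=None, max_age=120):
    """Return a list of human-readable problems found in `records`.

    `records` is a list of dicts. The keys "state" and "date_of_birth"
    (ISO format, YYYY-MM-DD) are hypothetical field names.
    """
    today = today or date.today()
    problems = []

    # Check the record count against the total stated in a report.
    if len(records) != expected_count:
        problems.append(f"expected {expected_count} records, got {len(records)}")

    # Are all possibilities included? Here: every state we expect to see.
    seen_states = {r.get("state") for r in records if r.get("state")}
    missing_states = sorted(set(required_states) - seen_states)
    if missing_states:
        problems.append(f"missing states: {', '.join(missing_states)}")

    for i, r in enumerate(records):
        # Flag empty fields (missing data).
        empty = sorted(k for k, v in r.items() if v in (None, ""))
        if empty:
            problems.append(f"record {i}: empty fields {empty}")

        # Flag dates of birth that imply an impossible (approximate) age.
        born = r.get("date_of_birth")
        if born:
            age = (today - date.fromisoformat(born)).days // 365
            if age < 0 or age > max_age:
                problems.append(f"record {i}: implausible age {age}")

    return problems


def sudden_drops(counts_by_year, threshold=0.5):
    """Return the years whose count fell by more than `threshold` (a fraction)
    compared with the previous year in the data."""
    years = sorted(counts_by_year)
    flagged = []
    for prev, cur in zip(years, years[1:]):
        before, after = counts_by_year[prev], counts_by_year[cur]
        if before > 0 and after < before * (1 - threshold):
            flagged.append(cur)
    return flagged
```

A flagged record or year is not proof of an error. It is a prompt to go back to the documentation or to the people who keep the data and ask what happened.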
Here’s the full presentation: