Interactive visualizations and infographics are gaining popularity in the media. More and more journalists are becoming interested in doing data-driven stories on their own or with the help of developers. So what do aspiring data journalists need to know, what do they need to keep in mind when working with developers, how much programming do journalists actually need to understand, and what are the best tools to get started?
I talked to the German visualisation architect and interactive news developer Gregor Aisch for the DW Akademie. Among other things, Gregor has worked on the development of Datawrapper, a tool that allows journalists to build simple interactive charts. He also develops visualizations for media organisations such as Deutsche Welle and Zeit Online.
“Journalists need to overcome their fear of the so-called hackers,” says Gregor. “This might sound funny to some, but many people still have this mental image of criminal suspects spending their days and nights writing viruses and trying to crack the security systems of banks and governments.”
Gregor, you come from the technical side. Do you enjoy working with journalists?
Yes, I really like working with journalists. They usually have a clear understanding of the subject they’re reporting on, and they have lots of interesting ideas for how to tell stories. That’s something you cannot always expect when you’re working with the marketing guys or PR departments.
What do journalists need to know to tell compelling data-driven stories?
Getting a good feeling for the scope, time frame and limitations of web-based data visualizations is very helpful in the day-to-day work of data journalism. After all, designing an interactive visualization is not unlike software development, so you need to be good at managing expectations and resources to ensure the final result is delivered on time. Otherwise lots of frustration is inevitable.
What do journalists need to keep in mind when teaming up with developers?
It’s a lot like any other kind of project management. It’s a good idea to talk to each other on a regular basis, especially at the beginning of a new project. The big challenge is to develop a common vision of something that does not exist yet. Hand-drawn sketches are really powerful for communicating ideas in such early stages.
How much programming do journalists actually need to understand?
First of all, journalists need to overcome their fear of the so-called ‘hackers’. This might sound funny to some, but many people still have this mental image of criminal suspects spending their days and nights writing viruses and trying to crack the security systems of banks and governments. Regular meetings such as the Hacks/Hackers meeting in Berlin are helping to ‘build bridges’ between hacks (journalists) and hackers (developers).
For the day-to-day work with datasets, I find that it is really helpful to learn some basic scripting. The initial learning curve is steep, but after one or two days you’ll be impressed by what you can achieve. Often, writing a few lines of code can save us from repeating simple tasks over and over again. In an environment such as online journalism, where deadlines are usually short and critical, saving time is crucial.
And, as said above, it would be very helpful if journalists got a feeling for what is doable and how long things take. I don’t think that every journalist needs to be able to create full interactive data visualizations, since it takes several years to get there, but they should keep an eye on the possible solutions.
What visualisation options do journalists have if they can’t program?
Excel, LibreOffice, Tableau Public, ManyEyes, Datawrapper, QuantumGIS: all these tools provide visualization modules that enable anyone to create rich data visualizations and maps in a short time. The tools are either freely downloadable, already installed on most computers or accessible via web browsers, so it’s easy and cheap to get started. But of course, it takes some time to learn new software.
What should journalists interested in data reporting start with?
I would consider working with spreadsheet software the first step in doing data journalism. Most data-driven stories start with a simple data table, so you need to be able to perform simple computations, such as dividing the number of fatalities by the total number of car accidents in all the districts of your country. For those who get confused by the strange user interface of Excel, I definitely recommend working with the free alternative LibreOffice Calc. This is a spreadsheet as simple and powerful as it gets.
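For readers who want to try the same computation in a script rather than a spreadsheet, here is a minimal Python sketch of the fatalities-per-accident calculation mentioned above. The district names and figures are invented for illustration and are not real statistics, and the helper name `fatality_rate` is hypothetical.

```python
# Hypothetical figures for illustration only; these are not real statistics.
accidents_by_district = {
    "District A": {"accidents": 1200, "fatalities": 18},
    "District B": {"accidents": 450, "fatalities": 9},
    "District C": {"accidents": 0, "fatalities": 0},
}


def fatality_rate(fatalities, accidents):
    """Return fatalities per accident, or None when no accidents were recorded."""
    if accidents == 0:
        return None  # avoid dividing by zero for districts without accidents
    return fatalities / accidents


# Compute the rate for every district in one pass.
rates = {
    district: fatality_rate(row["fatalities"], row["accidents"])
    for district, row in accidents_by_district.items()
}

for district, rate in rates.items():
    if rate is None:
        print(f"{district}: no accidents recorded")
    else:
        print(f"{district}: {rate * 100:.1f} fatalities per 100 accidents")
```

In a spreadsheet the same step is a single formula that divides one column by another and is filled down for every district. The script version mainly pays off when the table has to be recomputed often.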
There are a number of data visualizations out there, and they have become something of a hype. What are the worst mistakes made in these visualizations?
First of all, I love data visualization, so I like seeing it used more and more in reporting. One of the worst mistakes I have seen is poorly labeled charts. If you rely solely on tooltips, you can be sure that you’re doing it wrong.
What are your favourite visualizations and why?
This example by the New York Times shows a scatterplot with the college graduation rate on the horizontal axis and the percentage of black, Hispanic and Asian students among the freshmen on the vertical axis. You can clearly see that at the top colleges with high graduation rates, the share of black students is significantly lower. For instance, only 8.8% of last year’s freshmen at Harvard University, which has a graduation rate of 97%, were black.
The topic is big in the US right now, as more states are banning ‘affirmative action plans’. Going back to the Kennedy/Johnson administrations of the 1960s, these plans required universities to reach certain quotas for different races. Now more and more Republican states are aiming to ban these plans because such a quota would not be lawful.
In a follow-up piece a few weeks ago, the NYT showed the effect of previous bans in states such as California. Here they compared the percentage of freshmen of certain races with the total percentage of college-aged residents of that race. Especially in California, the ban on affirmative action plans was followed by a dramatic drop in the enrollment of Hispanic and black students.
As a third example I would like to point to an excellent piece done by Matthew Bloch and Hannah Fairfield. In an interactive Venn diagram they showed how diseases of the elderly overlap. A total of 9% of 700,000 people living in assisted living centers have a combination of heart disease, high blood pressure and Alzheimer’s disease, which is extremely difficult to treat.
All of these examples come from the New York Times. What’s so special about the way they approach data journalism?
Amanda Cox once put it this way: you need to take the reader by the hand and guide him through the graphic. Without this annotation layer, any graphic is just a matter of “here you go, now figure it out yourself”. The other thing I think the NYT is doing right is hiring the right kind of people (graphic editors, cartographers, 3D animators and interactive developers) and bringing them together in one single place. In a lot of newsrooms in Germany you can still see the graphic teams for print or television separated from the online teams, which makes it a lot harder to create high-quality interactive graphics.
What data visualization resources would you recommend keeping an eye on?
I’d recommend this list of data visualizations, the blog by Alberto Cairo, this resource from Mozilla OpenNews, this collection of pieces of code for different visualizations and the website “Information Aesthetics”.
What are the books every aspiring data journalist should read?