There are so many charts, facts and statistics out there regarding Coronavirus and the pandemic. Numbers can bring clarity, as long as they are accompanied by context. In their rawest form, if they come from a reputable and verifiable source, they can be the core of understanding.
Take, for instance, the recent confusion and discussion over case fatality rate (rate of deaths per confirmed cases) versus mortality rate (deaths based on population). Without context, it’s hard to know which of these stats is more important. Taken together, and with context, a clearer picture emerges by looking at BOTH statistics.
But either number can be used to illustrate a particular point, and without context there are many people who will just take that set of numbers at face value. In reality, mortality rates can be affected by demographics (populations with more older people have higher rates of mortality, for example). Case fatality rate can be affected by the number of tests (in countries where more testing is done, milder cases are part of the overall case number, and the case fatality rate is lowered). Here they are side by side.
And here are maps showing both deaths per million people and case fatality rates.
The lesson: Data is very powerful, but like anything with much power it can be abused and misused. Context, combined with data, provides the clarity and the complete picture.
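To make the difference between the two rates concrete, here is a minimal Python sketch. The function names and every figure in it are hypothetical, chosen only for illustration; they are not real data from any country or source mentioned here. It shows two made-up regions with the same number of deaths and the same population, where one region tests far more widely and therefore confirms more (mostly milder) cases.

```python
def case_fatality_rate(deaths: int, confirmed_cases: int) -> float:
    """Deaths as a percentage of confirmed cases."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return 100.0 * deaths / confirmed_cases


def deaths_per_million(deaths: int, population: int) -> float:
    """Deaths per one million residents (a population-based mortality rate)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1_000_000 * deaths / population


# Hypothetical regions: identical deaths and population, different testing.
regions = {
    "Region A (limited testing)": {"deaths": 500, "cases": 5_000, "population": 10_000_000},
    "Region B (wide testing)": {"deaths": 500, "cases": 50_000, "population": 10_000_000},
}

for name, r in regions.items():
    cfr = case_fatality_rate(r["deaths"], r["cases"])
    dpm = deaths_per_million(r["deaths"], r["population"])
    print(f"{name}: case fatality rate {cfr:.1f}%, {dpm:.0f} deaths per million")
```

In this made-up example, deaths per million are identical for both regions, while the case fatality rate differs tenfold purely because of how many cases were confirmed. That is why neither number tells the whole story on its own.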
Universities harnessing data
One source for worldwide virus information is Our World in Data, which is based at the Oxford Martin School at the University of Oxford. This data is open source, and therefore I’ve been able to include it here for viewing. It is updated daily. It is one of the few open-source databases I’ve found that is also comprehensive. The Data Explorer allows the user to create countless views. It’s very powerful.
Some of the other best sources I’ve found have been university-related, although those are not the only ones. The Harvard Gazette includes a lot of stories about the research and studies being done by Harvard. One of the stories led me to the Pandemics Explained data resource, from the Harvard Global Health Institute. Another is the Johns Hopkins University Coronavirus Resource Center, which is the source of a lot of other sites’ data, including the great New York Times Coronavirus Dashboard. The Johns Hopkins data is probably the most widely used in the United States.
Like the Our World in Data and Johns Hopkins data centers, Harvard’s Pandemics Explained allows the user to explore in great detail. I was able, for example, to drill down to one county in Michigan and see the stats for that county over time. However, the Pandemics Explained database, which is built on Microsoft’s Power BI, does not allow for direct embedding the way the Oxford open-source database does.
Here’s a snapshot to show the numbers in Macomb County, the only county in southeast Michigan at a higher risk level right now, and Gogebic County, in the far western Upper Peninsula, which has the highest risk level in the state right now, with a seven-day moving average of 37.8 cases per 100,000 population. As a journalist, I would take that information and use it to find out more about what’s going on to create the current situation in those two very different areas.
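For readers curious how a figure like a seven-day moving average per 100,000 people is produced, here is a rough Python sketch. The daily counts and the population are hypothetical, not the actual numbers for Macomb or Gogebic County, and the function name is my own; the dashboards mentioned here may calculate their figures differently.

```python
def seven_day_average_per_100k(daily_cases: list[int], population: int) -> float:
    """Average daily new cases over the last seven days, per 100,000 residents."""
    if len(daily_cases) < 7:
        raise ValueError("need at least seven days of data")
    if population <= 0:
        raise ValueError("population must be positive")
    last_week = daily_cases[-7:]          # most recent seven days
    average_daily = sum(last_week) / 7    # smooths out day-to-day reporting swings
    return average_daily / population * 100_000


# Hypothetical county: 50,000 residents and one week of new daily cases.
hypothetical_cases = [10, 12, 8, 15, 9, 11, 12]
rate = seven_day_average_per_100k(hypothetical_cases, population=50_000)
print(f"{rate:.1f} cases per 100,000 (seven-day average)")
```

Scaling by population is what lets a small county and a large one be compared on the same footing, and averaging over a week keeps one unusual reporting day from distorting the picture.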
Johns Hopkins’ Coronavirus Resource Center
If any entity is creating a better, easier-to-use data resource than Johns Hopkins’ Coronavirus Resource Center in the United States, I’d like to know about it. Johns Hopkins’ COVID-19 Testing Insights Initiative is a collaboration of many of the university’s experts, departments and resources. One thing Johns Hopkins creates is a daily snapshot for YouTube that is pretty cool. The one embedded here is for Aug. 7, but you can easily go to YouTube and find the latest.
The overall Johns Hopkins Coronavirus Resource Center is amazing. As with the Oxford resource, I could spend all day on it. I wish that, like the Oxford databases, the Johns Hopkins ones were open source.
The CRC has several great visuals that make seeing the daily and overall picture easier. I like this Testing Trends Tool a lot; it shows at a glance which states are doing well in testing, and with a click you can sort by the various columns. Here, I’ve sorted by “positivity rate” to show the states with the highest rate of positives per people tested. At this point, when I took this snapshot, Michigan was near the bottom with a 2.5% positivity rate. As with any really good data tool, Johns Hopkins explains how to use the tool with easy “information buttons” and side paragraphs, and always provides the sources of the information.
COVID Tracking Project
The COVID Tracking Project, which pulls from several data sources to provide state-level information on a mostly daily basis, powers many of the data dashboards and visualizations being used around the country. Unlike databases that are pulling from automatic feeds, the COVID Tracking Project relies on volunteers to manually update the numbers from state resources. The tracker lists the source of data, which every good data aggregator should do. It also “grades” the quality of that data, so that readers understand how “good” the data is coming from those sources.
In addition to providing context and daily updates on the site itself, the COVID Tracking Project has a daily newsletter you can sign up for. I highly recommend it.
The COVID Tracking Project does not track by county, as it is limited by the volunteer resources that pull the numbers together. But there is something to be said for the verification that goes on during manual updates. For example, if there are anomalies in the data on particular days or a change in how the data is being reported, the tracking project notes these as best it can in the state detail.
The COVID Tracking Project also includes a Racial Data Dashboard by state, a collaboration with the Boston University Center for Antiracist Research. This is a snapshot of the Michigan data, with notes on disparities and inconsistencies.
The COVID Tracking Project allows for comparing data state by state and region by region. Its graphics are easy to use and interpret. And there are great explanations and FAQs. I especially like the Visualization Guide, which provides tips on how to use the data to get the most out of it.
Here’s an example directly from the guide:
Charting the number of positive tests alone is often problematic. Simple case counts show where people are being tested, not necessarily where people are sick. To illustrate the point, a state that reports three cases of COVID-19 after testing 2,000 people is probably in a different stage of its outbreak than a state that reports three cases but has only tested 20 people. But if all you have is a case count, those states look exactly the same. That is why we need to include the total number of tests as a denominator.
COVID Tracking Project
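The guide’s example is easy to work through in a few lines of Python. This is only a sketch: the state labels and the function name are placeholders of mine, while the case and test counts are the ones used in the quoted passage.

```python
def positivity_rate(positive_tests: int, total_tests: int) -> float:
    """Positive tests as a percentage of all tests performed."""
    if total_tests <= 0:
        raise ValueError("total_tests must be positive")
    return 100.0 * positive_tests / total_tests


# The two states from the guide's example: same case count, different testing.
states = {
    "State A": (3, 2_000),  # three cases after testing 2,000 people
    "State B": (3, 20),     # three cases after testing only 20 people
}

for name, (positives, tests) in states.items():
    rate = positivity_rate(positives, tests)
    print(f"{name}: {positives} cases, {tests} tests, {rate:.2f}% positive")
```

Both states report three cases, but State A’s positivity rate works out to 0.15% and State B’s to 15%, which is exactly the difference a bare case count hides.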
The project’s visualizations are especially helpful in viewing the virus’ impact over time, across the nation, by region and by state. This choropleth map (a term I admit I wasn’t even familiar with before this year) allows for viewing tests, positive tests and deaths across the country over time, which shows the progression of the virus over the months. Using the slider at the top, you can view these stats by date. Switching to the bubble map allows for a different visualization, and scrolling over a state will display details.
Axios: No-nonsense, easy to view data
Among my favorite data sources is Axios, which presents information cleanly and provides no-nonsense 2-minute reads along with them for context, with the ability to go deeper if you wish. Axios, like many media organizations, relies on the COVID Tracking Project and Johns Hopkins for its data. Most people don’t have the time or inclination to dig deep into a cavernous database, and Axios presents it in a more digestible form.
Creating your own visualizations
There are a number of tools you can use to create your own data visualizations – if you have the data in a form that can be uploaded, such as a CSV file. The COVID Tracking Project provides links to such data sets. One tool being used by a number of organizations is Flourish, which, like many others, offers a free version and a paid version with lots more bells and whistles. As with any tool that allows users to create their own visualizations, it’s important to check publicly shared visuals for original sources and accuracy.
Originally I was enamored with this COVID Superspreader spreadsheet and accompanying bubble map, created with Flourish. The idea of plotting out all the superspreader COVID events over time across the world is awesome.
However, map data is only as good as the map coordinates that accompany each point, and in the case of this map there are quite a few misplaced points, including the one shown in the screenshot above. I’ve also embedded the Flourish map so you can try it yourself. The date format at the bottom is a bit confusing, but that’s because this database was built by someone in Europe, and therefore the date format is European, with the day first, then month, then year.
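If you download data like this and work with it yourself, the day-first date format is worth handling explicitly. Here is a small Python sketch using the standard library; the slash-separated sample date and the function names are assumptions for illustration, since I don’t know exactly how the spreadsheet stores its dates.

```python
from datetime import date, datetime


def parse_day_first(text: str) -> date:
    """Read a date written European-style: day, then month, then year."""
    return datetime.strptime(text, "%d/%m/%Y").date()


def parse_month_first(text: str) -> date:
    """Read the same text American-style: month, then day, then year."""
    return datetime.strptime(text, "%m/%d/%Y").date()


sample = "07/08/2020"  # hypothetical value; ambiguous without knowing the convention
print("Day first:  ", parse_day_first(sample).isoformat())
print("Month first:", parse_month_first(sample).isoformat())
```

The same string is Aug. 7 under the European convention and July 8 under the American one, so guessing wrong can quietly shift an event by a month.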
There doesn’t seem to be a team verifying the information as quickly as there should be, which makes me wary of the information altogether. When I’m wary, I dig a bit. The author wrote on Medium about the database and its mission, as well as linking readers to a website, which was a dead link, further indicating that despite this being a great idea it has had some missteps. When I viewed the map and spreadsheet I found several “events” in Michigan, including the one tied to an East Lansing bar. But the number of cases is vastly higher than listed here; the map has not been updated since early July. And there are at least two events shown in the Michigan map that are actually in other states (one in Lansing, Kansas – not Lansing, Michigan).
All that to say: Be dubious and check out sources before relying on them and passing them on to friends on social media (or using them in your own reporting).
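One simple way to catch misplaced points like the Lansing, Kansas, example is to check each coordinate against a rough bounding box for the state it claims to be in. The Python sketch below is only a first-pass screen under stated assumptions: the Michigan box and the city coordinates are approximations I supplied, the event records are hypothetical, and a rectangle also covers parts of neighboring states and Ontario, so passing the check doesn’t prove a point is correct.

```python
# Approximate bounding box for Michigan (assumed values, good enough for a rough screen).
MICHIGAN_BOX = {"min_lat": 41.5, "max_lat": 48.5, "min_lon": -90.5, "max_lon": -82.0}


def inside_box(lat: float, lon: float, box: dict) -> bool:
    """True if the point falls inside the rectangular box."""
    return (box["min_lat"] <= lat <= box["max_lat"]
            and box["min_lon"] <= lon <= box["max_lon"])


# Hypothetical records, all labeled as Michigan events in an imagined spreadsheet.
events = [
    {"label": "Lansing, Michigan", "lat": 42.73, "lon": -84.56},  # approximate
    {"label": "Lansing, Kansas", "lat": 39.25, "lon": -94.90},    # approximate
]

flagged = [e["label"] for e in events
           if not inside_box(e["lat"], e["lon"], MICHIGAN_BOX)]
print("Points to double-check:", flagged)
```

A check like this won’t replace a human looking at the map, but it can quickly surface the entries most in need of a second look.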
Media organizations vary in presenting data
There are, of course, numerous media organizations that are doing a great job presenting data in detailed dashboards. I pay attention to all of the Michigan numbers, and there are several options for that. One good option is Bridge Michigan’s Coronavirus Dashboard, which is super easy to view and use. Bridge appears to be using Infogram, another visualization tool that has pricing from free to “enterprise,” should you want to do up to 10,000 projects.
The Detroit Free Press has a good Coronavirus dashboard, too, using a combination of state-provided numbers, Flourish charts and DataWrapper, a longtime favorite tool of mine. It’s been around for years, and is easy to use. One thing that’s great about DataWrapper is every visual gives the reader the opportunity to go directly to the data. Also, the Freep provides all its sources, with links to the sources, as well as an “About the data” explanation. It’s something I wish the Bridge team was doing as well. Transparency and clarity are everything when presenting aggregated data.
All the Free Press visuals can be embedded because they are located on Flourish and DataWrapper. However, the Freep stops short of making all its projects open source, as others have done. That’s a shame. Below are a few of the embedded charts in the Free Press dashboard.
The Detroit News also uses Flourish embeds as well as an Open Street Map of all the counties and their case and death counts, created on Carto, another visualization tool that has been around for a long time. Although I like the county-by-county view, this overall map placed at the top of the Coronavirus coverage page lacks an overall state tally, which would help it.
There’s also a supercool custom visual that shows number of cases over time by county. I’ve included a screenshot below because I don’t see a way to embed the actual visualization. Whereas the Free Press created one dashboard page, the Detroit News visuals are viewed within the Coronavirus section front that links to all the coverage.
The dashboards created by media organizations are only as good as the information they can get from the state. Michigan’s Coronavirus website has a lot of data, and is keeping it updated pretty well. There are detailed spreadsheets, for example, of PPE on hand and beds available for all the hospitals. There’s also an easy-to-view map of areas of the state and risk levels, along with positivity percentages (number of positive tests based on total COVID tests performed) by day. There’s also a searchable list of places to get tested, based on location.
County health departments are keeping very detailed numbers, so county sites are also good resources. My home county, Oakland County, provides a map with zip code specific numbers, toggled between recent and overall cases, as well as an easy-to-view dashboard.
The Guardian, New York Times (of course)
One of my favorite dashboard/pages is from The Guardian, which has always done data so well, and for years provided an open-source data center that was maintained and fed by some of the best data journalists in the world. The data center and data blog have mostly been integrated into other coverage, but The Guardian’s long history with presenting data shows in its smart presentation of its Coronavirus data.
An easy-to-use overall map toggles between three views, and then “cards” for states allow for viewing stats at a glance, with the opportunity to click for a full view, which takes the reader to the Covid Tracking Project page for that state. It’s all there, with deeper data supporting it from Johns Hopkins and the Covid Tracking Project, two of the best sources of data from states. The Guardian, because it’s worldwide, does a great job of providing a world view as well.
One of the more obvious and excellent data resources is The New York Times. Because it’s so obvious, I elected to list it lower down, choosing to highlight some of the others that people may not be aware exist. Of course, the NYT presents its data in a way that is easy to view and digest, drawing from the best resources, including Johns Hopkins and the CRC.
The New York Times has vast resources, and is using them to cull and present the data in as much detail as possible. The media giant also is making all county-level data available to researchers or other entities via GitHub.
I love the hot spots map the Times features, and how easy it is to see the entire country or to zero in on a particular state or county. I view this map almost daily because of how easy it is to get a picture at a glance.
Educating yourself on COVID-19
One last thing: If you are interested in learning more about the virus and the data behind it, Johns Hopkins has an online course, Understanding the COVID-19 Pandemic: Insights from Johns Hopkins University Experts, that consists of a series of video modules. Knowledge is power!
Because this virus is upending our lives and the world in real time, it’s important to view all the data that way, too, and to realize that a massive number of scientists and researchers are working full steam ahead to figure this out. Some of what they find will be proven wrong or debated. But the bigger picture is that it’s a living, moving story we are witnessing, and the only way to understand it is to absorb as much as possible and realize that the assumptions we hold today may be different tomorrow.