In New England, we’ve been under a stay-at-home advisory for almost two months as a result of the COVID-19 pandemic. Like many of us, we seen all of the statistics and we’ll gotten worried for our safety and for the safety of our loved ones. If you’re like me, we’ve been looking for signs of hope. Let’s see if we can see any hope in the data.
Challenges with Analyzing COVID-19 Data
If you’ve been following the COVID-19 pandemic closely, you’ve probably been bombarded with a flurry of statistics. These can be difficult to understand, misleading, or too high-level to understand its significance. What this means is that a full understanding of the true impact of COVID-19 is often left to the “experts” for the following reasons:
* Data presented is always pre-aggregated and summarized. While it’s easy to answer the pre-conceived questions, like “have the cases gone up or down”, it’s difficult to answer new questions or to correlate difficult questions together.
* Data is siloed and disconnected. It is easy to see a specific set of statistics (e.g. new cases, deaths, or testing), but it’s very difficult to see (with the data) the impact that new prevented PPE measures have had on case volumes.
* All tabular data with little visuals. Many media sites allow you to sort and pivot, but few provide the ability to visualize and spots trends and outliers and truly ask-and-answer your own questions.
Enter SAP Data Warehouse Cloud
For those of you not familiar, SAP Data Warehouse Cloud is SAP’s new enterprise class cloud Data Warehousing solution. It’s your one stop solution for all your business needs – providing you with data integration, data modeling, and analytics – all in one place. So instead of doing all of my usual data wrangling in other places, I figured that this would give me a single solution for everything that I needed to do. Ping me if you’re interesting in learning more.
Here’s what I did.
Step 1: Getting Access to the Data
Like all analytics projects, this is the hardest part. Fortunately, the State of Massachusetts publishes a lot of data, but it comes from a variety of place, like CDC, Mass.gov, the Mass Department of Public Health, and Google Mobility. I pulled 14 different data sources that all seemed interesting to help answer some of these questions. Now I just need to make sense of this data.
Step 2: Load it into Data Warehouse Cloud
The next thing that I did was create my own virtual workspace for my analytics. While this is just a small project, there is the potential to add others to collaborate and have them bring and create their own insights. I pulled all 14 different data sets into my virtual workspace to have readily available for my colleagues to use and enhance. Data Warehouse Cloud have a nifty little feature that automatically profiles and parses my data so that I don’t need to manually define data types. Here’s what my raw tables looked like.
Step 3: Model and Join All of These Disparate Sources Together
While most wouldn’t view COVID-19 as having a complex data model, it does indeed present many real-life data challenges that most organizations face. My data set has 14 different tables that do not neatly “join” together. For example, there are cases that are tracked by date and county, there are hospitalizations that are tracked by hospitals, there are long-term care facilities that are tracked by hospitals, and so forth. What time meant is that there wasn’t always a clean way to join everything together. Additionally, there are multiple many-to-many joins. For example, counties have many cases, cases spread across many places, and many places have different demographics. Anyways, without getting too techie, I was able to easily handle many of these situations and create a nice clean data model – that looked like this:
Step 4: Ask-and-Answer My Own Questions
For me, this is the fun part. I had so many questions in the data and I wanted to be able to easily answer them myself. So, here’s a look at some of the questions that I was able to address.
Let’s see what the data tells us…
Question 1: What is the daily count of new and total COVID-19 cases?
While Massachusetts has seen a huge spike in cases, over the past two weeks, there’s been a nice downwards trend of fewer and fewer new reported cases. Good sign!
We’re also testing a lot. Over the past month, the number of tests have steadily climbed and the number of people testing positive has steadily decreased as well. Another good sign!
While the death toll has gone down over the past two weeks, we’re still averaging around 130 per day. This is a lagging indicator – as deaths can arise well after someone has been diagnosed – so we should see this number continue to decrease as the number of cases have decreased.
Question #2: Who’s Really Impacted the Most By This?
This is where things got more interesting for me. While I can’t claim that I learned anything new, it did allow me to drill down into more detail and use the data to come to these conclusions.
* Demographics. COVID-19 affects everyone. All ages, all ethnicity, all genders, and all counties. No one is immune.
* Hospitalizations and Deaths: While COVID-19 affects everyone in Massachusetts across all ages and, the number of hospitalizations and deaths are much more skewed. 72% of all hospitalizations are for people over 70 years old and 86% are for people over 60. And 86% of all deaths are for people over 70 and 96% are for people over 60.
* Pre-existing conditions. The figure below shows that only 0.8% of those deaths were from people that were not known to have any pre-existing conditions and 11% were from people that were not known to be previously hospitalized.
* Long Term Care Facilities: For our most vulnerable, you can see that only around 20% of the population test positive, but the death rate (from long-term care compared to the total deaths in Mass) hover around 60%.
What the data is telling me (that most people already know) is that the virus primarily affects seniors with pre-existing conditions.
Question 3: Do We Have Adequate Medical Supplies to Handle This?
The Massachusetts department of public health posts the availability of hospital beds. Fortunately, all of the ICUs in each county (except Northeast Mass) have available ICU units. And all of the alternate medical sites have an abundance of beds. For surgical needs, many of the hospitals are reporting to be far over capacity.
Question 4: Have We Distributed Enough PPE For Our Front-Line Workers?
Since the start of COVID-19, over 4M masks (regular and N95) have been distributed and most of this has gone to state agencies, municipal worked, and senior living homes.
Question 5: What Are People Doing During This Stay-at-Home Order?
For the rest of us, what are we doing? Google tracks people’s movements. And through their new google mobility report, you can track the increase/decrease of specific areas. For example, Park visits across Massachusetts has increase by 38% and residential visits (home to home) has gone up by 23%. This means that most people are going to the park or visiting friends and family.
What does all of this mean?
While I was able to very easily ask and answer lots of questions in the data, my answers were not very eye-opening. But nonetheless, here’s what I found:
* Massachusetts has flattened the curve. New cases have been steadily are going down while testing has steadily increased.
* COVID-19 affects everyone. All ages, all counties, genders, and all ethnicity, but the elderly population is the most affected by this and those with pre-existing conditions.
* Massachusetts has equipped our front-line workers by distributing plenty of PPE.
* And during this stay-at-home order, most of us have been going to park and to visiting friends.
Just like in your organization, data is everywhere. However, the true analytics is very often left to too few. This is another fun example of how data and analytics can be applied to help understand Massachusetts’ response to COVID-19.
Want a demo? Use the comments below to get in touch and I’d be happy to share more details.