Skip to Content

Population Health refers to the health outcomes of a group of individuals, including the distribution of such outcomes within the group¹. Both clinical and financial outcomes can be improved by applying big data models and analytical applications to large sets of patient data including data about their facility stays, specific procedures & services, physician care, and costs.

The goals of these applications can include:

  • Improving patient care and encounter experiences (close care gaps)
  • Reducing per capita cost of care and services (including cost savings for providers)
  • Improving the health of populations (better health outcomes, disease prevention, prolonged life, better experiences with health providers and facilities)

We will use SAP Vora to take a look at two different population health use cases.

Data Management Overview

Given a set of patient-oriented health data files, we’ll use SAP Vora to load, explore, analyze, and visualize our data by looking at two different healthcare population use cases.

More precisely, we will load data from a Hadoop Distributed File System (HDFS) into SAP Vora and create persistent, in-memory data models and data views with the aid of the SAP Vora Modeler. We will leverage multiple data sources to process the data using the SAP Vora Relational Engine, SAP Vora Graph Engine, and the SAP Vora Document Store (Collection). Consequently, we’ll gain meaningful insights from the data using the full suite of SAP Vora Tools for discovery, modeling, and visualization.

The figure above depicts the flow of data. I’ve used this graphic before, but I like it, so I’m going to keep using it. But in this case, I have updated the graphic to emphasize that we are using different types of raw data (CSV, JSG, JSON) from different sources and leveraging multiple SAP Vora engines. And, while Zeppelin provides some basic level of visualization and SAP Vora supports a number of different advanced visualization tools (SAP Lumira, Tableau, and other compatible applications), we are using the native capabilities in SAP Vora for the visualizations in this blog. The red dotted outlines in the diagram show the areas of focus discussed in this blog.

Use Case Overview

Case 1: Orthopedic Procedure Services & Costs for a Multi-Facility Patient Population

Our first healthcare population use case takes a look at orthopedic procedures (primarily knee or hip replacement) by examining patient encounter data for these procedures and their associated services. The data collection covers roughly two years of encounters and includes cost, facility, and physician information for each encounter as well as specific services (e.g., pharmaceutical, therapeutic, surgery, diagnostics, etc.) and their individual respective costs.

Information Available:

  • Patient Encounters (stays) as defined by an admit date and a discharge date.
  • Patient Demographics useful in identifying profile patterns.
  • Encounter Facts relating to Procedure, Facility, Physician, and Service (and further dissected by service department).
  • Service Costs measured against Procedure, Facility, and Physician using the Encounters to aggregate the Services.

Implementation Approach:

  • Construct data models and views to represents the facts and dimensions of the data.
  • Use the SAP Vora Relational Engine to explore and query the data.
  • Use the SAP Vora Modeler to build persistent Views utilizing multiple data sources, aggregation, and filtering.
  • Use the SAP Vora Data Browser to configure visualizations to create on-demand charts and exports.

Discovery & Analysis Ideas:

  • What are the pharmaceutical costs associated with these types of procedures?
  • Which Knee Systems have the highest cost?
  • What services are most common at each facility?
  • Which facility has the highest average cost per encounter? Per service?
  • Compare average and total cost of services or encounter by Physician.
  • Examine correlations with length-of-stay to Facility/Physician.

For this first use case, we have a collection of related data in comma-separated files so the first step is to sketch out a data model which will help us identify the entity abstractions and relationships between them.

After loading those raw files into Hadoop, we can then create Vora tables specifying the SAP Vora Relational Engine as our data source type. Creating the Vora tables can be accomplished in a number of ways including using the Spark Scala shell, a Zeppelin Notebook, the Vora SQL Editor, or the Vora Modeler.

Once these tables have been loaded into Vora’s in-memory engine, we can immediately begin building Views using the Modeler drawing from one or more tables as data sources. The Views allow us to combine tables, add calculated values, aggregate the data, and filter the data. If the resultant View proves valuable, it may be kept as a report and will persist as long as its data sources remain intact because SAP Vora persists table metadata in its own catalog.

A good approach when creating Views is to ask a question and then, if possible from the data, build a View which will help answer the question.

Question: Which physicians have the highest average costs per encounter?

To answer this question, we’ll join the ENCOUNTER and PHYSICIAN models and use an average aggregation against the per encounter COST value.


We can use the Data Preview directly from the Modeler to chart our data.

TIP: The data preview in the Modeler (version 1.4.14) is limited to the top 1000 records. To browse and chart your data without limit, use the Data Browser instead.

View Model:

Report Title
: Average Cost of Encounter by Physician

The Data Preview pane allows us to configure charts to visualize our data. Here, we look at bar and area charts showing average encounter cost (Y Axis) by physician (X Axis).

Takeaways: The lowest average cost fell just below $7,100. The typical cost of an encounter falls in the 10K-12K range. Three physicians averaged above $15,000 per encounter. So, we may want to take a deeper look into the costs incurred for these three physicians to better understand why their costs were so much higher than typical. It may be equally important to understand how four physicians were able to keep the encounter costs under 8K.

Question: What is the average cost of Surgery for a Total Knee Replacement procedure?

To answer this question, we’ll join the ENCOUNTER, SERVICE, and PROCEDURE models and use an average aggregation against the service COST value while filtering the services and procedures associated specifically with ‘Total knee replacement’. We’ll further refine the scope of the result to look only at service charges directly associated with the surgery (e.g., filtering out pharmaceutical and therapeutic charges).


View Model:

Generated SQL


TIP: You use the ‘Show SQL’ toolbar button in the Modeler to examine the generated SQL that is sent to the Spark SQL query processor and executed in the current session’s Spark SQLContext.

Report Title: Average Service Cost for Total Knee Replacement System


Average service cost for a full knee replacement system typically falls in the 3K to 3.5K range.

Just six outliers cost above 4K and two outliers cost below 3K. This first, quick look at the data helps us zero in on those outliers which may deserve greater attention.

Question: What are the highest average service costs across all patient encounters for all types of orthopedic surgical procedures?

To answer this question, we can create a simple view based on the SERVICE model which includes a Calculated Column to compute the average service cost and groups service costs by their charge department. We can then apply a “Having” clause (or aggregate filter) to extract only the top 10 services (by cost).

Data Source: HC_SERVICE

We’ll then use the Data Browser and configure a pie chart for the new View.

Chart Configuration Using the Data Browser:

Filter: AVERAGE_COST > 200

Report Title: Highest Average Service Cost by Charge Department (Top 10)


Of the ten highest average cost services, Surgery accounts for only 6% of those costs while Intensive Care and Stepdown Unit (intermediate level care, “a step down” from ICU) account for close to 30% of the overall service costs.

But if we remove those services which are atypical for this type of surgery (over $450) as well as those services under $100, we see that Surgery, Recovery Room, and Interventional Radiology account for the bulk of service costs for these encounters.

This is not unexpected, but it is helpful in validating the data so that when we do find outliers, they will not reflect on the data set as a whole.

Question: Can we compare specific pharmaceutical service costs for similar drugs against facility and/or physician?


View Model:

While we can use this view to look at any number of pharma service costs (and collate by Facility or Physician), we will focus our visualization on comparing facility costs of the two primary pharma services (drugs) used to prevent or reduce bleeding. Further, we will filter the results to focus only on Total Knee Replacement procedures.

Chart Configuration Using the Data Browser:

Y-Axis: TOTAL_COST (aggregate across all encounters), SERVICE_DESCRIPTION
Filters: service description = ‘CYKLOKAPRON 10MG VL’ OR service description = ‘TRANEXMC ACID100MG/ML10ML’)

Title: Facility Comparison of Pharma Service Costs


Based on cost, it is clear that Facility 2 & 3 prefer using branded Tranexamic acid (Cyklokapron) and Facility 5 & 6 prefer using generic Tranexamic acid. We can draw more insight by looking at the average service cost (to verify which is less expensive and by how much) and by pulling in the physicians to determine if this is due to Physician preference more so than Facility.


Case 2: Diabetes Risk Factor Elements for a Diverse Patient Population

Our second healthcare population use case takes a look at a number of diabetes risk factor elements and maps them to specific patient demographic profiles.

Information Available:

  • Diabetes risk factor indicators (e.g., LDL, BMI, Hypertension, Obesity, Smoking) as recorded for roughly 10,000 patients.
  • Patient demographics (e.g., Age, Sex, Race, Location).

Implementation Approach:

  • Construct a graph, mapping unique demographic combinations into distinct patient profile nodes. And assigning each patient (connecting an edge from) a profile node.
  • Use the SAP Vora Graph Engine to explore and query the data.
  • Load patient demographics from JSON files into the SAP Vora Document Store.
  • Construct cross-engine view models to allow for drill downs from patient graph nodes to detailed patient demographic information.

Discovery & Analysis Ideas:

  • Identify patient profile clusters with the graph visualization.
  • The more dense clusters may provide an indicator of the type of patients who are most susceptible to the ailments associated with Diabetes.
  • Use the Vora Data Browser to show:
    • Profile (center node) details associated with each cluster.
    • Patient risk factor details for individual Patient nodes (connected nodes).
  • Look for common risk factors in each profile cluster.

For this second use case, we look at a homogeneous set of Diabetes Patient data which comes in the form of a one more JSON files. The data includes patient demographics (e.g., Age, Sex, Race, Location) as well as diabetes risk factor attributes (e.g., LDL, BMI, Hypertensive, Obesity, Smoker).

We will use the SAP Vora Graph Engine to help us identify specific patient profiles with respect to discrete combinations of diabetes risk factors. Then, we’ll use the SAP Vora Document Store to load the full patient demographic data which can be linked to the patient nodes on our graph. We’ll also see how you to add new JSON files to our existing data in Vora as they become available.

We have mapped risk factor elements to specific patient demographic profiles and then extracted the resultant data into JSG format representing a Graph. This graph may then be loaded into SAP Vora.

As illustrated above, we used an Excel Pivot Table to aggregate six specific risk factor elements across all patient records grouped by demographic characteristics. This can be transformed into Graph data with two Node Types: Patient and Profile. The Profile nodes will include a subset of the demographic data as its properties. The Patient nodes will include a set of diabetes risk factor elements as its properties. Edges for Patient nodes will link to specific Profile nodes.

Once we have built the JSON Graph (JSG) file and placed it into Hadoop, we can create a Graph in Vora.

Report Title: Patient Profile Clusters for Diabetes Risk Elements


Graph Summary View – Examine Node Types, Edges, and Node Details

Graph Visualization
Profile Cluster Center Node Showing Demographic Attributes

Graph Visualization
Individual Patient Leaf Node Showing Risk Factor Attributes


The clusters shown in the graph represent the different patient profiles. The more dense clusters may provide an indicator of the type of patients who are more likely to exhibit particular risk factor elements associated with Diabetes.

In the Vora Data Browser, clicking on the center of a cluster will show the Profile details associated with that cluster. And clicking on the individual Patient nodes connected to the center node in the cluster, will show the Patient risk factor details.

This graph may be used to look for common risk factors in each profile cluster.

Adding Patient Data Using the JSON Document Store

To augment our Graph data, we can load the full patient records which include both the demographic and risk element information captured by the Graph as well as additional data points that were excluded from the Graph. This will allow us to “drill down” or further aggregate the data by creating Views in Vora which leverage data sources from multiple engines.

The initial JSON file is added to Vora by creating a new Collection and choosing the Document Store Engine type.

TIP: Vora will interpret a line feed as a new record, so you will want to flatten your JSON prior to putting the file in the HDFS.

When another set of patient data becomes available, you don’t need to create a new Collection, you can simply choose to “Append Data Files” from the Modeler.

You can now employ the Vora Modeler to create new Views using both the Graph and Document Collection as data sources as well as any relational data sources; these views will be your pathway to gaining greater insights into your data.

To report this post you need to login first.

Be the first to leave a comment

You must be Logged on to comment or reply to a post.

Leave a Reply