Data Geek Challenge – Analyzing Causes.com petition data
I’ve got a campaign going on causes.com that includes a petition I ask people to sign. Causes allows me to download, in comma-delimited text format, the list of people who signed the petition along with their country, zip code, telephone number (if provided) and the date they signed. I wanted to analyze the data to see which countries I am getting the greatest response from. I’m part of a group that is putting together regional user group meetings, and I wanted to make sure that we had them in all the appropriate locations.
The first step once I have the data from causes.com is to create a New Document in SAP Lumira and select the CSV option.
And then select the CSV file I got from causes.
At this point, Lumira begins the process of acquiring the data. It displays a sample of the data from the file and allows you to select which columns you actually want to use as well as customize how the acquisition is done. The only change I made was to deselect the columns I wasn’t interested in, as I was primarily looking for the names and country information. Note that I added the blurring in the image below to protect Personally Identifiable Information. The actual dialog was quite readable.
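For readers who want to do the same column selection outside Lumira, here is a minimal sketch using Python's standard csv module. The column headers ("Name", "Country" and so on) and the sample rows are made up for illustration; the real Causes export may label its columns differently.

```python
import csv
import io

# Hypothetical column headers; the real export's headers may differ.
WANTED_COLUMNS = ["Name", "Country"]


def load_selected_columns(csv_file, wanted=WANTED_COLUMNS):
    """Read a CSV file object and keep only the wanted columns of each row."""
    reader = csv.DictReader(csv_file)
    return [{col: row[col] for col in wanted} for row in reader]


# Made-up sample data standing in for the downloaded file.
sample_file = io.StringIO(
    "Name,Country,Zip,Phone,Signed\n"
    "Alice Example,Canada,,,2013-09-01\n"
    "Bob Example,United States,94043,,2013-09-02\n"
)

rows = load_selected_columns(sample_file)
for row in rows:
    print(row["Name"], "-", row["Country"])
```

With a real file, `open("petition.csv", newline="")` (a hypothetical file name) would take the place of the `io.StringIO` object.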
What I want to do is display this on a world map, so I need to get Lumira to recognize the country data as geographic information. To do that, I right click on the column header and select “Create a geographic hierarchy” and then “By Names” since I’m working with country names.
Lumira made an initial guess as to what data fit into the Country / Region / Sub Region / City hierarchy (just the Country data), so I only needed to click OK to continue:
Of the 83 countries from which Causes reported signatures, Lumira recognized 81. For the remaining two, Christmas Island and Hong Kong, I was prompted to choose a country to associate them with. For Hong Kong, the only options presented were “Not Found” and “Hong Kong – China”; the latter was appropriate, so I selected it. For Christmas Island, however, the only options were “Not Found” and “Iceland”.
In the short amount of time I spent on this, I couldn’t find a way to select a better alternative. I know that Christmas Island is a territory of Australia, which is about as far away from Iceland as you can get, so I went with Not Found. If Lumira has no way to let me pick a better match, I see that as a deficiency in the product. With remote data (as opposed to a CSV file) I may not be able to change the source data, so I need to be able to handle the mapping within Lumira.
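When the source is a CSV file that can be preprocessed, one possible workaround is to remap the unmatched names before loading the data. The sketch below is only an illustration of that idea and is not a Lumira feature; the override table and the function name are my own assumptions, and whether Christmas Island should be counted under Australia is a judgment call.

```python
# Manual overrides for country names the geographic lookup did not match.
# The targets are assumptions chosen for illustration, not Lumira output.
COUNTRY_OVERRIDES = {
    "Christmas Island": "Australia",
    "Hong Kong": "Hong Kong – China",
}


def normalize_country(name, overrides=COUNTRY_OVERRIDES):
    """Return the override for a country name, or the trimmed name itself."""
    cleaned = name.strip()
    return overrides.get(cleaned, cleaned)


for raw_name in ["Christmas Island", " Hong Kong ", "Canada"]:
    print(repr(raw_name), "->", normalize_country(raw_name))
```

Names without an override pass through unchanged, so the table only needs entries for the handful of countries that fail to match.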
Lumira now shows me my original country column and the Geographic Country hierarchy I’ve created. The next step is to create a measure to graph against that country hierarchy. To do that, I right click on the name column (or click the down arrow in its header) and select the “Create a measure” option.
The measure that Lumira created by default was Count Distinct, which is exactly what I wanted. Otherwise, there are options available on the newly created measure to change how it is determined.
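To make it concrete what a Count Distinct measure over names, grouped by country, computes, here is a small sketch. It assumes each row is a dictionary with hypothetical "Name" and "Country" keys, and the sample rows are made up; it is not how Lumira implements the measure internally.

```python
from collections import defaultdict


def count_distinct_names(rows):
    """Count the distinct names seen for each country."""
    names_by_country = defaultdict(set)
    for row in rows:
        names_by_country[row["Country"]].add(row["Name"])
    return {country: len(names) for country, names in names_by_country.items()}


# Made-up rows; the duplicate signer is counted only once.
sample_rows = [
    {"Name": "Alice Example", "Country": "Canada"},
    {"Name": "Alice Example", "Country": "Canada"},
    {"Name": "Carol Example", "Country": "Canada"},
    {"Name": "Bob Example", "Country": "United States"},
]

counts = count_distinct_names(sample_rows)
print(counts)
```

A plain count would report three signatures for Canada here, while the distinct count reports two because the repeated name collapses into one entry.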
I’m now ready to graph the data. I hit the “Visualize” button in the toolbar, and then select the Geographic Visualization option in the list of visualization types. Since I want what I call a “heat map”, I select the “Geo Choropleth Chart” within the Geographic types. Then I drag the Geography_Country Hierarchy into the Geography Dimension and the Names_Count(Distinct) measure into the Measures.
This looks pretty close to what I want. I don’t particularly like different shades of the same color for different ranges though. So I right click on each of the colors in the legend in the chart, and an options box appears that lets me choose the color I want to use for that particular range.
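The idea of giving each value range its own distinct color can be sketched as a simple lookup. The range boundaries below are hypothetical, since the ranges Lumira picked are not listed in this post; the color list mirrors the choices I made for the map, with white meaning no signatures.

```python
import bisect

# Hypothetical upper bounds (inclusive) for the first four ranges;
# anything above the last bound falls into the final range.
UPPER_BOUNDS = [10, 50, 100, 500]
RANGE_COLORS = ["blue", "green", "yellow", "orange", "red"]


def color_for(count):
    """Map a signature count to the color of the range it falls in."""
    if count <= 0:
        return "white"  # no signatures
    return RANGE_COLORS[bisect.bisect_left(UPPER_BOUNDS, count)]


for value in [0, 7, 10, 11, 250, 900]:
    print(value, color_for(value))
```

Using `bisect_left` keeps each upper bound inside its own range, so a count of exactly 10 is still blue and 11 is the first green value.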
I went with blue, green, yellow, orange and red for each of the ranges, with white indicating no signatures. I then right clicked in the graph and selected the “Show Data Labels” option so that the value of the measure would appear in parentheses after the country name. Finally, I changed the name of the graph to something more meaningful. The final result is here:
That was all fairly simple, and I was quite pleased with what I was able to do with very little work. We’re planning one regional user group meeting in Canada, four in the United States, one in Central America, two in South America, three in Europe and two in Southeast Asia. Based on the heat map from the petition, it looks like we’ve got them in all of the right locations.
For the next step, I’m using Google’s Geocoding service to convert the zip code information for signers from the United States into latitude and longitude so that I can do another, more refined heat map just for the USA.
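As a rough sketch of what that conversion might look like, the code below builds a request URL for the Google Geocoding API's JSON endpoint and pulls the coordinates out of a response. It makes no network call: the response text is a hand-written sample in the documented shape, the coordinates in it are invented, and "YOUR_API_KEY" is a placeholder. The function names are my own, and the exact request I end up using may differ.

```python
import json
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(zip_code, api_key):
    """Build a geocoding request URL for a US zip code."""
    params = {
        "address": zip_code,
        "components": "country:US",  # restrict matches to the United States
        "key": api_key,
    }
    return GEOCODE_ENDPOINT + "?" + urlencode(params)


def extract_lat_lng(response_text):
    """Return (lat, lng) from a geocoding JSON response, or None if no match."""
    payload = json.loads(response_text)
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


# Hand-written sample response; the coordinates are illustrative only.
sample_response = json.dumps(
    {"status": "OK",
     "results": [{"geometry": {"location": {"lat": 37.4, "lng": -122.1}}}]}
)

url = build_geocode_url("94043", "YOUR_API_KEY")
print(url)
print(extract_lat_lng(sample_response))
```

Returning None for a failed lookup makes it easy to skip zip codes that cannot be resolved rather than plotting them at a wrong location.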