Data Geek III – Where should I send my children (if I had any)?
A few weeks ago, I came across this Data Geek challenge, along with the free SAP Lumira edition and its cloud version. It took me some time to find a dataset I was interested in analyzing.
As I didn't want to use any of the sample sets (which have been analyzed over and over again), I started surfing the web for other interesting topics and data. After a while, I found some crime-related data. Although crime data has been mentioned and analyzed here before, my dataset dealt specifically with crimes and abuses on university campuses.
For anyone interested in the data and further datasets, everything can be downloaded from the website of the U.S. Department of Education (http://www.ope.ed.gov/security/GetDownloadFile.aspx). Interestingly, the dataset contains data not only from the US but also from all over the world (Germany, Italy, France, the United Kingdom and so on).
When I imported the data into Lumira, I did some data preparation, such as grouping the geolocations as follows:
I added geo-hierarchies for country, state and city. I had some problems identifying the locations because some minor cities were included, which is why I split them this way.
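Lumira builds these geo-hierarchies interactively. For anyone who wants to reproduce the same grouping outside Lumira, here is a minimal Python sketch that nests rows into a country → state → city structure. The sample rows and the `(no state)` placeholder for records without a state (common for the non-US entries) are assumptions for illustration, not taken from the dataset.

```python
from collections import defaultdict

# Hypothetical sample rows: (country, state, city). The real column names in
# the Department of Education files may differ.
rows = [
    ("United States", "Hawaii", "Honolulu"),
    ("United States", "Hawaii", "Hilo"),
    ("United States", "California", "Berkeley"),
    ("Germany", None, "Berlin"),
]

def build_geo_hierarchy(rows):
    """Group rows into a country -> state -> sorted list of cities hierarchy."""
    tree = defaultdict(lambda: defaultdict(set))
    for country, state, city in rows:
        # Records without a state are bucketed under a placeholder (assumption).
        tree[country][state or "(no state)"].add(city)
    # Convert the nested defaultdicts/sets into plain dicts with sorted lists.
    return {
        country: {state: sorted(cities) for state, cities in states.items()}
        for country, states in tree.items()
    }

hierarchy = build_geo_hierarchy(rows)
print(hierarchy["United States"]["Hawaii"])  # ['Hilo', 'Honolulu']
```

Using a set per state drops duplicate city entries, which matters when the same campus city appears in many rows.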
The table I used for this analysis and post contained data about drug, liquor and weapon abuse on campus from 2010, 2011 and 2012. Data about crimes committed on campus is still on my disk and will be analyzed in a future post. I thought this would be a nice add-on to the existing posts about crime data, this time focused specifically on university campuses.
Within this dataset, I also added 6 custom measures:
- Drug usage per student (2012)
- Liquor usage per student (2012)
- Weapon usage per student (2012)
- Increase of weapon usage from 2010 to 2012
- Proportion between drug and weapon usage
- Proportion between liquor and drug usage
As you can see, the two proportions are themselves calculated from other custom measures.
So much for preparing the data; let's start analyzing it.
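The measure definitions above can be expressed as a small Python sketch. This is not Lumira's formula syntax: the row keys (`students`, `drug_2012`, `weapon_2010`, ...) are hypothetical column names, and treating the "increase" as an absolute difference in case counts is an assumption. The two proportions are built on top of the per-student measures, mirroring measures that reference other custom measures.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or None if the denominator is 0 or missing."""
    if not denominator:
        return None
    return numerator / denominator

def custom_measures(row):
    """Compute the six custom measures for one aggregated row.

    The keys used here are hypothetical, not the dataset's real column names.
    """
    students = row["students"]
    drug_ps = safe_div(row["drug_2012"], students)
    liquor_ps = safe_div(row["liquor_2012"], students)
    weapon_ps = safe_div(row["weapon_2012"], students)
    return {
        "drug_per_student_2012": drug_ps,
        "liquor_per_student_2012": liquor_ps,
        "weapon_per_student_2012": weapon_ps,
        # Assumption: "increase" = absolute change in weapon cases.
        "weapon_increase_2010_2012": row["weapon_2012"] - row["weapon_2010"],
        # Proportions reuse the per-student measures defined above.
        "drug_to_weapon": safe_div(drug_ps, weapon_ps),
        "liquor_to_drug": safe_div(liquor_ps, drug_ps),
    }

print(custom_measures({"students": 1000, "drug_2012": 20, "liquor_2012": 50,
                       "weapon_2012": 4, "weapon_2010": 6}))
```

Because both per-student values share the same student count, each proportion equals the ratio of the raw case counts; the per-student step mainly matters for the three single-measure rates.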
In total, filtering to the US, I had 61,033,546 results in my dataset.
This data can be further divided into 26,923,448 men and 34,110,098 women:
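A quick sanity check of these numbers in Python: the two groups add up exactly to the US total, and women make up roughly 56% of it.

```python
total = 61_033_546
men = 26_923_448
women = 34_110_098

# The male and female counts should add up to the US total.
assert men + women == total

female_share = women / total
print(f"Female share: {female_share:.1%}")  # Female share: 55.9%
```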
Although I don't have the final results broken down by male and female respondents, the fact that more than half of the set is female leads to some interesting thoughts.
If we look at the drug usage:
We can see generally high usage on campuses in California, New York, Pennsylvania and Texas. Colorado, Michigan and Wisconsin are also quite high. If we want to send our children to a university with very low drug usage, we should look at Alaska (freezing cold), DC (quite interesting) or Hawaii (well, I guess the kids would love this one most).
Looking at both the total usage and the average for the 2012 data only:
This confirms the above: Hawaii would be the best place to send them. And for spring break, we can visit them; they don't need to come home.
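Filtering to a single year and then aggregating per state is essentially a group-by. Here is a minimal Python sketch, assuming per-campus rows of (state, year, drug cases) and assuming "average" means the mean per campus row; the state labels and values are made up for illustration and are not from the dataset.

```python
from collections import defaultdict
from statistics import mean

# Made-up per-campus rows (state, year, drug cases); NOT values from the dataset.
rows = [
    ("State A", 2012, 120),
    ("State A", 2012, 80),
    ("State A", 2011, 95),  # different year, filtered out below
    ("State B", 2012, 5),
    ("State B", 2012, 3),
    ("State C", 2012, 40),
]

def summarize_year(rows, year):
    """Return {state: (total cases, mean cases per campus row)} for one year."""
    per_state = defaultdict(list)
    for state, row_year, cases in rows:
        if row_year == year:  # keep only the requested year
            per_state[state].append(cases)
    return {state: (sum(c), mean(c)) for state, c in per_state.items()}

summary_2012 = summarize_year(rows, 2012)
# Sort ascending by total, so the lowest-usage state comes first.
ranking_2012 = sorted(summary_2012, key=lambda s: summary_2012[s][0])
print(ranking_2012)  # ['State B', 'State C', 'State A']
```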
If we look at weapons, California seems to be the most dangerous state:
Maybe the Silicon Valley guys or the Berkeley programmers 😉
The overall data:
It shows that apart from California, and maybe North Carolina and Texas, weapon usage (on campus only, remember!) is quite low. Weapon usage in those states also decreased from 2010 to 2012, whereas it increased slightly in Tennessee and Colorado:
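Whether a state went up or down between 2010 and 2012 comes down to the sign of the difference between the two years. A sketch with a hypothetical `weapon_trend` helper and made-up counts (not values from the dataset); it also reports the change as a percentage of the 2010 value, which is undefined when there were no 2010 cases.

```python
def weapon_trend(cases_2010, cases_2012):
    """Return (label, percent change) for the 2010 -> 2012 comparison."""
    diff = cases_2012 - cases_2010
    label = "increase" if diff > 0 else "decrease" if diff < 0 else "unchanged"
    # Percent change is undefined when there were no cases in 2010.
    pct = 100 * diff / cases_2010 if cases_2010 else None
    return label, pct

# Made-up (2010, 2012) weapon case counts, for illustration only.
example = {"State A": (250, 200), "State B": (40, 46)}
trends = {state: weapon_trend(*counts) for state, counts in example.items()}
print(trends)  # {'State A': ('decrease', -20.0), 'State B': ('increase', 15.0)}
```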
Coming back to drugs, we can now look at proportions, such as the total number of students versus the number of drug usage cases:
Texas, in the top right, has a high number of students but, relative to that, a rather low number of drug usage cases compared with Colorado, for example.
In general, the further up and to the right a bubble is, the better the proportion; and the lower the bubble, the fewer drugs are used.
Here is the same data in another form:
This shows that the average drug usage per student is highest in Wisconsin, Connecticut and South Dakota, although those states are not the ones with the most usage cases.
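The gap between "most cases" and "most cases per student" is easy to reproduce: ranking by raw counts and ranking by a per-student rate can order the same states differently. The state names and figures below are invented purely to illustrate this effect.

```python
# Invented totals per state: (students, drug cases). Illustration only.
states = {
    "State A": (2_000_000, 4_000),
    "State B": (150_000, 900),
    "State C": (800_000, 2_400),
}

def cases_per_student(state):
    """Drug cases divided by enrolled students for one state."""
    students, cases = states[state]
    return cases / students

# Rank by absolute number of cases (what the raw totals show) ...
by_cases = sorted(states, key=lambda s: states[s][1], reverse=True)
# ... and by cases per student (what the per-student measure shows).
by_rate = sorted(states, key=cases_per_student, reverse=True)

print(by_cases)  # ['State A', 'State C', 'State B']
print(by_rate)   # ['State B', 'State C', 'State A']
```

A small state can top the per-student ranking while barely registering in the raw totals, which is the pattern described for Wisconsin, Connecticut and South Dakota.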
One last page shows that the proportion between liquor usage and drug usage is highest in Minnesota, whereas the proportion between drug usage and weapon usage is highest in West Virginia:
In a really short time, SAP Lumira was able to answer quite a lot of questions and helped me decide where to send the children. To recap: Hawaii seems to be the best choice, but I guess another analysis of crimes and fires will be needed to make a final decision, so stay tuned 🙂
Cheers,
Marc
Interesting analysis, glad not to see NC on the "naughty list" (go Duke!). 🙂 But since these metrics do not always correspond to the quality of education (for example, CA and NY are also home to some high-ranking universities) and don't consider important factors such as affordability, it's also a great example of how easy it sometimes is to be misguided by Big Data. 🙂 Good blog nevertheless.
Thanks for your feedback. There are certainly other metrics that should count more when choosing a college. This data only reflects known cases; it doesn't mean that actual drug or liquor abuse looks exactly like this 🙂
Another analysis based on the Times ranking of world universities might be interesting, for example finding out which country has the highest-ranked ones.
Nice job Marc! I liked the way you created custom measures, and used different charts to look at the data.