Okay Data Geeks,
Weather data has always fascinated me. If I had my way, I’d have my own weather station on top of my house collecting petabytes of weather data, and a HANA server in my basement to crunch it all with. But I don’t. So instead I have an entry into the Data Geek Challenge.
First I went in search of some historical weather data for Philadelphia, because, hey, it’s home.
I found a temperature data archive here. You can look up your city if you are interested, too!
I was surprised to find it relatively up-to-date. The data ended in July of 2013. That’s pretty current as far as free data sets go.
I downloaded the Philly data, which was in a space-delimited text file. Dirty.
I pulled the data into SAP Predictive Analysis and used the data manipulation tools on the right tab to clear out leading and trailing spaces. I also used them to merge my Year and Average Temperature columns, which got acquired as two separate columns because of those leading and trailing spaces. Here’s what they looked like before I merged them:
Next I slapped some dandy semantic enrichment on my dates.
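If you don't have the tool handy, the cleanup steps above can be roughly sketched in plain Python. This is only an illustration: the column order (month, day, year, temperature), the sample rows, and the helper names `parse_line` and `monthly_averages` are my assumptions, not the actual layout of the downloaded file.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows; the real file's column layout is an assumption.
SAMPLE = """\
 1             1             1995         38.5
 1             2             1995         35.1
 2             1             1995         30.0
"""

def parse_line(line):
    """Split on runs of whitespace, which also drops leading/trailing spaces."""
    month, day, year, temp = line.split()
    # Building a real date object stands in for the "semantic enrichment" step.
    return date(int(year), int(month), int(day)), float(temp)

def monthly_averages(text):
    """Group daily readings by (year, month) and average each group."""
    buckets = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        day, temp = parse_line(line)
        buckets[(day.year, day.month)].append(temp)
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}

averages = monthly_averages(SAMPLE)
```

Calling `str.split()` with no arguments is what takes care of the "dirty" spacing, so no separate trim step is needed.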
Now for the fun part. I popped open the Predict design tab, and dropped in an R Double Exponential Smoothing transform, since I’m looking to do a time series prediction. Here’s how I configured it:
I wanted it to predict the trend, with Average Temperature as the Dependent variable. I had it use Month as the period, because I wanted a prediction by month, and I had it start at 2013 and go forward, since the 2013 data was incomplete in the data set.
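For the curious, here is a minimal sketch of what double exponential smoothing does under the hood, written as Holt's linear trend method in plain Python. It is not necessarily what the R transform runs internally; the function name `holt_forecast`, the initialization from the first two points, and the smoothing values in the example are all my assumptions for illustration.

```python
def holt_forecast(series, alpha, beta, horizon):
    """Holt's linear (double exponential) smoothing.

    level_t = alpha * y_t + (1 - alpha) * (level_{t-1} + trend_{t-1})
    trend_t = beta * (level_t - level_{t-1}) + (1 - beta) * trend_{t-1}
    forecast h steps ahead = level + h * trend
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Assumed initialization: first value as level, first difference as trend.
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        previous_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]

# A perfectly linear toy series simply continues its line: [18.0, 20.0]
forecast = holt_forecast([10.0, 12.0, 14.0, 16.0], alpha=0.5, beta=0.5, horizon=2)
```

Note that this method tracks only a level and a trend. It has no seasonal component, so on its own it will not reproduce the summer and winter swings in monthly temperatures.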
On to visualize, and boom:
So according to my prediction (don’t bet the farm on this, folks!), the average temperature in Philadelphia really isn’t going to change all that much.
What could make this better? An absolutely HUGE data set would be much better. This data was only from 1995 to July of 2013. That’s much too small a sample to really predict accurately.
What do you think?