MLB Simple Questions
I am a baseballaholic who has been working with SAP BI and BW for a while. While enjoying baseball games with my mates, there is always the question ‘who holds the record for…?’ Sometimes it is an easy answer. Every baseball fan out there knows that Pete Rose holds the record for most hits in MLB and Barry Bonds holds the record for most HR. However, which player born in Ireland has the most HR in MLB is not common knowledge.
This is why I decided to create something practical for my DataGeek Challenge entry. I decided to build a small board in SAP Lumira that will help me answer those recurring questions about which player born in a specific country holds the record for some MLB batting stats and seasons.
In order to create my board I needed an MLB dataset. I got it from Sean Lahman. His Baseball Archive web site was one of the earliest sources for baseball information on the internet. Sean headed the first significant effort to make a database of baseball statistics freely available to the general public.
To create my board I used SAP Lumira Cloud, as it is a tool that makes development fast and easy. No installation is required, and content can be shared in the blink of an eye. I am aware that the free Cloud version of Lumira does not have all the features included in the Desktop version. Nevertheless, it should be more than sufficient for my little board.
The first thing I noticed was that it is a good idea to check the measures and dimensions automatically detected by Lumira. In my case Lumira classified fields holding ‘Year’ values as measures; I changed them to dimensions. Shift fields between measures and dimensions according to your needs before clicking on the blue Acquire data set button.
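To see why a ‘Year’ field belongs with the dimensions, here is a minimal sketch in plain Python, outside Lumira. The rows are made up for illustration and the variable names are my own assumptions, not anything from the Lahman dataset or Lumira itself.

```python
from collections import defaultdict

# Made-up rows for illustration only: (year, home_runs).
rows = [(1998, 40), (1998, 25), (1999, 30)]

# Year treated as a measure: the values get summed, which means nothing.
year_as_measure = sum(year for year, _ in rows)

# Year treated as a dimension: it becomes the key that home runs are grouped by.
hr_by_year = defaultdict(int)
for year, home_runs in rows:
    hr_by_year[year] += home_runs

print(year_as_measure)
print(dict(hr_by_year))
```

Summing the years gives a number nobody can use, while grouping by year gives one home run total per season, which is what a chart or ranking actually needs.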
As soon as I loaded my dataset I was able to create the first of my visualizations, a table ranking players by hits. My dataset is big enough to hit the maximum number of data points that can be visualized.
In this case I am not after a complete list of players, only the top ones. So I defined a rank on top of my measure:
Now I had my first top 5 hitters:
Hold on, Ken Griffey is not the player with the most hits in MLB; it is Pete Rose!
What happened? I am using the player’s name dimension, which is not unique. Ken Griffey played Major League Baseball from 1973 to 1991, and his son, named after him, also played from 1989 to 2010. So their stats are being aggregated, which explains why they are at the top of the list. Frank Thomas appears in the list for the same reason. I have to add the player’s unique ID to the output in order to obtain the correct ranking:
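The same pitfall is easy to reproduce outside Lumira. Below is a minimal sketch in plain Python, assuming a flat list of (player ID, name, hits) rows. The players, IDs and numbers are invented for illustration, and `top_n` is a hypothetical helper of mine, not a Lumira feature.

```python
from collections import defaultdict

# Made-up rows for illustration only: (player_id, player_name, hits).
# "Sam Doe" appears twice on purpose: a father and son sharing one name.
rows = [
    ("doesa01", "Sam Doe", 2100),
    ("doesa02", "Sam Doe", 2700),
    ("roeal01", "Al Roe", 4200),
    ("poeje01", "Jed Poe", 3000),
]

def top_n(rows, key_fn, n):
    """Sum hits per key and return the n largest totals, highest first."""
    totals = defaultdict(int)
    for row in rows:
        totals[key_fn(row)] += row[2]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Grouping by name alone merges the two Sam Does into one inflated total.
by_name = top_n(rows, lambda r: r[1], 2)

# Grouping by (ID, name) keeps them apart and restores the correct order.
by_id = top_n(rows, lambda r: (r[0], r[1]), 2)

print(by_name)
print(by_id)
```

Ranked by name, the merged "Sam Doe" wrongly comes out on top. Once the ID is part of the grouping key, each player is counted separately and the ranking is correct again.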
I wanted to hide the player ID from the list but I couldn’t. Can it be done in the SAP Lumira Desktop version? Leave a comment if you happen to know.
I created three other similar rankings for Home Runs, Runs Batted In and Stolen Bases. Being able to replicate this visualization, then change the measure and redefine the ranking for the new measure, sped up the board creation process.
The next step was to compose my visualizations into a board with a couple of input controls that drive the rankings. It is quite straightforward to arrange the visualizations and input controls. All that is needed is to select them and drop them where you would like them to be located. It is possible to change the visualization titles, font and colour, and even to add a background picture to your board. My board now looks like this:
The final step is to share your board:
This is the link where you can try my simple board.
Give it a go and find out which player born in Australia has stolen the most bases in Major League Baseball.