Using import.io to extract any data from the web into Lumira (shuffle up and deal edition)
Update: If you want to see how to push this data into HANA directly, take a look at my other import.io blog here
I am a huge Poker fan. I have played so many online and offline games in my life that I can’t even remember my first. What I do remember, though, is the first time I saw the WSOP on television. The World Series of Poker is the game of games, the big daddy of them all, and I plan on winning it once. Coincidentally, it runs at the same time as the World Soccer Championship, so I can’t make it this year (poor excuse of course; nobody wants to cough up my 10k entry fee ;-).
To at least do my part for the WSOP 2014 edition, I thought it would be fun to load up Lumira with WSOP stats (sort of what Nic did with his Soccer Analysis) and see if I could make a nice storyboard which I could share with the world.
Well I did and here it is:
You can use the following URL to play with the data yourself:
Now the interesting part of this blog is not that Phil Hellmuth has the most bracelets and “in the moneys” but doesn’t come close to Antonio Esfandiari’s lifetime earnings. The interesting part is what I did to get the data!
Import.io, Lumira’s new best friend
Getting data can be a challenge. Of course you can find an online CSV data set or use an API to get your data, but sometimes you see some great data on a website and need a way to extract it into a file. This is where import.io comes in. It can extract data from any website and has a ton of features to share your newly gathered data. I suggest you take a look at their website to get an idea:
They are truly awesome and their service is free!
Downloading stats from the WSOP website
In my example I used import.io to get the stats from the WSOP website. The process is very simple using the import.io wizard.
Looking at the WSOP page, we can see that we have pages and pages of stats. It would be very clumsy to go through every single page and copy/paste the information into a spreadsheet. This is where import.io proves its usefulness.
After installing the software you can start your extraction process. As you can see there are three options when extracting the data:
In one of my first attempts I used “Crawler” but it took a lot of time to get the data, as import.io is literally crawling through the entire website to collect the data. As I know exactly where my data is, I can use “Extractor”.
Enter your URL and press “lets get cracking”
I choose “Extractor”:
Press the big shiny button and the wizard investigates my page:
My data is in a table so I select table:
Import.io recognizes my table, neatly creates my file layout and extracts the data found on the first page:
Note that I skip my first row of data as import.io is smart enough to find the column headings by investigating the HTML. After pressing “I’ve got what I need” I’m done and ready to load my settings to the cloud.
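To give an idea of what “finding the column headings by investigating the HTML” can look like, here is a minimal sketch using only Python’s standard library. It is not how import.io works internally (I don’t know that); it simply assumes the headings sit in `<th>` cells and the data in `<td>` cells. The sample markup and its values are made-up placeholders, not real WSOP data.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects <th> cells as headings and <td> cells as data rows."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self.rows = []
        self._row = None    # cells of the <tr> currently being read
        self._cell = None   # "th" or "td" while inside a cell, else None
        self._text = []     # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._cell = tag
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell == tag:
            value = "".join(self._text).strip()
            if tag == "th":
                self.headings.append(value)
            elif self._row is not None:
                self._row.append(value)
            self._cell = None
        elif tag == "tr":
            # A heading-only row leaves _row empty, so it is skipped here.
            if self._row:
                self.rows.append(self._row)
            self._row = None


# Made-up sample markup with placeholder values (an assumption for illustration).
SAMPLE_HTML = """
<table>
  <tr><th>Player</th><th>Bracelets</th></tr>
  <tr><td>Player A</td><td>3</td></tr>
  <tr><td>Player B</td><td>1</td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE_HTML)
print(parser.headings)
print(parser.rows)
```

Because the heading row ends up in `headings` rather than in `rows`, there is no need to keep it as a data row, which is the same reason I could skip it in the wizard.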
As my data is on multiple pages (and I didn’t use “crawl” to extract my data!), I need to enter the URL of the pages where the data is:
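Typing every page URL by hand gets old quickly, so a small script can generate the list for you to paste in. This is only a sketch: the URL template below is a hypothetical placeholder (the real pattern depends on how the site numbers its pages), and `build_page_urls` is a helper name I made up for this example.

```python
def build_page_urls(template, first_page, last_page):
    """Return one URL per page number, from first_page to last_page inclusive."""
    if first_page > last_page:
        raise ValueError("first_page must not be greater than last_page")
    return [template.format(page=n) for n in range(first_page, last_page + 1)]


# Hypothetical template: replace it with the paging pattern of the site you extract from.
URL_TEMPLATE = "https://example.com/stats?page={page}"

urls = build_page_urls(URL_TEMPLATE, 1, 5)
print("\n".join(urls))
```

The output is one URL per line, which is easy to copy into a multi-line input box.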
After pressing refresh I have all I need!
I can now press download on the top of the page to download the data in a variety of formats. I choose CSV.
The file can easily be loaded into Lumira and much to my surprise I have even more data than was on the page itself:
As the images of the players’ countries are labeled with the country name, I can even create a world map of the data automatically in Lumira. Life is good 😉
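If you want to check which columns actually ended up in the export before loading it anywhere, a quick look with Python’s `csv` module does the job. This is a minimal sketch; the column names and values in the sample are placeholders I made up, not the real export.

```python
import csv
import io


def summarize_csv(handle):
    """Return (column names, number of data rows) for an open CSV file."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    # fieldnames is None for a completely empty file, so fall back to [].
    return list(reader.fieldnames or []), len(rows)


# Placeholder sample standing in for the downloaded file (made-up values).
SAMPLE_CSV = io.StringIO(
    "player,country,earnings\n"
    "Player A,Country X,100\n"
    "Player B,Country Y,250\n"
)

columns, row_count = summarize_csv(SAMPLE_CSV)
print(columns)
print(row_count)
```

For a real download, pass a file opened with `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` sample.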
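For the curious: an image “label” in HTML is usually an `alt` or `title` attribute on the `<img>` tag. I am assuming that is where the country names came from; the sketch below shows how such labels could be pulled out with Python’s standard library. The markup and country names are made-up placeholders.

```python
from html.parser import HTMLParser


class ImageLabelParser(HTMLParser):
    """Collects the alt text (or, failing that, the title) of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.labels = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # Prefer alt; fall back to title when alt is missing or empty.
        label = attributes.get("alt") or attributes.get("title")
        if label:
            self.labels.append(label.strip())


# Made-up sample markup (an assumption for illustration).
SAMPLE_HTML = """
<td><img src="flags/x.png" alt="Country X"></td>
<td><img src="flags/y.png" title="Country Y" /></td>
<td><img src="spacer.png"></td>
"""

label_parser = ImageLabelParser()
label_parser.feed(SAMPLE_HTML)
print(label_parser.labels)
```

Images without any label, like the spacer in the sample, are simply skipped.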
Don’t forget to load your data into the Lumira Cloud. Visual discoveries are a nice way to get some insights from your data automatically.
And guess what… Lumira also found out that Antonio Esfandiari’s earnings are extraordinary:
Lumira doesn’t explain why, so I’ll let you in on the “secret”:
Antonio paid a hefty 1 million to enter the “one drop” game and won a staggering 18.3 million! Guess he liked the odds.
I know who to contact to get my 10k buy-in for next year’s WSOP 😉
Thank you for reading and take care,