I am one of those people with a real passion for space and everything above the Earth. Since childhood, space has filled my dreams at night: visiting space (always without a spacecraft or space suit), meeting aliens (very beautiful purple women), big stars (star-shaped, of course) and shooting stars. It is all about exploring the unknown and the unlimited. Yet I never had a chance to learn the technology behind this research, even though I sometimes wished I could become a space scientist. My involvement ended at watching space movies and following the latest news about newly discovered planets.
It is SAP Lumira that lights up my dreams in a very practical way. This is going to be my “dream” project.
Through my contacts I came to know about the NASA open data challenge: a large amount of NASA research data is available to the public, so that people can build better products that help with NASA's big data challenges. With the help of two friends (non-SAP), I searched for the best and simplest data source for our analysis. My friends pushed me to choose climate data, but my passion for space would not let me think of anything other than space science data, even though we knew nothing about space terminology. We kept thinking about the data for three months without taking any action, because the open NASA data is huge and comes in all kinds of formats. I am sure even serious data geeks can't work on this data without help from field experts who know how to interpret it.
But the US government shutdown helped a lot. During the shutdown I got replies from NASA scientists I had emailed three months earlier, and they helped me choose the right track of data. They even encouraged me to participate in their NASA big data challenge 🙂. Below is my data source.
The true challenge is not the size of the data; it is all about getting predictive, analytic results out of it. To get results we must dedicate some time to studying the data: go through the data sets and make sure you are familiar with the terms before starting to analyse. Here is a very short description of what my data is about and how it is used for analysis.
Kepler is a space telescope launched by NASA to discover Earth-like planets orbiting other stars. It finds planets mainly by the transit method, measuring the dip in a star's brightness as a planet crosses in front of it, and the data set also records other discovery methods. Properties such as size, distance and velocity can be obtained. Not every object Kepler detects turns out to be a real planet; some candidates end up as false positives, such as eclipsing binaries. For more information on Kepler data please visit http://en.wikipedia.org/wiki/Kepler_spacecraft
We built our own Excel data set from the metadata on the above site, using features such as merge, number conversion, and date and time conversion.
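For readers who would rather script this preparation than do it by hand, the merge and conversion steps might look roughly like the sketch below. It is only an illustration: the column names (`name`, `mass`, `distance`, `discovered`) and the sample rows are made up by me and are not taken from the actual NASA files.

```python
from datetime import datetime

# Hypothetical sample rows; the real columns and values will differ.
candidates = [
    {"name": "A-1", "mass": "1.25", "discovered": "2011-02-01"},
    {"name": "B-2", "mass": "", "discovered": "2013-07-15"},
]
# A second hypothetical table, keyed by the same name, to merge in.
star_info = {
    "A-1": {"distance": "120.5"},
    "B-2": {"distance": "890"},
}


def to_float(text):
    """Convert a text cell to a number; empty cells become None."""
    text = text.strip()
    return float(text) if text else None


def prepare(rows, extra):
    """Merge two tables on `name`, then convert text to numbers and dates."""
    prepared = []
    for row in rows:
        merged = {**row, **extra.get(row["name"], {})}
        prepared.append({
            "name": merged["name"],
            "mass": to_float(merged["mass"]),
            "distance": to_float(merged.get("distance", "")),
            "discovered": datetime.strptime(merged["discovered"], "%Y-%m-%d").date(),
        })
    return prepared


table = prepare(candidates, star_info)
```

Keeping empty cells as `None` instead of 0 matters here, because a missing mass should not show up as a zero-mass planet in a chart.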
These are a few of the ways I used to get meaningful analytic results; with the same simple methods, we could get 100+ visualisations out of the data.
1. Impact of distance on Density and Mass (using Scatter Graph)
This is very interesting: it clearly shows that distance has a big impact on mass.
2. Year of Discovery
It is very true: after the Kepler launch, the number of newly discovered planets increased dramatically.
Also, in the last couple of years Kepler has helped discover even more planets. There could be many reasons behind that, such as technical improvements in the discovery methods used on Kepler, or we can try to relate it to other constraints like gravity. This is what data scientists work for: trying to get the various possibilities out of the data that is already available.
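The per-year count behind a chart like this is simple to reproduce outside Lumira. Here is a minimal sketch, assuming the data has been reduced to one discovery year per planet; the years in the list are invented for illustration and are not real NASA figures.

```python
from collections import Counter

# Hypothetical discovery years, one entry per planet; not real NASA figures.
discovery_years = [2007, 2008, 2010, 2010, 2011, 2011, 2011,
                   2012, 2012, 2012, 2012]


def discoveries_per_year(years):
    """Return (year, count) pairs sorted by year."""
    return sorted(Counter(years).items())


per_year = discoveries_per_year(discovery_years)
for year, count in per_year:
    print(year, "#" * count)  # crude text bar chart, one # per planet
```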
3. Top 5 Photometric measurements
I used SAP Lumira’s Ranking functionality to get the top 5 out of 10000+ entries.
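I don't know how Lumira computes its ranking internally, but the same idea can be sketched in a few lines of Python. The candidate names and values below are invented, and `top_n` is just a helper name I picked for this example.

```python
import heapq

# Hypothetical (candidate, photometric value) pairs; names and values are invented.
measurements = [
    ("cand-01", 11.2), ("cand-02", 15.9), ("cand-03", 9.4), ("cand-04", 14.1),
    ("cand-05", 12.7), ("cand-06", 16.3), ("cand-07", 10.8),
]


def top_n(rows, n=5):
    """Return the n rows with the largest measurement, highest first."""
    return heapq.nlargest(n, rows, key=lambda row: row[1])


top5 = top_n(measurements)
for name, value in top5:
    print(name, value)
```

For 10000+ entries, `heapq.nlargest` avoids sorting the whole list just to keep five rows.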
4. Number of False Positives
Not every Kepler detection is considered a planet; some end up as false positives, which are other kinds of objects such as eclipsing binaries. Out of all the measurements, I could get the list of confirmed planets in less than a second.
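Separating confirmed planets from false positives is essentially a filter on one column. Here is a small sketch of that idea; the `disposition` column name and its labels are my assumptions for illustration, so check the real data set for the actual field and values.

```python
from collections import Counter

# Hypothetical rows; the `disposition` labels are assumed for illustration.
objects = [
    {"name": "obj-1", "disposition": "CONFIRMED"},
    {"name": "obj-2", "disposition": "FALSE POSITIVE"},
    {"name": "obj-3", "disposition": "CANDIDATE"},
    {"name": "obj-4", "disposition": "CONFIRMED"},
]

# How many objects fall into each category.
counts = Counter(row["disposition"] for row in objects)

# Names of the objects that were confirmed as planets.
confirmed = [row["name"] for row in objects if row["disposition"] == "CONFIRMED"]

print(counts["FALSE POSITIVE"], "false positives;", "confirmed:", confirmed)
```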
5. Highest Temperature
Using a tag cloud, we could see the candidate with the highest temperature.
6. Density (Close look!!!)
This chart gives a very detailed comparison of candidate density with Earth. If we look closely, our Earth is only 3.72%.
7. Mass and Radius variation on Earth and Jupiter using Story Board
This is four-dimensional data, hence I used a combined line and column chart with Storyboard to build a meaningful visual composition.
8. Discovery Methods and Proper Motion comparison
The chart below can help the Kepler team optimize identification methods.
9. Number of General, Light, Transit Curves
This is one of the most sensitive analyses and is mostly used in the identification of false positives.
10. Mass and Radius on Earth Detail Scatter Matrix
I thank everyone, including the NASA guys, who helped me with this wonderful work. Without all your support, this would not have been possible. The above is just the beginning: I am very much looking forward to estimating the next new planet discoveries using predictive analysis, and I may share that in the future.