
I recently attended a seminar on Big Data and how Hadoop is being presented as a solution for handling oceanic amounts of data, whether a static store or a perennial stream, that must be analyzed to understand trends and gain an edge over competitors, or to crack codes in scientific research.

Hadoop uses the strategy of bringing the computation to the data instead of transferring the data for computation, which reduces network delay, and then runs MapReduce jobs over that data. Hadoop is not suited for interactive data processing.
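
To make the MapReduce part concrete, here is a minimal, single-process Python sketch of the idea (plain Python, not Hadoop code, and the log format is invented for illustration): it counts hits per URL in web-server log lines. On a real cluster, Hadoop runs the same map and reduce phases in parallel, with each map task executing on the node that already stores its block of the data.

```python
# Minimal sketch of the map/reduce idea: count hits per URL in log lines.
from itertools import groupby
from operator import itemgetter

log_lines = [
    '10.0.0.1 - - "GET /index.html" 200',
    '10.0.0.2 - - "GET /cart" 500',
    '10.0.0.3 - - "GET /index.html" 200',
]

def map_phase(line):
    # Emit (key, value) pairs: here, (requested URL, 1).
    url = line.split('"')[1].split()[1]
    yield (url, 1)

def reduce_phase(key, values):
    # Aggregate all values seen for one key.
    return (key, sum(values))

# Map over every input record.
pairs = [kv for line in log_lines for kv in map_phase(line)]
# "Shuffle": group the pairs by key (Hadoop sorts and routes them between nodes).
pairs.sort(key=itemgetter(0))
results = [reduce_phase(key, (v for _, v in group))
           for key, group in groupby(pairs, key=itemgetter(0))]
print(results)  # [('/cart', 1), ('/index.html', 2)]
```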

At the previous TechEd, I heard about SAP HANA, SAP's strategy for dealing with huge data: an in-memory analytical appliance that enables real-time analytics. HANA can process structured, unstructured, machine-generated, and social-networking data.

Typical real-world examples are analyzing the logs generated by enterprise web servers, or the Facebook and LinkedIn activity that produces hundreds of terabytes of data worldwide every day. That said, this kind of data can also be analyzed with other data-warehousing tools.

SAP HANA focuses on interactive, real-time analytics, whereas Big Data processing technologies like Hadoop are open source, deal with petabytes or exabytes of data, and are not real-time.

My take is that SAP customers are mostly traditional enterprises with long relationships with their existing ERP platform, who appreciate stable upgrade paths, maintenance, and support to keep operations running smoothly.

These customers will be interested in running analytics against increasing amounts of data stored in both SAP and non-SAP systems, but for the most part they do not employ researchers or data scientists to experiment with still-developing technologies like Hadoop for distributed computing and Big Data analytics.

Finally, in my view HANA is not about Big Data but about fast data: it lets all of SAP's customers make quicker, better business decisions to stay ahead of the competition, and that puts SAP in a good position to expand its analytics business.


7 Comments


  1. Daniel Koller
    Basically I support your conclusion, with a few remarks:
    – Hadoop is by now a quite sophisticated solution (perhaps slightly more so than HANA 😉)

    – But the idea behind Hadoop is more that of a generic toolset for all kinds of evaluations/tasks which can be parallelized; it requires significant knowledge and experience on the side of the user.

    – Toolkits on top of Hadoop reduce the effort and some of the configuration hassle needed before getting it to run; Cascading (cascading.org, Java-based) is one of them.

    – A quite good combination of Hadoop and HANA could, for example, mean using HANA hardware to run parallelized Hadoop tasks on top.

    Daniel

    1. Witalij Rudnicki
      Hi Daniel,

      I was left unsure after the SAPPHIRE keynote what SAP wants to do in this regard. Hasso showed Map/Reduce on the roadmap of HANA features without getting into details. When I tried to ask another SAP person what was meant by integration of Map/Reduce or with Hadoop, the answer was “Hasso is not an SAP employee”. Well, now I am lost…

      -Vitaliy

      1. Daniel Koller
        If Hasso showed it on the roadmap, he might have some idea: perhaps the direction is to enable execution of map/reduce-style tasks on the HANA appliance.

        Additionally, SAP may change the execution of standard functions in the SAP environment (e.g. expensive reports and queries) in the background to include HANA computing power; in that case the user would not see the details of it.

        But this does not change my initial assessment that doing a new task via map/reduce requires a significant amount of thinking through and implementation work to make good use of the specific infrastructure that is available.

  2. Priya Ranjan
    HANA is not only for analytics; it is for real-time compute, as it has an aggregation engine built in and is SQL-compliant. What this means is that using HANA, businesses can make “real-time decisions” based on complex computation which would otherwise take hours or days: for example ATP checks, complex real-time product pricing, complex financial simulations, customer analysis, product lifecycle and value chain analysis, and so on.

    Hadoop is a distributed, high-performance batch-processing engine and cannot be used for making real-time decisions; it can do extremely complex computation over tons of data, but not in real time.
    HBase would be a more suitable comparison.

    Both of them have different strengths, but in the enterprise a split-second advantage is what HANA will offer.

    1. Aditya Varma Post author
      Hi Priya,
      If an RDBMS is being used for real-time data and the same system is also used for analysis over large data sets to generate reports, it becomes performance-intensive. Hadoop is used to store vast amounts of data and analyze it arbitrarily, with the data exported to it constantly, so the performance of the real-time system is not impacted.
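
      For illustration, here is a rough sketch of that constant-export step, assuming an incrementing id column as the watermark (sqlite3 and the orders table/columns are made-up stand-ins for the real operational database):

      ```python
      # Hypothetical incremental export: pull only the rows added since the last
      # run out of the operational database, write them to a delimited file that
      # a Hadoop job (or hadoop fs -put) can pick up later, and return the new
      # watermark. sqlite3 and the 'orders' schema are stand-ins for illustration.
      import csv
      import sqlite3

      def export_new_rows(conn, last_exported_id, out_path):
          cur = conn.execute(
              "SELECT id, customer, amount FROM orders WHERE id > ? ORDER BY id",
              (last_exported_id,),
          )
          rows = cur.fetchall()
          if rows:
              with open(out_path, "w", newline="") as f:
                  csv.writer(f, delimiter="\t").writerows(rows)
          return rows[-1][0] if rows else last_exported_id

      if __name__ == "__main__":
          conn = sqlite3.connect(":memory:")
          conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
          conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                           [(1, "ACME", 10.0), (2, "Globex", 25.5)])
          print("exported up to id", export_new_rows(conn, 0, "orders_batch.tsv"))
      ```
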
  3. Rich Hill
    I’m kind of unsure as to why you’re making the comparison between HANA and Hadoop if you believe that they have two different purposes.  Like you said, HANA is for fast data and Hadoop is for Big Data.  They’re really different use cases a lot of the time.  Hadoop won’t be the right solution for many enterprise customers, but if you’re generating the amount of data that it’s good for, then you probably have the resources to get a data analyst to work with it.

    I’m hopeful that HANA brings forth some enhancements in performance, but I’m not clear yet on how it differs from traditional data grids (GridGain or Oracle Coherence) or in-memory databases (Hypersonic SQL).

    1. Aditya Varma Post author
      Dear Hill,
      As mentioned, even though the two technologies map to different use cases, both deal with huge amounts of data, and in either case we are looking for faster computation, whether in HANA or with Hadoop (using parallel processing).

      Thanks,
      Aditya

