When we talk about futuristic technology, we cannot ignore the term Internet of Things (IoT). The concept of an interconnected ecosystem of thousands of devices communicating with each other is fascinating and breathtaking. This technology is already present in modern-day systems. Consider an interconnected thermostat that reads the temperature of your home and, based on the weather forecast, makes the room cooler or warmer.
Wireless sensor networks generate huge amounts of unstructured data in the form of files such as images, satellite captures, audio, and video, all of which must be stored for analysis. Because of the distinctive nature of IoT data, a relational database may not be the most efficient solution.
Storing, capturing, analyzing, and transferring big data
Large sets of unstructured data take a great deal of effort to store, analyze, and process.
A traditional RDBMS is great for structured data but is not meant for semi-structured or unstructured data, and it cannot handle the massive volume and heterogeneity of big data. This created the need for a solution that can process and store such data at very large scale. Data scientists and developers struggled to build something that overcomes the shortcomings of the RDBMS. People were looking for alternatives, and NoSQL databases came into existence. One such database is Cassandra; its creators drew on Amazon's highly available Dynamo and Google's Bigtable data model. It helped them power Facebook's inbox search feature, which involved processing huge amounts of unstructured data.
Data models: are predefined schemas good or bad?
The majority of the data available today is unstructured; by commonly cited estimates it accounts for around 80% of all data. This creates a problem for relational database systems, which require a schema to be specified before any data can be added. For example, if you are storing customer data such as names and phone numbers, you have to tell the RDBMS in advance what you are storing. NoSQL databases handle this better because they generally do not require a predefined schema, so data can be inserted without defining one first. IoT devices and sensors frequently generate large quantities of unstructured data for further processing. One example is data that devices collect in the form of files: even if the contents of these files eventually make it into the database as structured data, the problem of storing the original unstructured files for archival purposes remains.
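To make the difference concrete, here is a minimal Python sketch. It uses the standard-library sqlite3 module as the relational side and a plain list of JSON documents as a stand-in for a schemaless store; the table, field names, and sample values are invented for illustration and do not represent any particular NoSQL product.

```python
import json
import sqlite3


def relational_insert_demo():
    """Return True if the RDBMS rejects a field that was never declared."""
    conn = sqlite3.connect(":memory:")
    # The schema must be declared before any row can be stored.
    conn.execute("CREATE TABLE customers (name TEXT, phone TEXT)")
    conn.execute(
        "INSERT INTO customers (name, phone) VALUES (?, ?)",
        ("Asha", "555-0100"),
    )
    try:
        # 'email' is not part of the declared schema, so this fails.
        conn.execute(
            "INSERT INTO customers (name, phone, email) VALUES (?, ?, ?)",
            ("Ben", "555-0101", "ben@example.com"),
        )
    except sqlite3.OperationalError:
        return True
    finally:
        conn.close()
    return False


def document_insert_demo():
    """Store records of different shapes without declaring a schema."""
    collection = []  # hypothetical stand-in for a document collection
    collection.append(json.dumps({"name": "Asha", "phone": "555-0100"}))
    # A record with an extra field is accepted as-is.
    collection.append(
        json.dumps({"name": "Ben", "phone": "555-0101", "email": "ben@example.com"})
    )
    return [json.loads(doc) for doc in collection]
```

The relational insert with the undeclared column is rejected until the table is altered, while the document-style store accepts both shapes. The trade-off is that the application, not the database, becomes responsible for knowing which fields a record may contain.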
Scalability: is NoSQL a better and cheaper alternative?
IoT devices and sensors will produce a massive amount of heterogeneous, geographically dispersed real-time data. This data can include periodic observations of monitored instances or repeated occurrences of certain events of interest. Storing such a tremendous amount of data can be overwhelming. Scaling a traditional RDBMS is expensive: because of its structured nature it typically scales vertically, by moving to a more powerful server, which raises operating costs. The alternative is to scale out by distributing the database across multiple servers.
Sharding is the technique of partitioning data across multiple servers, and SQL databases can use it as well. However, since a traditional RDBMS typically does not support sharding natively, it adds complexity for the development team. NoSQL databases, on the other hand, can be scaled horizontally, and many of them have built-in support for sharding and automatic replication.
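The basic idea behind sharding can be sketched in a few lines of Python. This is a simplified, hypothetical illustration, not how any specific database implements it: each key is hashed, and the hash modulo the number of shards decides which "server" (here just an in-memory dictionary) holds the record. The names shard_for and ShardedStore are made up for this example.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key-value store that spreads keys over several shards."""

    def __init__(self, num_shards: int):
        # Each dict stands in for a separate database server.
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for i in range(100):
    store.put(f"sensor-{i}", {"reading": i})
```

When the database does not do this for you, the routing logic above has to live in the application, which is the extra complexity mentioned earlier. Plain modulo hashing also forces most keys to move when the number of shards changes, which is why distributed databases such as Cassandra rely on consistent hashing instead.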
Do not get me wrong: a conventional database is still a great option for many applications. Even today, we have yet to find a perfect database solution that satisfies all of our needs. The search for a perfect database may not end anytime soon, as we will keep looking for something better and more convenient. Until we find a solution that satisfies all our needs, we have to use what is available to us and spend time perfecting it.