What is Big Data, Exactly?

Today, the world is interconnected by Internet-based applications and devices, and people consume more data than ever. The amount of data generated and consumed by businesses, social networks and machines is multiplying every year. Data sets are no longer measured in megabytes; large ones are now measured in petabytes. This rapid growth of data, along with its usage and applications, has led industry experts to adopt the term big data. It is worth comparing regular data with big data. Traditional data is organized using models such as the relational model, whereas big data is largely unstructured or semi-structured and does not fit neatly into traditional database models.

Sources of Data

Where do we get the vast amounts of data these days? The data originates from social media companies, banks, healthcare companies and retailers, among other industries. Regardless of whether data is immediately useful, companies have started to accumulate all the data available to them. This data may later be analyzed to help the company uncover new insights. Most data generated from sources such as social media is unstructured and cannot be analyzed without the help of sophisticated tools and resources. The sources of big data are increasing daily, as many new devices are connected to the Internet. For example, an automobile with connected apps gathers large amounts of data, which is stored by the car manufacturer.

Characteristics of Big Data

Three characteristics define big data: volume, velocity and variety, also known as the 3Vs. Volume refers to the increasing amount of data that is available today from a growing number of sources. Velocity refers to the speed at which data is generated. For example, the amount of data generated via Twitter and Facebook during a national sporting event is huge. A global event such as a terrorist attack or plane crash also generates tremendous amounts of data within a short span of time. The third V is variety, which refers to the types of data available, such as personnel records, tweets, messages and machine-generated data.

Reason for Big Data Explosion

The proliferation of connected devices and of companies making innovative products is driving the explosion of big data. The vast amounts of data produced today require significant amounts of hardware and software to manage. The major reason for the interest in big data is that many companies hope to extract valuable insights from it. These insights can help a company gain an advantage over its competitors. Many technology companies, such as Google, Amazon and Netflix, are built on the capture and analysis of big data. These companies use scientists, data analysts and advanced software tools to manage, analyze and derive useful information from the data.

Tools to Manage Big Data

Large amounts of unstructured data are difficult to handle using traditional methods and tools. Many companies are using cloud-based architectures to handle big data projects and initiatives. Hadoop and MapReduce are among the most widely used open source tools for managing big data. Hadoop is a framework that stores data and processes it in parallel across clusters of hundreds of servers. It is designed to handle multiple types of data, such as text, video and flat files. MapReduce is the programming model that Hadoop uses for this processing: a map step filters and sorts the data, and a reduce step summarizes the results.
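
As a rough illustration of the MapReduce model mentioned above, the following Python sketch counts words across a handful of documents on a single machine. It does not use Hadoop or any Hadoop API. The function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample documents are made up for this example. In a real cluster the framework would run the map and reduce steps on many servers and handle the grouping in between.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would do between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    grouped = shuffle_phase(pairs)
    # Keys are sorted before reducing, mirroring the sort that precedes a reduce.
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    # Hypothetical sample input, chosen only for illustration.
    docs = ["big data is big", "data is everywhere"]
    print(word_count(docs))
```

Because each document is mapped independently and each word is reduced independently, both steps could be split across many machines, which is the property that makes the model suitable for very large data sets.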

Applications of Big Data

Big data and its applications are used in many industries, including information technology, healthcare, aerospace, home improvement and finance. Governments use massive amounts of data to analyze security threats and other patterns. Many companies use big data to analyze and predict consumer behavior in the short and long run.

Future of Big Data

The amount of data, the sources of new data and the applications of big data will continue to grow rapidly. As the world becomes more and more connected, the services, tools and devices of the future will consume and generate ever larger amounts of data. The challenge will be to manage that data successfully and to find information that is valuable to businesses and customers.


