What is Big Data?
Big Data consists of datasets so large that conventional database management systems cannot manage them efficiently. These datasets range from terabytes to exabytes. Mobile phones, credit cards, Radio Frequency Identification (RFID) devices, and social networking platforms create huge amounts of data that may sit unused on unknown servers for many years.
However, with the evolution of Big Data, this data can be accessed and analysed on a regular basis to generate useful information.
Elements of Big Data
Seven elements, often called the seven Vs (Volume, Velocity, Variety, Variability, Veracity, Visualisation, and Value), capture the defining attributes of Big Data. Together they offer a simple yet effective framework for reasoning about datasets that contain enormous amounts of information.
Volume: Discussing Big Data volumes requires enormous sizes and numerical terms. By one widely cited estimate, about 2.5 quintillion bytes of data are produced every day.
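To put the daily figure above into more familiar units, the short Python sketch below converts a byte count into the largest convenient unit. It assumes decimal (SI) prefixes, where 1 KB = 1,000 bytes, and reads "quintillion" on the short scale (10^18); the helper name `humanise` is purely illustrative.

```python
# Decimal (SI) byte units, as commonly used when quoting data volumes.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

DAILY_BYTES = 2.5e18          # 2.5 quintillion bytes (short scale: 10**18)
SECONDS_PER_DAY = 24 * 60 * 60


def humanise(num_bytes: float) -> str:
    """Express a byte count in the largest SI unit that keeps the value below 1,000."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value is small enough, or at the last unit.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000
    raise AssertionError("unreachable")


if __name__ == "__main__":
    print(humanise(DAILY_BYTES), "per day")
    print(humanise(DAILY_BYTES / SECONDS_PER_DAY), "per second")
```

Under these assumptions, 2.5 quintillion bytes per day works out to 2.5 exabytes per day, or roughly 28.9 terabytes every second.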
Velocity: The speed at which data is generated, accumulated, and analysed is vital for building more responsive, accurate, and profitable solutions.
Variety: Beyond massive volumes and high velocities lies another challenge: working with a vast variety of data. Taken as a whole, these datasets lack any single finite or defined structure, which makes them difficult to interpret.
Variability: A single word can have multiple meanings. New trends emerge and older ones are discarded over time, and the same is true of meanings. This limitless variability makes Big Data uniquely hard to decipher, a challenge that must be overcome if its full potential is to be realised.
Veracity: What a Big Data analysis tells you and what the underlying data actually says can be two different things. If the data being analysed is incomplete or inaccurate, the results of the Big Data solution will be erroneous. This often happens when data streams arrive in a variety of formats. Unless the source data is cleaned up first, the overall analysis loses its veracity and the effort is wasted.
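As a small illustration of cleaning data that arrives in several formats, the Python sketch below normalises dates from different hypothetical input streams to ISO 8601 and drops records that are incomplete or cannot be interpreted. The record layout, field names, and list of accepted date formats are all assumptions made for this example; note that it assumes day-first dates such as "02/03/2023", which is exactly the kind of ambiguity that threatens veracity.

```python
from datetime import datetime
from typing import Optional

# Hypothetical date formats observed across different input streams.
# "%d/%m/%Y" assumes day-first dates; a US-style stream would need "%m/%d/%Y".
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y"]


def parse_date(raw: str) -> Optional[str]:
    """Return the date in ISO 8601 form, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean(records):
    """Keep only records whose date and amount can both be interpreted."""
    cleaned = []
    for rec in records:
        date = parse_date(rec.get("date", ""))
        try:
            amount = float(rec.get("amount", ""))
        except (TypeError, ValueError):
            amount = None
        if date is None or amount is None:
            continue  # incomplete or inaccurate record: drop it
        cleaned.append({"date": date, "amount": amount})
    return cleaned


raw_records = [
    {"date": "2023-03-01", "amount": "19.99"},
    {"date": "02/03/2023", "amount": "5"},
    {"date": "Mar 03, 2023", "amount": "12.50"},
    {"date": "yesterday", "amount": "7.00"},  # unparseable date
    {"date": "2023-03-04"},                   # missing amount
]

if __name__ == "__main__":
    for row in clean(raw_records):
        print(row)
```

Dropping bad records is only one possible policy; a real pipeline might instead flag them for review, since silently discarding data can itself bias the analysis.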
Visualisation: Another daunting task for a Big Data system is to present the immense scale of information it processes in a form that is easy to comprehend and act on. For human readers, the most effective method is conversion into graphical formats such as charts, graphs, and diagrams.
Value: Big Data offers excellent value to those who can tame it at its scale and unlock the knowledge it contains. It also offers new and effective ways of realising the true value of new products, even in previously unknown markets and areas of demand.
While Volume, Velocity, and Variety are inherent to Big Data itself, the other Vs (Variability, Veracity, Visualisation, and Value) are important properties that reflect the immense complexity Big Data presents to those who would analyse, process, and benefit from it.
Evolution of Big Data
Big Data is the new stage of data evolution, driven by the enormous Velocity, Variety, and Volume of data.
The advent of IT, the Internet, and globalisation has enabled data and information to be generated at an exponential rate, which has led to an "information explosion". This, in turn, fuelled the evolution of Big Data, which started in the 1940s and continues to this day. Table 1.1 lists some of the major milestones in the evolution of Big Data.
Table 1.1: Some Major Milestones in the Evolution of Big Data
| Period | Milestone |
|---|---|
| 1940s | An American librarian speculated about a potential shortfall of shelves and cataloguing staff, having realised that information was growing rapidly while storage was limited. |
| 1960s | "Automatic Data Compression" was published in the Communications of the ACM. It stated that the explosion of information in the preceding few years made it necessary to minimise the requirements for storing information. The paper described "Automatic Data Compression" as a fully automatic, fast, three-part compressor that can be used for any kind of information to reduce slow external storage requirements and increase the rate of transmission from a computer system. |
| 1970s | In Japan, the Ministry of Posts and Telecommunications initiated a project to study information flow in order to track the volume of information circulating in the country. |
| 1980s | The Hungarian Central Statistical Office started a research project to account for the country's information industry. It measured the volume of information in bits. |
| 1990s | Digital storage became more economical than paper storage, and challenges related to the amount of data and the presence of obsolete data became apparent. Papers that discussed this concern include: Michael Lesk, "How much information is there in the world?"; John R. Mashey, "Big Data and the Next Wave of InfraStress"; K.G. Coffman and Andrew Odlyzko, "The Size and Growth Rate of the Internet"; and Steve Bryson, David Kenwright, Michael Cox, David Ellsworth, and Robert Haimes, "Visually Exploring Gigabyte Datasets in Real Time". |
| 2000 onwards | Many researchers and scientists published papers raising similar concerns and discussing ways to address them. Various methods were introduced to streamline information. Techniques for controlling the Volume, Velocity, and Variety of data emerged, introducing 3D data management. A study was carried out to estimate the new and original information created and stored worldwide in four types of physical media: paper, film, optical media, and magnetic media. |
Table 1.1 is only a synopsis of this evolution. The idea of Big Data began when a librarian speculated about the need for more shelves to store books, as listed in Table 1.1; over time, Big Data has grown into a cultural, technological, and scholarly phenomenon. The growth of Big Data, together with new storage and processing solutions equipped to handle it, has helped businesses to:
- Enhance and streamline existing databases
- Add insight to existing opportunities
- Explore and exploit new opportunities
- Provide faster access to information
- Allow storage of large volumes of information
- Allow faster crunching of data for better insights
Career in Analytics
Now that you know Big Data is really BIG in today's world, you can understand that the opportunities associated with it are just as big. The market today needs plenty of talented and qualified people who can use their expertise to help organisations deal with Big Data.
Qualified and experienced Big Data professionals must have a blend of technical expertise, creative and analytical thinking, and communication skills to be able to effectively collate, clean, analyse, and present information extracted from Big Data.
Most jobs in Big Data come from companies that fall into the following four broad categories:
- Big Data technology drivers, e.g. Google, IBM, Salesforce
- Big Data product companies, e.g. Oracle
- Big Data services companies, e.g. EMC
- Big Data analytics companies, e.g. Splunk
Companies such as Google, Salesforce, and Apple offer various types of opportunities to Big Data professionals. These companies operate in various domains, such as retail, manufacturing, information, finance, and consumer electronics.
The most common job titles in Big Data include:
- Big Data analyst
- Data scientist
- Big Data developer
- Big Data administrator
- Big Data engineer
Skills Required for Big Data
Big Data professionals can have various educational backgrounds, such as econometrics, physics, biostatistics, computer science, applied mathematics, or engineering. Most data scientists hold a master's degree or PhD, because it is a senior position that is often reached only after considerable experience in dealing with data. Developers generally prefer to implement Big Data solutions using Hadoop and its components.
A Big Data analyst should possess the following technical skills:
- Understanding of Hadoop ecosystem components, such as HDFS, MapReduce, Pig, and Hive
- Knowledge of natural language processing
- Knowledge of statistical analysis and analytical tools
- Knowledge of machine learning
- Knowledge of conceptual and predictive modeling
A Big Data developer should possess the following skills:
- Programming skills in Java, plus hands-on experience with Hadoop, Hive, HBase, and HQL
- Understanding of HDFS and MapReduce
- Knowledge of ZooKeeper, Flume, and Sqoop
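The MapReduce model named in these skill lists splits a job into a map step that emits key-value pairs, a shuffle step that groups those pairs by key, and a reduce step that combines each group into a result. The following is a minimal single-process Python sketch of the classic word-count example. It only mimics the model on in-memory strings and does not use Hadoop or its Java API; all function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run map over every document, then shuffle and reduce the results."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    docs = ["Big Data is big", "data is everywhere"]
    print(word_count(docs))
```

In a real Hadoop cluster, the map and reduce steps run in parallel on many machines over data stored in HDFS, and the framework performs the shuffle between them; this sketch keeps everything in one process purely to show the data flow.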
These skills can be acquired with proper training and practice. This book familiarises you with the technical skills required by a Big Data analyst and Big Data developer.
Organisations look for professionals who possess good logical and analytical skills, good communication skills, and an affinity for strategic business thinking. The preferred soft skills for a Big Data professional are:
- Strong written and verbal communication skills
- Analytical ability
- Basic understanding of how a business works