In this post, we will talk about Big Data. What is it? Why is it important? And what is its history?
What Exactly is Big Data?
As its name suggests, Big Data refers to data sets so large that traditional data-processing methods cannot handle them efficiently. Big Data is a term used to describe large, diverse sets of structured, semi-structured, and unstructured data that have the potential to be analyzed for insights and business value using advanced analytics techniques.
The definition of Big Data is often summarized by the “Five V’s of Big Data”: volume, velocity, variety, veracity, and value:
- Volume: Refers to the sheer scale of data generated, collected, and stored. The challenge is handling enormous amounts of data from various sources (sensors, social media, business transactions).
- Velocity: The speed at which data is generated, processed, and analyzed in real-time or near-real-time. The challenge is handling a rapid influx of data efficiently.
- Variety: Represents the diversity of data formats and types (structured, semi-structured, unstructured). The challenge is managing many different data sources and formats.
- Veracity: Refers to the accuracy and reliability of data. The challenge is ensuring that the information is trustworthy.
- Value: The most critical “V” from a business perspective. Extracting meaningful insights leads to more effective operations, stronger customer relationships, and other business benefits.
Put simply, Big Data is very large and has a complex structure. Used well, it helps solve business problems. In addition, Big Data has some benefits like:
- Big Data makes it possible for you to gain more complete answers because you have more information.
- More complete answers mean more confidence in the data, which means a completely different approach to tackling problems.
What is the History of Big Data?
Big Data, the powerhouse behind modern analytics and machine learning, has a rich history that stretches back centuries. The timeline below runs from humble beginnings in the 17th century to the data-driven era we find ourselves in today, and shows the profound impact Big Data has had on the world.
- 1663: John Graunt and Statistical Data Analysis: London haberdasher John Graunt pioneers statistical data analysis during the bubonic plague, recording death rates and laying the groundwork for data-driven insights.
- 1865: Richard Millar Devens and Business Intelligence: Devens introduces the term “business intelligence,” emphasizing the use of data for actionable insights, marking a crucial step towards the data-centric mindset we embrace today.
- 1884: Herman Hollerith and Data Processing: Hollerith’s punch card tabulating machine, used in the 1890 U.S. Census, marks the birth of data processing and sets the stage for the future Computing-Tabulating-Recording Company (IBM).
- 1926: Nikola Tesla’s Vision: Tesla predicts a future where humans access vast amounts of data through handheld devices, showcasing an early understanding of wireless technology’s impact.
- 1928: Fritz Pfleumer and Information Storage: Pfleumer invents a method to store information on tape, laying the foundation for magnetic tape and subsequent technological developments.
- 1943: The Colossus and WWII Code Breaking: The U.K. creates the Colossus, one of the first programmable electronic digital computers, to decipher Nazi codes during World War II, showcasing early large-scale data analysis.
- 1959: Arthur Samuel and Machine Learning: IBM programmer Arthur Samuel coins the term “machine learning,” setting the stage for future advancements in artificial intelligence.
- 1965-1969: Building Blocks of the Internet: Plans for the first data center buildings, creation of ARPANET, and the development of distributed control and TCP/IP protocols lay the groundwork for the internet age.
…The Internet Age: The Dawn of Big Data
- 1989-1990: Tim Berners-Lee and the World Wide Web: Tim Berners-Lee and Robert Cailliau establish the World Wide Web, introducing HTML, URLs, and HTTP, ushering in the era of widespread and accessible data.
- 1996: Digital Data Storage Advancements: Digital data storage becomes more cost-effective than paper storage, marking a pivotal moment in the transition to digital information.
- 1998: NoSQL Development: Carlo Strozzi develops NoSQL, a lightweight open-source relational database that does not use SQL as its query language. The name is later adopted for non-relational databases that offer an alternative to the traditional tabular model.
…Big Data in the 21st Century
- 2001: Doug Laney and the 3V’s of Big Data: Gartner’s Doug Laney introduces the 3V’s (volume, variety, velocity), defining the dimensions and properties of Big Data and shaping its trajectory in the 21st century.
- 2005-2006: Apache Hadoop and Cloud Computing: Doug Cutting and Mike Cafarella create Apache Hadoop, while Amazon Web Services (AWS) starts offering web-based computing infrastructure services, leading to the dominance of cloud computing.
- 2008-2014: The Proliferation of Big Data: A surge in data processing capabilities, business intelligence prioritization, and the emergence of data scientists mark this period, culminating in a $10 billion global market for Big Data by 2013.
- 2017-2020: Forecasting Growth: IDC forecasts the Big Data analytics market to reach $203 billion in 2020, with Allied Market Research reporting a $193.14 billion market in 2019.
Let’s talk about types of data!
In the realm of data, nuances in structure define its usability. This post offers brief insights into structured, semi-structured, and unstructured data, elucidating their characteristics and applications.
- Structured Data: Structured data is systematically organized in a formatted repository, often a relational database. Stored in rows and columns, it boasts relational keys and predefined fields, ensuring efficient analysis. Example: Relational Data.
- Semi-Structured Data: Semi-structured data maintains some organization but doesn’t adhere to a rigid structure. While it may challenge relational databases, its flexibility is valuable. Example: XML Data.
- Unstructured Data: Unstructured data lacks predefined structures and models, making it unsuitable for traditional relational databases. Widely used, it includes Word, PDF, Text, and Media Logs, offering versatility in data handling.
- Social Media Data: Data from tweets, posts, images, videos and other user-generated content on social platforms like Facebook, Twitter, Instagram, YouTube, Reddit etc. Includes text, hashtags, geotags, timestamps, network connections, demographics, sentiments, and more. Valuable for social listening, trend analysis, targeted marketing, understanding customer behavior and perceptions.
- Device Data: Log data from computers, mobile devices, IoT endpoints capturing user behavior patterns, geo-location trails, system events etc. Telemetry data with device sensor readings – motion, accelerometers, temperature, noise levels etc. Enables usage analytics, predictive maintenance, context-aware experiences, and personalization.
- Sensor Data: Data measurements from IIoT (Industrial IoT) sensors, instruments, meters, detectors, imaging units across infrastructure. Includes metrics like temperature, pressure, vibrations, power levels, particle sizes, loyalty card transactions etc. Facilitates predictive analytics for alerting, diagnostic analysis, prescribing actions.
- Transaction Data: Data from purchase transactions, stock trades, banking activity, credit card payments, commutes etc. Captures transaction time, location, value, quantity, payment mode, identities, demographics etc. Essential for marketing, fraud analytics, loyalty programs, operations optimization.
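To make the difference between structured, semi-structured, and unstructured data more concrete, here is a minimal Python sketch. The order records, field names, and sample sentence are made up for illustration; only the standard library is used.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, every row has the same fields (hypothetical sample).
structured_csv = "order_id,customer,amount\n1001,Ana,25.50\n1002,Ben,12.00\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))
total = sum(float(row["amount"]) for row in rows)

# Semi-structured: tagged and nested, but the fields can vary per record.
semi_structured_xml = """
<orders>
  <order id="1001"><customer>Ana</customer><amount>25.50</amount></order>
  <order id="1002"><customer>Ben</customer><note>gift wrap</note></order>
</orders>
"""
orders = ET.fromstring(semi_structured_xml)
# findtext returns None when a record has no <amount> element.
amounts = [order.findtext("amount") for order in orders.findall("order")]

# Unstructured: free text with no schema; any "fields" have to be inferred.
unstructured_text = "Ana called about order 1001 and asked for a refund."
mentions_refund = "refund" in unstructured_text.lower()
```

The structured rows can be summed directly, the XML needs a check for missing fields, and the free text only supports a rough keyword test. That is roughly why the less structure data has, the more work analysis tends to take.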

Why does Big Data matter?
- Valuable Insights: Imagine a treasure trove of information waiting to be explored. Big Data analysis reveals patterns, trends, and correlations. Businesses gain insights into customer behavior, market dynamics, and operational efficiency.
- Informed Decision-Making: Organizations armed with data-driven insights make better decisions. Supply chains optimize routes, healthcare providers predict disease outbreaks, and marketers tailor campaigns.
- Driving Innovation: Big Data fuels innovation. It’s the secret sauce behind personalized recommendations, predictive analytics, and automation. Think of Netflix suggesting your next binge-worthy show or self-driving cars navigating traffic.
- Efficiency and Productivity: Efficiently managing Big Data streamlines processes. Predictive maintenance prevents equipment failures, reducing downtime. Cost savings and productivity gains follow suit.
- Customer Experiences: Ever wondered how Amazon recommends products you didn’t even know you needed? Big Data enables personalized experiences, targeted advertising, and seamless interactions.
…Real-World Applications
- Healthcare: Big Data aids in disease prediction, drug discovery, and patient outcomes. Wearable devices collect health data, while AI analyzes medical images.
- Finance: Stock market predictions, fraud detection, and credit risk assessment rely on Big Data algorithms.
- Smart Cities: Urban planning, traffic management, and energy optimization benefit from real-time data.
- Climate Change: Monitoring environmental changes and predicting natural disasters require Big Data analytics.
Big Data Challenges
While promising, managing and extracting maximum value from Big Data comes with a unique set of challenges. The volume, velocity and variety of Big Data make capturing, storing, and processing it difficult. Petabytes of real-time data streaming in from sensors, devices, social platforms, transactions and other sources need to be handled efficiently.
Analysts also face the uphill task of cleaning large, complex datasets, ensuring quality, dealing with missing values, and maintaining data integrity for reliable analytics. Beyond warehousing, crunching numbers and basic reporting, deriving contextual insights often requires more advanced math and statistical modeling, multivariate analysis, predictive analytics, machine learning, and data visualization.
Another persistent challenge is upholding security protocols and standards to safeguard sensitive consumer data while adhering to evolving privacy laws and ethics. Maintaining individual privacy is paramount, especially with granular personal data. Additionally, the results of Big Data analytics are only useful if interpreted judiciously. The availability of skilled talent also remains a restraint industry-wide. However, by understanding these multifaceted challenges, organizations can proactively address them: robust Big Data strategies, governance frameworks, and investments in personnel and tools can set companies up for success and help them fully leverage Big Data for competitive advantage.
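The data-quality challenge mentioned above (duplicates, missing values, implausible readings) can be illustrated with a small Python sketch. The sensor records, the `clean` helper, and the plausible temperature range are assumptions chosen for the example; real cleaning rules depend on the dataset.

```python
from statistics import median

# Hypothetical sensor readings; None marks a missing value.
readings = [
    {"sensor": "s1", "temp_c": 21.5},
    {"sensor": "s2", "temp_c": None},
    {"sensor": "s3", "temp_c": 22.5},
    {"sensor": "s1", "temp_c": 21.5},   # exact duplicate of the first record
    {"sensor": "s4", "temp_c": 400.0},  # outside the assumed plausible range
]

def clean(records, low=-50.0, high=60.0):
    """Drop duplicates and out-of-range values, then fill gaps with the median."""
    seen, unique = set(), []
    for rec in records:
        key = (rec["sensor"], rec["temp_c"])
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))  # copy so the input list is not modified
    in_range = [
        r for r in unique
        if r["temp_c"] is None or low <= r["temp_c"] <= high
    ]
    fill = median(r["temp_c"] for r in in_range if r["temp_c"] is not None)
    for rec in in_range:
        if rec["temp_c"] is None:
            rec["temp_c"] = fill
    return in_range

cleaned = clean(readings)
```

Filling gaps with the median is only one possible choice; dropping incomplete records or using a model-based estimate may be more appropriate depending on how the data will be analyzed.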
Here is a brief overview of some key Big Data tools and techniques:
- Hadoop: Open-source framework for storing and processing huge datasets across clusters of computers using simple programming models. Enables distributed storage and analysis. Components include HDFS, MapReduce and YARN.
- MapReduce: A programming model, used by Hadoop, that “maps” data-processing tasks across nodes and “reduces” the results into aggregated outputs. Allows for parallel processing.
- Spark: An open-source processing engine built for speed, ease-of-use and sophisticated analytics. Performs in-memory computation for faster performance. Supports SQL, streaming data, machine learning and graph algorithms.
- Machine Learning: Algorithms that can “learn” patterns and insights from data automatically without explicit programming. Useful for prediction, classification and insight discovery in Big Data pipelines.
- Data Lakes: Centralized repositories holding vast amounts of raw data in native formats until needed for application. Supports multiple analytics techniques and users for flexible Business Intelligence.
- NoSQL Databases: Non-relational database systems capable of handling large, diverse, unstructured datasets in flexible schemas. Examples: HBase, Cassandra, MongoDB.
- Data Pipelines: Automated flows for effective data ingestion, preparation, integration and transformation to produce quality, analytics-ready Big Datasets.
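The MapReduce idea from the list above can be sketched in a few lines of plain Python. This is not Hadoop's API: it runs in a single process, and the function names (`map_phase`, `shuffle`, `reduce_phase`) and sample documents are made up for illustration. On a real cluster, the map and reduce steps would run in parallel on different nodes.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: aggregate each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}

# Each document could be mapped on a different node; here they run in a loop.
documents = ["big data is big", "data pipelines move data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle(pairs))
```

Because each document is mapped independently and each word is reduced independently, the work can be split across many machines. That independence is what makes the model suitable for very large datasets.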
That brings us to the end of this Big Data reading. I hope that you enjoyed it!
Bibliography:
- “What is Big Data?” Oracle. February 2024. [https://www.oracle.com/big-data/what-is-big-data/]
- “What Are the 5 V’s of Big Data?” Teradata Glossary. February 2024. [https://www.teradata.com/glossary/what-are-the-5-v-s-of-big-data]
- “The Essence of 5 Vs in Big Data.” CloudThat Resources Blog. February 2024. [https://www.cloudthat.com/resources/blog/the-essence-of-5-vs-in-big-data]
- “The 5 Vs of Big Data.” Global Tech Council. February 2024. [https://www.globaltechcouncil.org/big-data/the-5-vs-of-big-data/]
- “5 Vs of Big Data.” Scaler Topics. February 2024. [https://www.scaler.com/topics/5-vs-of-big-data/]
- “A History and Timeline of Big Data.” TechTarget. February 2024. [https://www.techtarget.com/whatis/feature/A-history-and-timeline-of-big-data]
- “Difference Between Structured, Semi-Structured, and Unstructured Data.” GeeksforGeeks. February 2024. [https://www.geeksforgeeks.org/difference-between-structured-semi-structured-and-unstructured-data/]
