Introduction to Big Data


Traditionally, systems, organizations, and applications worked almost exclusively with structured data (data organized in rows and columns). It was easy to use relational databases (RDBMS) and older tools to store, manage, process, and report on this data.


Big data is a popular term for the storage, management, and availability of very large volumes of data, both structured and unstructured. Big data may become as important to business, and to society, as the Internet has become. Hadoop is 100% open source and pioneered a fundamentally new way of storing and processing data. Instead of relying on expensive, proprietary hardware and separate systems to store and process data, Hadoop enables distributed parallel processing of huge amounts of data across inexpensive, industry-standard servers that both store and process the data, and a cluster can grow by adding more servers. With Hadoop, very large data sets become practical to maintain. In today's world, where more and more data is created every day, Hadoop's advantages mean that businesses and organizations can now find value in data that was until recently considered useless.

Students will work on a real-life Big Data Analytics project and gain hands-on experience.


Day-1 (Session 1: BigData)

-How Big is this Big Data?
-Definition with Real Time Examples
-How BigData is Generated in Real Time
-Use of BigData: How Industry is Utilizing BigData
-Traditional Data Processing Technologies
-Future of BigData

Day-1 (Session 2: Hadoop) 

-Why Hadoop?
-What is Hadoop?
-Hadoop vs RDBMS, Hadoop vs BigData
-Brief history of Hadoop
-Apache Hadoop Architecture
-Problems with traditional large-scale systems
-Requirements for a new approach
-Anatomy of a Hadoop Cluster
-Hadoop Setup and Installation

Day-1 (Session 3: Hadoop Ecosystem)

-Brief Introduction to the Hadoop Ecosystem (MapReduce, HDFS, Hive, Pig, HBase)

Day-2 (Session 4: HDFS)

-Concepts & Architecture
-Data Flow (File Read, File Write)
-Fault Tolerance
-Shell Commands
-Java-Based API
-Data Flow Archives
-Data Integrity
-Role of Secondary NameNode
-HDFS Programming Basics

Day-2 (Session 5: MapReduce)

-MapReduce Architecture
-Data Flow (Map – Shuffle – Reduce)
-MapRed vs MapReduce APIs
-MapReduce Programming Basics
-Programming [ Mapper, Reducer, Combiner, Partitioner ]
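
The Map – Shuffle – Reduce data flow and the four programming roles listed in this session (Mapper, Reducer, Combiner, Partitioner) can be illustrated with a minimal, single-process sketch in plain Python. It does not use Hadoop or its Java API. The function names (`mapper`, `combiner`, `partitioner`, `reducer`, `run_job`) and the word-count input are assumptions made purely for illustration; in a real job these roles are implemented as Hadoop classes and run across the nodes of a cluster.

```python
from collections import defaultdict


def mapper(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def combiner(pairs):
    """Combine: pre-aggregate the map output of one split locally,
    so fewer pairs need to be moved during the shuffle."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partitioner(key, num_reducers):
    """Partition: decide which reducer receives a key.
    A byte sum is used here (an illustrative choice) because it is
    deterministic across runs, unlike Python's salted str hash."""
    return sum(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    """Reduce: merge all values that share one key into a total."""
    return key, sum(values)


def run_job(splits, num_reducers=2):
    """Simulate one job. `splits` is a list of input splits,
    each split being a list of text lines."""
    # Shuffle target: one {key: [values]} mapping per reducer.
    partitions = [defaultdict(list) for _ in range(num_reducers)]

    for split in splits:
        # Map phase, followed by a local combine for this split.
        mapped = [pair for line in split for pair in mapper(line)]
        for key, value in combiner(mapped):
            # Shuffle: route each key to the reducer chosen for it.
            partitions[partitioner(key, num_reducers)][key].append(value)

    result = {}
    for partition in partitions:
        # Each reducer sees its keys in sorted order.
        for key in sorted(partition):
            word, total = reducer(key, partition[key])
            result[word] = total
    return result


if __name__ == "__main__":
    # Two hypothetical input splits, as if stored on two nodes.
    print(run_job([["big data big"], ["data hadoop"]]))
```

The combiner here performs the same summation as the reducer, which is valid for word count because addition is associative and commutative; that is not true of every reduce function.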

Day-2 (Session 6: Hive & Pig)

-Hive vs RDBMS
-Partitioning & Bucketing
-Hive Web Interface
-Why Pig
-Use case of Pig

Day-2 (Session 7: HBase)

-HBase Introduction


Big Data Engineer

Big data engineers are the professionals responsible for building the designs made by solution architects. It is one of the best-known and most in-demand big data careers, and the most common role in the big data world. A big data engineer develops, tests, maintains, and manages big data solutions in an organization.

Data Scientist

Data scientist is one of the most sought-after big data careers. Data scientists use analytical and technical skills to extract insights from big data, turning structured and unstructured data into meaningful findings.

Big Data Analyst

This is a highly sought-after role in the big data field, and talent for it is usually scarce. A big data analyst needs fundamental knowledge of big data technologies such as Hadoop, Hive, and Pig.

Data Visualization Developer

This role involves designing, developing, and supporting data visualization activities. Data visualization developers are responsible for conceptualizing, designing, and developing graphics and data visualizations. They have strong technical skills in the technologies used to implement the visualizations.

Business Analytics Specialist

This role is for an expert in business analytics who helps develop scripts, test them, and carry out testing. They also take up business research activities to understand and analyze issues and develop cost-effective solutions.



  • Product Development – Companies like Netflix and Procter & Gamble use big data to anticipate customer demand.
  • Customer Experience – The race for customers is on. A clearer view of customer experience is more possible now than ever before.
  • Fraud & Compliance – When it comes to security, it’s not just a few rogue hackers—you’re up against entire expert teams.
  • Operational Efficiency – Operational efficiency may not always make the news, but it is an area in which big data is having the most impact.
  • Drive Innovation – Big data can help you innovate by studying interdependencies among humans, institutions, entities, and processes, and then determining new ways to use those insights.