Big Data Analytics
Unit-6
Big Data Analytics
• Data Analytics: Data analytics technologies and techniques give
organizations a way to analyze data sets and gather new information.
Business intelligence (BI) queries answer basic questions about business
operations and performance.
• Big data analytics is the often complex process of examining big data to
uncover information -- such as hidden patterns, correlations, market
trends and customer preferences -- that can help organizations make
informed business decisions.
• Big data analytics is a form of advanced analytics, which involves complex
applications with elements such as predictive models, statistical
algorithms and what-if analysis powered by analytics systems.
Importance of Big Data Analytics
• Organizations can use big data analytics systems and software to
make data-driven decisions that can improve business-related
outcomes. The benefits may include more effective marketing, new
revenue opportunities, customer personalization and improved
operational efficiency. With an effective strategy, these benefits can
provide competitive advantages over rivals.
Working of Big Data Analytics
• Data professionals collect data from a variety of different sources. Often, it is a mix of
semi-structured and unstructured data. While each organization will use different data
streams, some common sources include:
• internet clickstream data;
• web server logs;
• cloud applications;
• mobile applications;
• social media content;
• text from customer emails and survey responses;
• mobile phone records; and
• machine data captured by sensors connected to the internet of things (IoT).
Contd…
• Data is prepared and processed. After data is collected and stored in a data warehouse or data lake, data
professionals must organize, configure and partition the data properly for analytical queries. Thorough
data preparation and processing make for higher performance from analytical queries.
• Data is cleansed to improve its quality. Data professionals scrub the data using scripting tools or data
quality software. They look for any errors or inconsistencies, such as duplications or formatting mistakes,
and organize and tidy up the data.
• The collected, processed and cleaned data is analyzed with analytics software. This includes tools for:
• data mining, which sifts through data sets in search of patterns and relationships
• predictive analytics, which builds models to forecast customer behavior and other future actions,
scenarios and trends
• machine learning, which taps various algorithms to analyze large data sets
• deep learning, which is a more advanced offshoot of machine learning
• text mining and statistical analysis software
• artificial intelligence (AI)
• mainstream business intelligence software
• data visualization tools
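The cleansing step described above (looking for duplications and formatting mistakes) can be sketched in a few lines of Python. This is a minimal illustration, not a production data-quality tool: the record layout, the `RAW_RECORDS` sample and the two accepted date formats are assumptions made for the example.

```python
from datetime import datetime

# Hypothetical raw records containing a duplicate and inconsistent formatting.
RAW_RECORDS = [
    {"email": " Alice@Example.com ", "signup": "2023-01-05"},
    {"email": "alice@example.com", "signup": "05/01/2023"},
    {"email": "bob@example.com", "signup": "2023-02-10"},
]

# Assumed input formats: ISO dates and day/month/year dates.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def normalize_date(text):
    """Return the date as ISO 8601 text, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def cleanse(records):
    """Normalize formatting, then drop duplicates while keeping input order."""
    seen = set()
    cleaned = []
    for record in records:
        row = (record["email"].strip().lower(), normalize_date(record["signup"]))
        if row not in seen:
            seen.add(row)
            cleaned.append({"email": row[0], "signup": row[1]})
    return cleaned


if __name__ == "__main__":
    for row in cleanse(RAW_RECORDS):
        print(row)
```

Normalizing before de-duplicating matters here: the first two records only become recognizable as duplicates once case, whitespace and date format agree.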
Big Data Tools
• Hadoop, which is an open source framework for storing and processing big data sets. Hadoop can handle large amounts of structured and
unstructured data.
• Predictive analytics hardware and software, which process large amounts of complex data, and use machine learning and statistical
algorithms to make predictions about future event outcomes. Organizations use predictive analytics tools for fraud detection, marketing,
risk assessment and operations.
• Stream analytics tools, which are used to filter, aggregate and analyze big data that may be stored in many different formats or platforms.
• Distributed storage, in which data is replicated, generally on a non-relational database. Replication can guard against independent node
failures and lost or corrupted big data, or provide low-latency access.
• NoSQL databases, which are non-relational data management systems that are useful when working with large sets of distributed data.
They do not require a fixed schema, which makes them ideal for raw and unstructured data.
• A data lake, which is a large storage repository that holds native-format raw data until it is needed. Data lakes use a flat architecture.
• A data warehouse, which is a repository that stores large amounts of data collected by different sources. Data warehouses typically store
data using predefined schemas.
• Knowledge discovery/big data mining tools, which enable businesses to mine large amounts of structured and unstructured big data.
• In-memory data fabric, which distributes large amounts of data across system memory resources. This helps provide low latency for data
access and processing.
• Data virtualization, which enables data access without technical restrictions.
• Data integration software, which enables big data to be streamlined across different platforms, including Apache Hadoop, MongoDB and
Amazon EMR.
• Data quality software, which cleanses and enriches large data sets.
• Data preprocessing software, which prepares data for further analysis. Data is formatted and unstructured data is cleansed.
• Spark, which is an open source cluster computing framework used for batch and stream data processing.
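The contrast drawn above between a data warehouse (predefined schema) and a NoSQL store or data lake (no fixed schema) can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for a warehouse table and a plain list of JSON strings as a stand-in for a schema-less store; the `orders` table, the field names and the helper functions are assumptions for the example, not part of any tool listed above.

```python
import json
import sqlite3


def make_warehouse():
    """Schema-on-write: the table layout is fixed before any data arrives."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
    )
    return conn


def insert_order(conn, record):
    """Try to load a record; return False if it violates the schema."""
    try:
        conn.execute(
            "INSERT INTO orders (order_id, amount) VALUES (?, ?)",
            (record.get("order_id"), record.get("amount")),
        )
        return True
    except sqlite3.IntegrityError:
        return False


def total_amount(documents):
    """Schema-on-read: each raw JSON document is interpreted at query time."""
    total = 0.0
    for raw in documents:
        doc = json.loads(raw)
        total += doc.get("amount", 0.0)  # documents without the field are tolerated
    return total


if __name__ == "__main__":
    warehouse = make_warehouse()
    print(insert_order(warehouse, {"order_id": 1, "amount": 19.99}))
    print(insert_order(warehouse, {"order_id": 2, "note": "no amount"}))

    lake = [
        json.dumps({"order_id": 1, "amount": 10.0}),
        json.dumps({"sensor": "t1", "reading": 3}),
    ]
    print(total_amount(lake))
```

The warehouse rejects the record that lacks an amount at load time, while the schema-less store accepts any document and leaves interpretation to whoever reads it later.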
Applications of Big Data Analytics
• Customer acquisition and retention. Consumer data can help the marketing efforts of companies,
which can act on trends to increase customer satisfaction. For example, personalization engines for
Amazon, Netflix and Spotify can provide improved customer experiences and create customer loyalty.
• Targeted ads. Personalization data from sources such as past purchases, interaction patterns and
product page viewing histories can help generate compelling targeted ad campaigns for users on the
individual level and on a larger scale.
• Product development. Big data analytics can provide insights that inform product viability,
development decisions and progress measurement, and steer improvements toward what fits a
business's customers.
• Price optimization. Retailers may opt for pricing models that use and model data from a variety of
data sources to maximize revenues.
• Supply chain and channel analytics. Predictive analytical models can help with preemptive
replenishment, B2B supplier networks, inventory management, route optimizations and the
notification of potential delays to deliveries.
• Risk management. Big data analytics can identify new risks from data patterns for effective risk
management strategies.
• Improved decision-making. Insights business users extract from relevant data can help organizations
make quicker and better decisions.
Benefits of Big Data Analytics
• The benefits of using big data analytics include:
• Quickly analyzing large amounts of data from different sources, in many
different formats and types.
• Rapidly making better-informed decisions for effective strategizing, which can
benefit and improve the supply chain, operations and other areas of strategic
decision-making.
• Cost savings, which can result from new business process efficiencies and
optimizations.
• A better understanding of customer needs, behavior and sentiment, which can
lead to better marketing insights, as well as provide information for product
development.
• Improved, better informed risk management strategies that draw from large
sample sizes of data.
Data Scientist
• A data scientist uses data to understand and explain the phenomena
around them, and helps organizations make better decisions.
• Working as a data scientist can be intellectually challenging and
analytically satisfying, and it can put you at the forefront of new advances in
technology. Data scientists have become more common and more in
demand as big data grows increasingly important to the
way organizations make decisions.
What does a data scientist do?
• Data scientists determine the questions their team should be asking and figure out
how to answer those questions using data. They often develop predictive models for
theorizing and forecasting.
• A data scientist might do the following tasks on a day-to-day basis:
• Find patterns and trends in datasets to uncover insights
• Create algorithms and data models to forecast outcomes
• Use machine learning techniques to improve the quality of data or product offerings
• Communicate recommendations to other teams and senior staff
• Deploy data tools such as Python, R, SAS, or SQL in data analysis
• Stay on top of innovations in the data science field
Data analyst vs data scientist: What’s the difference?
• The work of data analysts and data scientists can seem similar—both find
trends or patterns in data to reveal new ways for organizations to make
better decisions about operations. But data scientists tend to have more
responsibility and are generally considered more senior than data
analysts.
• Data scientists are often expected to form their own questions about the
data, while data analysts might support teams that already have set goals
in mind. A data scientist might also spend more time developing models,
using machine learning, or incorporating advanced programming to find
and analyze data.
Big Data and Data Warehousing
• Big Data: Big data refers to data sets that are large in volume and complex in
structure. The data can be structured, semi-structured, or unstructured,
and it cannot be processed effectively by traditional data processing software and databases.
Operations such as analysis and manipulation are performed on the data, and companies then
use the results for intelligent decision making, including tackling business problems.
Big data is a very powerful asset in today’s world.
• Data Warehouse: A data warehouse is a collection of data from various
heterogeneous sources. It is the main component of a business intelligence system, where
data is analyzed and managed to improve decision
making. It involves the process of extraction, transformation, and loading (ETL) to provide the
data for analysis. Data warehouses are also used to run queries on large amounts of
data. They draw data from various relational databases and application log files.
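The extract, transform and load steps behind a data warehouse can be sketched with the Python standard library. This is a toy illustration under stated assumptions: the CSV export, the log line format, the `sales` table and the function names are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extracts: a CSV export and application log lines.
CSV_EXPORT = "customer,amount\nalice,10.50\nbob,4.25\n"
APP_LOG = [
    "2024-01-01 purchase customer=alice amount=3.00",
    "2024-01-01 login customer=bob",
]


def extract():
    """Pull raw rows from both heterogeneous sources."""
    rows = list(csv.DictReader(io.StringIO(CSV_EXPORT)))
    for line in APP_LOG:
        fields = dict(part.split("=", 1) for part in line.split() if "=" in part)
        if "amount" in fields:  # keep only log lines that record a purchase
            rows.append(fields)
    return rows


def transform(rows):
    """Bring every row into one consistent shape and type."""
    return [(row["customer"].strip().lower(), float(row["amount"])) for row in rows]


def load(conn, rows):
    """Write the transformed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)


def run_etl():
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract()))
    return conn


if __name__ == "__main__":
    query = "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
    for customer, total in run_etl().execute(query):
        print(customer, total)
```

Once the data sits in one table with one schema, an analytical query such as the `GROUP BY` above can span records that originally came from different systems.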
Stream Analytics
• Streaming analytics is the processing and analyzing of data records continuously rather
than in batches. Generally, streaming analytics is useful for the types of data sources
that send data in small sizes (often in kilobytes) in a continuous flow as the data is
generated.
• Streaming analytics may include a wide variety of data sources, such as telemetry
from connected devices, log files generated by customers using web applications,
ecommerce transactions, or information from social networks or geospatial services.
It’s often used for real-time aggregation and correlation, filtering, or sampling.
• Data traditionally is moved in batches. Batch processing often processes large volumes
of data at the same time, with long periods of latency. For example, a process may be
run every 24 hours. While this can be an efficient way to handle large volumes of data,
it doesn’t work with time-sensitive data that’s meant to be streamed, because that
data can be stale by the time it’s processed.
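The difference between batch and streaming processing can be sketched as follows. The example counts page views two ways: once over the complete data set, and once per time window as events arrive. The `EVENTS` sample, the 60-second window and the function names are assumptions for illustration; real stream processors also handle late and out-of-order events, which this sketch does not.

```python
from collections import defaultdict

# Hypothetical clickstream events: (timestamp in seconds, page).
EVENTS = [(0, "home"), (12, "cart"), (47, "home"), (61, "home"), (75, "cart"), (130, "home")]


def batch_counts(events):
    """Batch style: wait for the full data set, then count in one run."""
    counts = defaultdict(int)
    for _, page in events:
        counts[page] += 1
    return dict(counts)


def stream_counts(events, window_seconds=60):
    """Streaming style: emit per-window counts as soon as a window closes.

    Assumes events arrive in timestamp order.
    """
    current_window = None
    counts = defaultdict(int)
    for timestamp, page in events:
        window = timestamp // window_seconds
        if current_window is not None and window != current_window:
            yield current_window * window_seconds, dict(counts)
            counts = defaultdict(int)
        current_window = window
        counts[page] += 1
    if current_window is not None:
        yield current_window * window_seconds, dict(counts)


if __name__ == "__main__":
    print("batch:", batch_counts(EVENTS))
    for window_start, counts in stream_counts(EVENTS):
        print("window starting at", window_start, "->", counts)
```

The batch function produces nothing until all events are available, whereas the generator yields a result for each window shortly after that window ends, which is what keeps time-sensitive data from going stale.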
Big Data and Stream Analytics
• Big data streaming is a process in which large streams of real-time
data are processed to extract insights and useful
trends from them. A continuous stream of unstructured data is sent
into memory for analysis before it is stored on disk. This happens across
a cluster of servers. Speed matters the most in big data streaming.
The value of data, if not processed quickly, decreases with time.
• Real-time streaming data analysis is a single-pass analysis. Analysts
cannot choose to reanalyze the data once it is streamed.
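Single-pass analysis means each value is seen once and then discarded, so statistics have to be updated incrementally. One well-known way to do this is Welford's online algorithm for mean and variance, sketched below. It is offered as an example of the single-pass idea, not as the method any particular streaming product uses; the class name `RunningStats` is an assumption.

```python
class RunningStats:
    """Mean and variance over a stream, updated one value at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        """Fold one new value into the statistics without storing it."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        """Population variance of the values seen so far."""
        return self._m2 / self.count if self.count else 0.0


if __name__ == "__main__":
    stats = RunningStats()
    for reading in [2, 4, 4, 4, 5, 5, 7, 9]:
        stats.update(reading)
        print(stats.count, stats.mean, stats.variance)
```

Memory use stays constant no matter how long the stream runs, because only three numbers are kept.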
Applications of Stream Analytics
Streaming analytics is ideal for processing data from sources that continuously
generate small amounts of data. Here are a few examples:
• Credit card fraud detection: Six card brands generated an aggregate of 440.99
billion purchase transactions for goods and services in 2019. To detect and
prevent fraud, card associations, like Visa or MasterCard, must analyze billions
of transactions and trigger alerts based on certain criteria. When it’s set up
properly, a streaming analytics system can facilitate the automation of fraud
detection. Essentially, it does this by first checking to see if any characteristics
of the payment authorization request meet any of the business’s criteria for
what constitutes suspicious activity. If the request is deemed suspicious, the
system can send an automated text to the cardholder asking them to confirm
the transaction.
• Efficient routing of delivery trucks: For logistics companies, efficiently routing trucks is
the entire business. But the most efficient route from point A to point B depends on
constantly changing variables, such as traffic conditions and weather forecasts. Also, in
some cases, trucks are delivering temperature-sensitive supplies, like pharmaceuticals.
Temperature sensors, traffic conditions, and weather forecasts are all sources of
streaming data logistics companies can analyze to make better business decisions. But
you need streaming analytics if you want to analyze the data quickly enough for the data
to be useful. After all, if the alert for an overheated truck comes in too late for the driver
to act on it, the cargo could become completely unusable.
• Personalized customer experiences: If you’ve ever left a conversation and then thought
of the perfect comeback, you understand why streaming analytics is important. Some
insights have to be received at a certain moment—otherwise, they become useless. The
personalized customer experience is a prime example of the need for the timely insights
provided by streaming analytics. With streaming analytics, marketers can automate
highly targeted product recommendations, use machine learning to customize web
experiences, optimize pricing, and more.
Other application areas of stream analytics include:
• Fraud detection
• Sales and marketing
• Predictive asset management
• Risk management
• Network management and optimization
• Location intelligence
• Supply chain management
• Product innovation and customer management
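The credit card fraud-detection flow described earlier (check each payment authorization request against the business's criteria, then alert the cardholder) can be sketched as a simple rule check. Everything specific here is hypothetical: the `AuthorizationRequest` fields, the amount limit, the home-country table and the `send_alert` callback are illustrative assumptions, and real systems use far richer criteria.

```python
from dataclasses import dataclass


@dataclass
class AuthorizationRequest:
    card_id: str
    amount: float
    country: str


# Hypothetical business criteria for suspicious activity.
MAX_AMOUNT = 1000.0
HOME_COUNTRY = {"card-1": "US", "card-2": "IN"}


def suspicious_reasons(request):
    """Return the list of criteria this request meets (empty if none)."""
    reasons = []
    if request.amount > MAX_AMOUNT:
        reasons.append("amount above limit")
    home = HOME_COUNTRY.get(request.card_id)
    if home is not None and request.country != home:
        reasons.append("unexpected country")
    return reasons


def process(requests, send_alert):
    """Check each request as it arrives and alert on suspicious ones."""
    for request in requests:
        reasons = suspicious_reasons(request)
        if reasons:
            send_alert(request.card_id, reasons)


if __name__ == "__main__":
    stream = [
        AuthorizationRequest("card-1", 25.0, "US"),
        AuthorizationRequest("card-1", 5000.0, "FR"),
    ]
    process(stream, lambda card, reasons: print("ALERT", card, reasons))
```

Because `process` accepts any iterable, the same check works whether requests come from a list, as here, or from a generator that yields them as they arrive.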
Important Questions
• Define the different inferences in big data analytics.
• Define the following:
a. Intelligent data analytics
b. Analysis vs. reporting
• Describe any five characteristics of Big Data.
a) What is a data stream?
b) Discuss 14 insights of InfoSphere in data stream.
• Explain the different applications of data streams in detail.
• Explain the stream model and Data stream management system architecture.
• What are filters in Big Data? Explain the Bloom filter with an example.
• What is real-time analytics? Discuss its technologies in detail.
• Explain the three categories of Prediction methodologies.
