Presenter:
Date:
TOPIC: AI and DS
Private and Confidential www.futureconnect.net 1
AGENDA
Unit name: 1. Data Science
Topics:
1. DATA SCIENCE LIBRARIES
2. NUMPY
3. PANDAS
4. MATPLOTLIB
5. DATA EXPLORATION
Hours count: 2
Session: 2
OBJECTIVES
• Gain knowledge of Data Science libraries
• Understand data manipulation packages for Data Science
• See a demo of data exploration using these packages
Data Mining
Scrapy
• One of the most popular Python data science libraries, Scrapy helps to build crawling programs
(spider bots) that can retrieve structured data from the web – for example, URLs or contact info.
• It's a great tool for scraping data used in, for example, Python machine learning models.
• Developers use it for gathering data from APIs.
BeautifulSoup
• BeautifulSoup is another popular library for parsing HTML and XML pages and scraping data from them.
• If you want to collect data that’s available on some website but not via a proper CSV or API,
BeautifulSoup can help you scrape it and arrange it into the format you need.
Data Processing and Modeling
NumPy
• NumPy (Numerical Python) is a core tool for scientific computing, supporting both basic and
advanced array operations.
• The library offers many handy features for operating on n-dimensional arrays and matrices in
Python.
SciPy
• This library includes modules for linear algebra, integration, optimization, and statistics.
• Its main functionality is built on NumPy, so it operates on NumPy arrays.
• SciPy works well for all kinds of scientific programming projects (science, mathematics, and
engineering).
Data Processing and Modeling
Pandas
• Pandas is a library created to help developers work with "labeled" and "relational" data intuitively.
• It's based on two main data structures: "Series" (one-dimensional, like a list of items) and
"DataFrame" (two-dimensional, like a table with multiple columns).
Keras
• Keras is a high-level library for building and training neural networks.
• It's straightforward to use and provides developers with a good degree of extensibility. The
library runs on top of other packages, such as Theano or TensorFlow, which serve as its backends.
Data Processing and Modeling
SciKit-Learn
• Scikit-learn is an industry standard for data science projects based in Python.
• SciKits are a group of packages in the SciPy Stack that were created for specific functionalities,
for example, image processing. Scikit-learn uses the math operations of SciPy to expose a
concise interface to the most common machine learning algorithms.
PyTorch
• PyTorch is a framework that is perfect for data scientists who want to perform deep learning tasks
easily.
• The tool allows performing tensor computations with GPU acceleration. It's also used for other
tasks – for example, for creating dynamic computational graphs and calculating gradients
automatically.
Data Processing and Modeling
TensorFlow
• TensorFlow is a popular Python framework for machine learning and deep learning, which was
developed at Google Brain.
• It is widely used for tasks like object identification, speech recognition, and many others.
• It helps in working with artificial neural networks that need to handle multiple data sets.
XGBoost
• This library is used to implement machine learning algorithms under the Gradient Boosting
framework.
• XGBoost is portable, flexible, and efficient.
• It offers parallel tree boosting that helps teams to resolve many data science problems. Another
advantage is that developers can run the same code on major distributed environments such as
Hadoop, SGE, and MPI.
Data Visualization
Matplotlib
• This is a standard data science library that helps to generate data visualizations such as two-dimensional
diagrams and graphs (histograms, scatterplots, non-Cartesian coordinates graphs).
• Matplotlib is one of those plotting libraries that are really useful in data science projects —
it provides an object-oriented API for embedding plots into applications.
• Developers need to write more code than usual while using this library for generating advanced
visualizations.
Seaborn
• Seaborn is based on Matplotlib and serves as a useful Python machine learning tool for
visualizing statistical models – heatmaps and other types of visualizations that summarize data
and depict the overall distributions.
• When using this library, you get to benefit from an extensive gallery of visualizations (including
complex ones like time series, joint plots, and violin diagrams).
Data Visualization
Bokeh
• This library is a great tool for creating interactive and scalable visualizations inside browsers using
JavaScript widgets. Bokeh is fully independent of Matplotlib.
• It focuses on interactivity and presents visualizations through modern browsers, similarly to
Data-Driven Documents (d3.js). It offers a set of graphs, interaction abilities (like linking plots or adding
JavaScript widgets), and styling.
Plotly
• This is a web-based tool for data visualization that offers many useful out-of-the-box graphics; you can
find them on the Plot.ly website.
• The library works very well in interactive web applications.
pydot
• This library helps to generate directed and undirected graphs.
• Written in pure Python, it serves as an interface to Graphviz. The graphs it creates come in handy
when you're developing algorithms based on neural networks and decision trees.
Python Libraries for Data Science
• Pandas: Used for structured data operations
• NumPy: Creating Arrays
• Matplotlib: Data Visualization
• Scikit-learn: Machine Learning Operations
• SciPy: Perform Scientific operations
• TensorFlow: Symbolic math library
• BeautifulSoup: Parsing HTML and XML pages
The first three libraries (Pandas, NumPy, and Matplotlib) are covered in the following slides.
Numpy
• NumPy = Numerical Python
• Created in 2005 by Travis Oliphant.
• Provides array objects and functions for array processing.
• NumPy arrays are faster than traditional Python lists because their elements are stored in one
contiguous block of memory.
• The array object in NumPy is called ndarray.
Top four benefits that NumPy can bring to your code:
1. More speed: NumPy uses algorithms written in C that typically run much faster than equivalent
pure-Python loops.
2. Fewer loops: NumPy helps you to reduce loops and keep from getting tangled up in iteration
indices.
3. Clearer code: Without loops, your code will look more like the equations you’re trying to
calculate.
4. Better quality: There are thousands of contributors working to keep NumPy fast, friendly, and
bug free.
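The "fewer loops" and "clearer code" points can be illustrated with a minimal sketch. The price and quantity values below are hypothetical and only serve the comparison; no timing claim is made here.

```python
import numpy as np

# Hypothetical data: prices and quantities for five orders.
prices = [2.5, 4.0, 1.25, 10.0, 3.0]
quantities = [4, 1, 8, 2, 5]

# Pure-Python version: an explicit loop with index bookkeeping.
totals_loop = []
for i in range(len(prices)):
    totals_loop.append(prices[i] * quantities[i])

# NumPy version: one vectorized expression, no loop and no indices.
totals_np = np.array(prices) * np.array(quantities)

print(totals_loop)
print(totals_np)
```

Both versions compute the same element-wise products; the NumPy line reads like the formula total = price x quantity.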
Numpy Installation and Importing
Prerequisites: Python and the Python package installer (pip)
Installation: pip install numpy
Import: After installation, import the package with the "import" keyword.
import numpy
If the import runs without an error, the NumPy package is properly installed and ready to use.
Numpy-ndarray Object
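• An ndarray is a collection of items that all belong to the same type.
• The type of the elements is described by a data-type object: dtype
• Basic ndarray creation: numpy.array(object)
OR, with all parameters:
numpy.array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)
object: any object exposing the array interface (for example, a list)
dtype: data type of the array
copy: whether the object is copied
order: row-major or column-major memory layout
subok: when False, the result is forced to be a base-class array
ndmin: minimum number of dimensions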
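A short sketch of how the dtype and ndmin parameters change the result. The input values are arbitrary examples, not taken from the slides.

```python
import numpy as np

# dtype forces the element type: integers are stored as 64-bit floats here.
a = np.array([1, 2, 3], dtype=float)

# ndmin guarantees a minimum number of dimensions: shape becomes (1, 3).
b = np.array([1, 2, 3], ndmin=2)

# Without dtype, NumPy picks one type that can hold every element,
# so mixing integers and floats gives a float array.
c = np.array([1, 2.5, 3])

print(a.dtype, b.shape, c.dtype)
```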
Private and Confidential www.futureconnect.net 15
Sample Input-Output
Code:
import numpy as np
a = np.array([1, 2, 3])            # 1D array
b = np.array([[1, 2], [3, 4]])     # 2D array
print(a)
print(b)
Output:
[1 2 3]
[[1 2]
 [3 4]]
NumPy arrays can be multi-dimensional too.
np.array([[1,2,3,4],[5,6,7,8]])
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
• Here, we created a 2-dimensional array of values.
• Note: A matrix is just a rectangular array of numbers with shape N x M where N is
the number of rows and M is the number of columns in the matrix. The one you
just saw above is a 2 x 4 matrix.
Types of NumPy arrays
• Array of zeros
• Array of ones
• Random numbers in ndarrays
• Identity matrix in NumPy
• Evenly spaced ndarray
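The array types listed above can be created as follows. This is a minimal sketch: the shapes, the seed, and the choice of np.random.default_rng are assumptions made for illustration.

```python
import numpy as np

zeros = np.zeros((2, 3))            # array of zeros, 2 rows x 3 columns
ones = np.ones((2, 3))              # array of ones

rng = np.random.default_rng(seed=0) # seeded generator (seed chosen arbitrarily)
rand = rng.random((2, 3))           # uniform random numbers in [0, 1)

ident = np.eye(3)                   # 3 x 3 identity matrix

even_step = np.arange(0, 10, 2)     # evenly spaced by step: 0, 2, 4, 6, 8
even_count = np.linspace(0, 1, 5)   # evenly spaced by count: 5 values from 0 to 1

print(zeros, ones, rand, ident, even_step, even_count, sep="\n")
```

arange() fixes the step between values, while linspace() fixes how many values are produced and includes the end point by default.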
Numpy - Array Indexing and Slicing
• Indexing is used to access array elements by their index.
• The indexes in NumPy arrays start with 0.
arr = np.array([1, 2, 3, 4])
arr[0]    # first element of the array, so the value is 1
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
arr[0,1]  # element in the first row, second column of the 2D array, so the value is 2
Slicing: Taking elements of an array from start index to end index (end excluded): [start:end] or [start:end:step]
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5])  # Ans: [2 3 4 5]
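A few more indexing and slicing cases, sketched with the same arrays as above. The variable names are illustrative only.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])
first = arr[0]             # 1
last = arr[-1]             # negative indexes count from the end: 7
every_other = arr[1:6:2]   # start 1, stop before 6, step 2: [2 4 6]

grid = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
elem = grid[1, 3]          # second row, fourth column: 9
first_col = grid[:, 0]     # every row, first column: [1 6]
part = grid[0, 1:4]        # first row, columns 1 to 3: [2 3 4]

print(first, last, every_other, elem, first_col, part)
```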
Dimensions of NumPy arrays
You can easily determine the number of dimensions or axes of a NumPy array using the ndim attribute:
# number of axes
a = np.array([[5,10,15],[20,25,20]])
print('Array :','\n',a)
print('Dimensions :','\n',a.ndim)
Output:
Array :
 [[ 5 10 15]
 [20 25 20]]
Dimensions :
 2
This array has two dimensions: 2 rows and 3 columns.
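As a small sketch, ndim can be read alongside the related shape and size attributes. The 3D array below is a hypothetical addition for contrast.

```python
import numpy as np

a = np.array([[5, 10, 15], [20, 25, 20]])
print(a.ndim)    # number of axes
print(a.shape)   # length of each axis: (rows, columns)
print(a.size)    # total number of elements

# Hypothetical 3D example: 2 blocks, each with 3 rows and 4 columns.
cube = np.zeros((2, 3, 4))
print(cube.ndim, cube.shape, cube.size)
```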
Numpy- Array Shape and Reshape
• The shape of an array is the number of elements along each dimension.
• Every ndarray has an attribute called shape that returns this as a tuple.
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape)
Output: (2, 4)
• Reshaping changes the shape of an array without changing its data.
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(4, 3)
print(newarr)
Output: [[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
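Two details of reshape() are worth a sketch: one dimension may be given as -1 so NumPy infers it, and a target shape whose element count does not match raises an error. The specific shapes below are arbitrary examples.

```python
import numpy as np

arr = np.arange(1, 13)        # the 12 values 1..12

# -1 asks NumPy to work out that dimension: 12 elements / 3 rows = 4 columns.
auto = arr.reshape(3, -1)
print(auto.shape)

# 5 x 3 would need 15 elements, so NumPy raises a ValueError.
reshape_failed = False
try:
    arr.reshape(5, 3)
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```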
Flattening a NumPy array
Sometimes when you have a multidimensional array and want to collapse it to a single-dimensional
array, you can either use the flatten() method or the ravel() method:
Syntax:
• flatten()
• ravel()
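A minimal sketch of the practical difference between the two: flatten() always returns a copy, while ravel() returns a view of the original data when it can. The small array is a made-up example.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

flat = a.flatten()   # independent copy
flat[0] = 99         # does not touch a
print(a[0, 0])       # still 1

rav = a.ravel()      # view onto a's memory for this contiguous array
rav[0] = -1          # writes through to a
print(a[0, 0])       # now -1
```

Use flatten() when the result will be modified independently, and ravel() when avoiding a copy matters.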
Transpose of a NumPy array
Another very useful reshaping method of NumPy is the transpose() method. It takes the input
array and swaps its rows and columns, so that rows become columns and columns become rows:
Syntax : numpy.transpose()
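A short sketch of transposing a 2 x 3 array; the values are arbitrary.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)

t = np.transpose(m)            # shape (3, 2): rows and columns swapped
print(t)

# The .T attribute is a shorthand that gives the same result.
print(np.array_equal(t, m.T))
```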
Expanding and Squeezing a NumPy array
Expanding a NumPy array
• You can add a new axis to an array using the expand_dims() function by providing the array and the
axis along which to expand.
Squeezing a NumPy array
• On the other hand, if you instead want to reduce the number of axes of the array, use the squeeze() function.
• It removes every axis that has a single entry. This means if you have created a 2 x 2 x 1 array,
squeeze() will remove the third dimension from it.
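Both operations can be sketched by watching how the shape changes. The arrays below are illustrative only.

```python
import numpy as np

v = np.array([1, 2, 3])              # shape (3,)
row = np.expand_dims(v, axis=0)      # new leading axis: shape (1, 3)
col = np.expand_dims(v, axis=1)      # new trailing axis: shape (3, 1)

m = np.zeros((2, 2, 1))              # the 2 x 2 x 1 case described above
squeezed = np.squeeze(m)             # length-1 axis removed: shape (2, 2)

print(row.shape, col.shape, squeezed.shape)
```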
Numpy- Arrays Join and Split
• Joining means merging two or more arrays.
• We use the concatenate() function to join arrays.
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)
Output: [1 2 3 4 5 6]
• Splitting means breaking one array into many.
arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 3)
print(newarr)
Output: [array([1, 2]), array([3, 4]), array([5, 6])]
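For 2D arrays, concatenate() takes an axis argument, and array_split() also copes with lengths that do not divide evenly. A minimal sketch with made-up values:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

stacked_rows = np.concatenate((a, b), axis=0)   # b placed below a: shape (4, 2)
stacked_cols = np.concatenate((a, b), axis=1)   # b placed beside a: shape (2, 4)

# 7 elements into 3 parts: the earlier parts get the extra elements.
parts = np.array_split(np.arange(7), 3)

print(stacked_rows.shape, stacked_cols.shape)
print([p.tolist() for p in parts])
```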
Pandas
• Data analysis tool
• Used for exploring, manipulating, and analyzing data.
• The source code for Pandas is found at this GitHub repository:
https://github.com/pandas-dev/pandas
• Pandas converts messy data into a readable format suitable for analysis.
Pandas Installation and Importing
Prerequisites: Python and the Python package installer (pip)
Installation: pip install pandas
Import: After installation, import the package with the "import" keyword.
import pandas
If the import runs without an error, the Pandas package is properly installed and ready to use.
Pandas -Series and Dataframes
• A Series is a 1D labeled array containing one type of data
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
Output:
0    1
1    7
2    2
dtype: int64
• A DataFrame is a 2D table containing rows and columns
Loading data into a DataFrame:
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
df = pd.DataFrame(data)
print(df)
Output:
   calories  duration
0       420        50
1       380        40
2       390        45
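Once the data is in a Series or DataFrame, values are usually reached by label. The sketch below reuses the calories/duration data; the index labels "x", "y", "z" and the threshold 42 are assumptions made for illustration.

```python
import pandas as pd

# A Series with custom labels instead of the default 0, 1, 2.
s = pd.Series([1, 7, 2], index=["x", "y", "z"])
y_value = s["y"]                      # look up by label

df = pd.DataFrame({
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
})
calories = df["calories"]             # one column, returned as a Series
second_row = df.loc[1]                # the row whose index label is 1
long_sessions = df[df["duration"] > 42]   # rows that satisfy a condition

print(y_value)
print(second_row)
print(long_sessions)
```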
Pandas: Read CSV
• It is used to read CSV (comma-separated values) files.
• The pd.read_csv() function is used.
Input file: data.csv
import pandas as pd
df = pd.read_csv('data.csv')
The file is read and stored as a DataFrame in the df variable.
When we print df for a large file, we get the first 5 rows and the last 5 rows of the data by default.
df.head(10): Print first 10 rows
df.tail(10): Print last 10 rows
df.info(): Information about the data
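The contents of data.csv are not shown in the slides, so the sketch below reads a small made-up CSV from an in-memory string instead. The column names and values are hypothetical; with a real file, pass its path to read_csv().

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for data.csv.
csv_text = (
    "Duration,Pulse,Calories\n"
    "60,110,409.1\n"
    "60,117,479.0\n"
    "45,109,282.4\n"
)

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))   # first 2 rows
print(df.tail(1))   # last row
df.info()           # column names, non-null counts, dtypes
```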
Python Matplotlib
• Graph Plotting Library
• Created by John D. Hunter
• The source code for Matplotlib is located at this GitHub repository:
https://github.com/matplotlib/matplotlib
• It makes use of NumPy, the numerical mathematics extension of Python
• The stable version cited in these slides is 2.2.0, released in 2018; newer releases have followed.
Matplotlib Installation and Importing
Prerequisites: Python and the Python package installer (pip)
Installation: pip install matplotlib
Import: After installation, import the package with the "import" keyword.
import matplotlib
If the import runs without an error, the Matplotlib package is properly installed and ready to use.
Matplotlib Pyplot
• Most Matplotlib utilities are found in the pyplot submodule, usually imported under the alias plt as shown below:
import matplotlib.pyplot as plt
Now, pyplot can be referred to as plt
• The plot() function is used to draw lines between points
• The show() function is used to display the graph
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([0, 6])
ypoints = np.array([0, 250])
plt.plot(xpoints, ypoints)
plt.show()
Matplotlib Functions
• xlabel() and ylabel() functions are used to add axis labels
• subplots() function is used to draw multiple plots in one figure
• scatter() function is used to construct scatter plots
• bar() function is used to draw bar graphs
[Figures: Scatter Plot, Bar Plot]
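The functions above can be combined in one figure. This is a minimal sketch with made-up data; it uses the Axes methods set_xlabel() and set_ylabel(), which are the per-plot counterparts of xlabel() and ylabel(), and it selects the non-interactive "Agg" backend as an assumption so the code also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: headless run; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data for both plots.
x = np.array([1, 2, 3, 4])
y = np.array([3, 8, 1, 10])
labels = ["A", "B", "C", "D"]

# One figure with two plots side by side.
fig, (ax_scatter, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

ax_scatter.scatter(x, y)            # scatter plot
ax_scatter.set_xlabel("x")
ax_scatter.set_ylabel("y")

ax_bar.bar(labels, y)               # bar graph
ax_bar.set_xlabel("category")
ax_bar.set_ylabel("value")

fig.tight_layout()
# In an interactive session, plt.show() would display the figure here.
```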
DATA EXPLORATION: load data file(s)
DATA EXPLORATION: Convert a variable to a different data type
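The slide's own example is not available in this text, so the following is only a sketch of one common way to convert a column's type in Pandas, using astype(). The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical data: ages stored as text, scores stored as floats.
df = pd.DataFrame({"age": ["25", "31", "47"], "score": [1.0, 2.0, 3.0]})

df["age"] = df["age"].astype(int)       # text -> integer
df["score"] = df["score"].astype(str)   # float -> text

print(df.dtypes)
print(df["age"].sum())
```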
DATA EXPLORATION: Transpose a data set or DataFrame
DATA EXPLORATION: Sort a Pandas DataFrame
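As a sketch (the slide's original example is not available here), a DataFrame can be sorted by one of its columns with sort_values(). The names and sales figures are made up.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "sales": [250, 100, 175]})

# Sort by the sales column, largest first; df itself is left unchanged.
by_sales = df.sort_values("sales", ascending=False)
print(by_sales)
```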
DATA EXPLORATION: Histogram Plot
DATA EXPLORATION: Scatter Plot
DATA EXPLORATION: Box Plot
DATA EXPLORATION: Generate frequency tables
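One possible way to build frequency tables in Pandas, sketched with hypothetical data: value_counts() for a single column and crosstab() for two columns.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "gender": ["F", "M", "F", "F", "M"],
    "plan": ["basic", "pro", "pro", "basic", "basic"],
})

counts = df["gender"].value_counts()            # one-way frequency table
table = pd.crosstab(df["gender"], df["plan"])   # two-way frequency table

print(counts)
print(table)
```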
DATA EXPLORATION: Sample Dataset
DATA EXPLORATION: Remove duplicate values
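A minimal sketch of removing duplicates with drop_duplicates(); the data is made up and the slide's own example may have differed.

```python
import pandas as pd

# Hypothetical data: the row (2, "Delhi") appears twice.
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "city": ["Pune", "Delhi", "Delhi", "Pune"],
})

unique_rows = df.drop_duplicates()                 # drop fully identical rows
unique_cities = df.drop_duplicates(subset="city")  # keep the first row per city

print(unique_rows)
print(unique_cities)
```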
DATA EXPLORATION: Group variables
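Grouping is commonly done with groupby() followed by an aggregation. The sketch below uses hypothetical team and points data.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "team": ["a", "b", "a", "b"],
    "points": [10, 4, 6, 8],
})

totals = df.groupby("team")["points"].sum()    # total points per team
averages = df.groupby("team")["points"].mean() # average points per team

print(totals)
print(averages)
```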
DATA EXPLORATION: Treat missing values
TREATMENT:
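The treatment shown on the original slide is not available in this text. As a sketch of common options, missing values can be counted with isna(), removed with dropna(), or filled with fillna(). The data and the fill choices (column mean, the word "unknown") are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "age": [25, np.nan, 40],
    "city": ["Pune", None, "Delhi"],
})

missing_per_column = df.isna().sum()   # how many values are missing per column

dropped = df.dropna()                  # option 1: drop rows with missing values

# Option 2: fill gaps, here with the column mean and a placeholder label.
filled = df.fillna({"age": df["age"].mean(), "city": "unknown"})

print(missing_per_column)
print(dropped)
print(filled)
```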

Session 2

  • 1. Presenter: Date: TOPIC: AI and DS Private and Confidential www.futureconnect.net 1
  • 2. Private and Confidential www.futureconnect.net 2 AGENDA UNIT NAME TOPICS Hours Count Session 1.DATA SCIENCE 1. DATA SCIENCE LIBARIES 2. NUMPY 3. PANDAS 4. MATPLOTLIB 5. DATA EXPLORATION 2 2
  • 3. OBJECTIVES • Gain knowledge of Data Science Libraries • To understand Data Science Manipulation Packages • Demo for Data Exploration using Package 3 Private and Confidential www.futureconnect.net 3
  • 4. Data Mining Scrapy • One of the most popular Python data science libraries, Scrapy helps to build crawling programs (spider bots) that can retrieve structured data from the web – for example, URLs or contact info. • It's a great tool for scraping data used in, for example, Python machine learning models. • Developers use it for gathering data from APIs. BeautifulSoup • BeautifulSoup is another really popular library for web crawling and data scraping. • If you want to collect data that’s available on some website but not via a proper CSV or API, BeautifulSoup can help you scrape it and arrange it into the format you need. 4 Private and Confidential www.futureconnect.net 4
  • 5. Data Processing and Modeling NumPy • NumPy (Numerical Python) is a perfect tool for scientific computing and performing basic and advanced array operations. • The library offers many handy features performing operations on n-arrays and matrices in Python. SciPy • This useful library includes modules for linear algebra, integration, optimization, and statistics. • Its main functionality was built upon NumPy, so its arrays make use of this library. • SciPy works great for all kinds of scientific programming projects (science, mathematics, and engineering 5 Private and Confidential www.futureconnect.net 5
  • 6. Data Processing and Modeling Pandas • Pandas is a library created to help developers work with "labeled" and "relational" data intuitively. • It's based on two main data structures: "Series" (one-dimensional, like a list of items) and "Data Frames" (two-dimensional, like a table with multiple columns). Keras • Keras is a great library for building neural networks and modeling. • It's very straightforward to use and provides developers with a good degree of extensibility. The library takes advantage of other packages, (Theano or TensorFlow) as its backends. 6 Private and Confidential www.futureconnect.net 6
  • 7. Data Processing and Modeling SciKit-Learn • This is an industry-standard for data science projects based in Python. • Scikits is a group of packages in the SciPy Stack that were created for specific functionalities – for example, image processing. Scikit-learn uses the math operations of SciPy to expose a concise interface to the most common machine learning algorithms. PyTorch • PyTorch is a framework that is perfect for data scientists who want to perform deep learning tasks easily. • The tool allows performing tensor computations with GPU acceleration. It's also used for other tasks – for example, for creating dynamic computational graphs and calculating gradients automatically. 7 Private and Confidential www.futureconnect.net 7
  • 8. Data Processing and Modeling TensorFlow • TensorFlow is a popular Python framework for machine learning and deep learning, which was developed at Google Brain. • It's the best tool for tasks like object identification, speech recognition, and many others. • It helps in working with artificial neural networks that need to handle multiple data sets. XGBoost • This library is used to implement machine learning algorithms under the Gradient Boosting framework. • XGBoost is portable, flexible, and efficient. • It offers parallel tree boosting that helps teams to resolve many data science problems. Another advantage is that developers can run the same code on major distributed environments such as Hadoop, SGE, and MPI. 8 Private and Confidential www.futureconnect.net 8
  • 9. Data Visualization Matplotlib • This is a standard data science library that helps to generate data visualizations such as two- dimensional diagrams and graphs (histograms, scatterplots, non-Cartesian coordinates graphs). • Matplotlib is one of those plotting libraries that are really useful in data science projects — it provides an object-oriented API for embedding plots into applications. • Developers need to write more code than usual while using this library for generating advanced visualizations. Seaborn • Seaborn is based on Matplotlib and serves as a useful Python machine learning tool for visualizing statistical models – heatmaps and other types of visualizations that summarize data and depict the overall distributions. • When using this library, you get to benefit from an extensive gallery of visualizations (including complex ones like time series, joint plots, and violin diagrams). 9 Private and Confidential www.futureconnect.net 9
  • 10. Data Visualization Bokeh • This library is a great tool for creating interactive and scalable visualizations inside browsers using JavaScript widgets. Bokeh is fully independent of Matplotlib. • It focuses on interactivity and presents visualizations through modern browsers – similarly to Data- Driven Documents (d3.js). It offers a set of graphs, interaction abilities (like linking plots or adding JavaScript widgets), and styling. Plotly • This web-based tool for data visualization that offers many useful out-of-box graphics – you can find them on the Plot.ly website. • The library works very well in interactive web applications. pydot • This library helps to generate oriented and non-oriented graphs. • It serves as an interface to Graphviz (written in pure Python). The graphs created come in handy when you're developing algorithms based on neural networks and decision trees. 10 Private and Confidential www.futureconnect.net 10
  • 11. Python Libraries for Data Science • Pandas: Used for structured data operations • NumPy: Creating Arrays • Matplotlib: Data Visualization • Scikit-learn: Machine Learning Operations • SciPy: Perform Scientific operations • TensorFlow: Symbolic math library • BeautifulSoup: Parsing HTML and XML pages Private and Confidential www.futureconnect.net 11 This 3 Python Libraries will be covered in the following slides
  • 12. Numpy • NumPy=Numerical Python • Created in 2005 by Travis Oliphant. • Consist of Array objects and perform array processing. • NumPy is faster than traditional Python lists as it is stored in one continuous place in memory. • The array object in NumPy is called ndarray. Private and Confidential www.futureconnect.net 12
  • 13. Top four benefits that NumPy can bring to your code: 1. More speed: NumPy uses algorithms written in C that complete in nanoseconds rather than seconds. 2. Fewer loops: NumPy helps you to reduce loops and keep from getting tangled up in iteration indices. 3. Clearer code: Without loops, your code will look more like the equations you’re trying to calculate. 4. Better quality: There are thousands of contributors working to keep NumPy fast, friendly, and bug free. 13 Private and Confidential www.futureconnect.net 13
  • 14. Numpy Installation and Importing Pre-requirements: Python and Python Package Installer(pip) Installation: pip install numpy Import: After installation, import the package by the “import” keyword. import numpy This ensures that NumPy package is properly installed and ready to use Package Private and Confidential www.futureconnect.net 14
  • 15. NumPy ndarray Object • An ndarray is a collection of items that all belong to the same type. • The type of every element is described by a data-type object: dtype. • Basic ndarray creation: numpy.array(object, dtype = None, copy = True, order = 'K', subok = False, ndmin = 0)
    object: any array-like input (an object exposing the array interface, or a nested sequence)
    dtype: desired data type of the elements
    copy: whether the input object is copied
    order: memory layout, row-major (C) or column-major (F)
    subok: if False, the result is forced to be a base-class array
    ndmin: minimum number of dimensions of the result
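A minimal sketch of two of the numpy.array parameters above, dtype and ndmin. The variable names are illustrative only.

```python
import numpy as np

values = [1, 2, 3]

# dtype: request floating-point elements instead of the inferred integers.
as_float = np.array(values, dtype=float)

# ndmin: ask for at least two dimensions; the 1D input becomes one row.
as_2d = np.array(values, ndmin=2)

print(as_float.dtype, as_float)
print(as_2d.shape, as_2d)
```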
  • 16. Sample Input-Output
    Code:
    import numpy as np
    a = np.array([1, 2, 3])          # 1D array
    b = np.array([[1, 2], [3, 4]])   # 2D array
    print(a)
    print(b)
    Output:
    [1 2 3]
    [[1 2]
     [3 4]]
  • 17. NumPy arrays can be multi-dimensional too.
    np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
    Result:
    array([[1, 2, 3, 4],
           [5, 6, 7, 8]])
    • Here, we created a 2-dimensional array of values. • Note: A matrix is just a rectangular array of numbers with shape N x M, where N is the number of rows and M is the number of columns in the matrix. The one you just saw above is a 2 x 4 matrix.
  • 18. Types of NumPy arrays • Array of zeros • Array of ones • Random numbers in ndarrays • Identity matrix in NumPy • Evenly spaced ndarray
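Each of the array types listed above has a standard NumPy constructor. The following is a sketch with arbitrarily chosen shapes and ranges; the seed is fixed only to make the random values repeatable.

```python
import numpy as np

zeros = np.zeros((2, 3))                    # 2 x 3 array filled with 0.0
ones = np.ones((2, 3))                      # 2 x 3 array filled with 1.0

rng = np.random.default_rng(seed=0)         # seeded random generator
random_vals = rng.random((2, 3))            # uniform samples in [0, 1)

identity = np.eye(3)                        # 3 x 3 identity matrix

evenly_by_step = np.arange(0, 10, 2)        # fixed step: 0, 2, 4, 6, 8
evenly_by_count = np.linspace(0.0, 1.0, 5)  # fixed count: 5 points from 0 to 1

print(zeros, ones, random_vals, identity, sep="\n")
print(evenly_by_step, evenly_by_count)
```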
  • 19. NumPy - Array Indexing and Slicing • Indexing accesses individual array elements by their position (index). • Indexes in NumPy arrays start at 0.
    arr = np.array([1, 2, 3, 4])
    arr[0]      # first element of the array, so the value is 1
    arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
    arr[0, 1]   # element in row 0, column 1 of the 2D array, so the value is 2
    Slicing: taking the elements of an array from a start index up to, but not including, an end index: [start:end] or [start:end:step]
    arr = np.array([1, 2, 3, 4, 5, 6, 7])
    print(arr[1:5])
    Output: [2 3 4 5]
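A short sketch extending the slicing examples to a step value, negative indexes, and 2D arrays. NumPy slices are written start:end:step, and the end index is excluded.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

every_other = arr[1:6:2]     # indexes 1, 3, 5
last_two = arr[-2:]          # negative index counts from the end

grid = np.array([[1, 2, 3, 4, 5],
                 [6, 7, 8, 9, 10]])

second_row = grid[1]         # whole row at index 1
middle_cols = grid[:, 1:4]   # all rows, columns 1 to 3

print(every_other, last_two)
print(second_row)
print(middle_cols)
```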
  • 20. Dimensions of NumPy arrays You can easily determine the number of dimensions (axes) of a NumPy array using the ndim attribute:
    # number of axes
    a = np.array([[5, 10, 15], [20, 25, 20]])
    print('Array :', '\n', a)
    print('Dimensions :', '\n', a.ndim)
    Output:
    Array :
     [[ 5 10 15]
     [20 25 20]]
    Dimensions :
     2
    This array has two dimensions: 2 rows and 3 columns.
  • 21. NumPy - Array Shape and Reshape • The shape of an array is the number of elements along each dimension. • Every array has an attribute called shape that returns it as a tuple.
    arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
    print(arr.shape)
    Output: (2, 4)
    • Reshaping changes the shape of an array without changing its data; the new shape must hold the same total number of elements.
    arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
    newarr = arr.reshape(4, 3)
    print(newarr)
    Output:
    [[ 1  2  3]
     [ 4  5  6]
     [ 7  8  9]
     [10 11 12]]
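A small sketch of two reshape details not shown on the slide: passing -1 lets NumPy infer one dimension, and a shape whose size does not match the number of elements raises a ValueError.

```python
import numpy as np

arr = np.arange(1, 13)            # 12 elements: 1 .. 12

auto_rows = arr.reshape(-1, 3)    # -1 is inferred as 4 rows
three_d = arr.reshape(2, 3, 2)    # 2 * 3 * 2 = 12 elements

# 5 * 3 = 15 does not match 12 elements, so NumPy refuses.
try:
    arr.reshape(5, 3)
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(auto_rows.shape, three_d.shape, reshape_failed)
```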
  • 22. Flattening a NumPy array When you have a multidimensional array and want to collapse it into a one-dimensional array, you can use either the flatten() method or the ravel() method. Syntax: • flatten() • ravel()
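A minimal sketch of the practical difference between the two methods: flatten() always returns a copy, while ravel() returns a view of the original data when it can (as it does for the contiguous array assumed here).

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

flat_copy = matrix.flatten()   # independent copy of the data
flat_view = matrix.ravel()     # view onto the same memory in this case

flat_copy[0] = 100             # does not touch matrix
flat_view[1] = 200             # writes through to matrix

print(matrix)
print(np.shares_memory(flat_copy, matrix), np.shares_memory(flat_view, matrix))
```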
  • 23. Transpose of a NumPy array Another very useful reshaping method in NumPy is transpose(). It takes the input array and swaps its rows and columns, so each row becomes a column and each column becomes a row. Syntax: numpy.transpose()
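A brief sketch of transposing a 2 x 3 array; the `.T` attribute is an equivalent shorthand.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])       # shape (2, 3)

transposed = np.transpose(matrix)    # shape (3, 2)
shorthand = matrix.T                 # same result via the .T attribute

print(transposed)
```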
  • 24. Expanding and Squeezing a NumPy array Expanding a NumPy array • You can add a new axis to an array using the expand_dims() function by providing the array and the axis along which to expand. Squeezing a NumPy array • On the other hand, if you instead want to reduce the number of axes of the array, use the squeeze() function. • It removes every axis that has a single entry. This means that if you have created a 2 x 2 x 1 array, squeeze() will remove the third dimension from it.
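A sketch of both operations, using a 1D array for expansion and the 2 x 2 x 1 case mentioned above for squeezing.

```python
import numpy as np

row = np.array([1, 2, 3])                  # shape (3,)

as_row = np.expand_dims(row, axis=0)       # new leading axis: shape (1, 3)
as_column = np.expand_dims(row, axis=1)    # new trailing axis: shape (3, 1)

cube = np.zeros((2, 2, 1))                 # third axis has a single entry
squeezed = np.squeeze(cube)                # length-one axis removed: shape (2, 2)

print(as_row.shape, as_column.shape, squeezed.shape)
```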
  • 25. NumPy - Arrays Join and Split • Joining means merging two or more arrays. • We use the concatenate() function to join arrays.
    arr1 = np.array([1, 2, 3])
    arr2 = np.array([4, 5, 6])
    arr = np.concatenate((arr1, arr2))
    print(arr)
    Output: [1 2 3 4 5 6]
    • Splitting means breaking one array into many.
    arr = np.array([1, 2, 3, 4, 5, 6])
    newarr = np.array_split(arr, 3)
    print(newarr)
    Output: [array([1, 2]), array([3, 4]), array([5, 6])]
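A sketch of two variations on the examples above: joining 2D arrays along a chosen axis, and splitting an array whose length does not divide evenly (array_split then makes the first pieces one element longer).

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

stacked_rows = np.concatenate((a, b), axis=0)   # shape (4, 2)
stacked_cols = np.concatenate((a, b), axis=1)   # shape (2, 4)

# 7 elements into 3 parts: the piece sizes are 3, 2, 2.
uneven_parts = np.array_split(np.arange(7), 3)

print(stacked_rows.shape, stacked_cols.shape)
print([part.tolist() for part in uneven_parts])
```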
  • 26. Pandas • Data analysis tool. • Used for exploring, manipulating, and analyzing data. • The source code for Pandas is found at this GitHub repository: https://github.com/pandas-dev/pandas • Pandas converts messy data into a readable format suitable for analysis.
  • 27. Pandas Installation and Importing Prerequisites: Python and the Python package installer (pip). Installation: pip install pandas Import: After installation, import the package with the import keyword: import pandas If the import runs without an error, the Pandas package is properly installed and ready to use.
  • 28. Pandas - Series and DataFrames • A Series is a 1D labeled array containing one type of data.
    import pandas as pd
    a = [1, 7, 2]
    myvar = pd.Series(a)
    print(myvar)
    Output:
    0    1
    1    7
    2    2
    dtype: int64
    • A DataFrame is a 2D table containing rows and columns. Loading data into a DataFrame:
    import pandas as pd
    data = {
        "calories": [420, 380, 390],
        "duration": [50, 40, 45]
    }
    df = pd.DataFrame(data)
    print(df)
    Output:
       calories  duration
    0       420        50
    1       380        40
    2       390        45
  • 29. Pandas: Read CSV • The pd.read_csv() function reads a CSV (comma-separated values) file. • The file is read and stored as a DataFrame, here in the variable df.
    import pandas as pd
    df = pd.read_csv('data.csv')   # input file: data.csv
    When we print a large df, pandas shows only the first 5 rows and the last 5 rows by default.
    df.head(10): returns the first 10 rows
    df.tail(10): returns the last 10 rows
    df.info(): prints summary information about the data
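A self-contained sketch of the calls above. Because data.csv is not included with the slides, the example reads hypothetical CSV text from memory; passing a file path to read_csv works the same way.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for data.csv.
csv_text = """name,calories,duration
run,420,50
walk,380,40
swim,390,45
"""

df = pd.read_csv(io.StringIO(csv_text))

first_two = df.head(2)   # first 2 rows
last_one = df.tail(1)    # last row
df.info()                # column names, non-null counts, dtypes

print(first_two)
print(last_one)
```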
  • 30. Python Matplotlib • Graph plotting library. • Created by John D. Hunter. • The source code for Matplotlib is located at this GitHub repository: https://github.com/matplotlib/matplotlib • It makes use of NumPy, the numerical mathematics extension of Python. • This material cites version 2.2.0 (released in 2018) as the stable version; newer releases are available.
  • 31. Matplotlib Installation and Importing Prerequisites: Python and the Python package installer (pip). Installation: pip install matplotlib Import: After installation, import the package with the import keyword: import matplotlib If the import runs without an error, the Matplotlib package is properly installed and ready to use.
  • 32. Matplotlib Pyplot • Most Matplotlib utilities live in the pyplot submodule, which is usually imported under the alias plt:
    import matplotlib.pyplot as plt
    Now pyplot can be referred to as plt. • The plot() function draws lines between points. • The show() function displays the graph.
    import matplotlib.pyplot as plt
    import numpy as np
    xpoints = np.array([0, 6])
    ypoints = np.array([0, 250])
    plt.plot(xpoints, ypoints)   # line from (0, 0) to (6, 250)
    plt.show()
  • 33. Matplotlib Functions • The xlabel() and ylabel() functions add axis labels. • The subplots() function draws multiple plots in one figure. • The scatter() function constructs scatter plots. • The bar() function draws bar graphs.
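A sketch combining the functions above in one figure. It uses the Axes methods set_xlabel() and set_ylabel(), which are the per-subplot counterparts of plt.xlabel() and plt.ylabel(). The data points are hypothetical, and the non-interactive Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample points.
x = np.array([1, 2, 3, 4, 5])
y = np.array([3, 8, 1, 10, 6])

# One figure with two side-by-side plots.
fig, (ax_scatter, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

ax_scatter.scatter(x, y)
ax_scatter.set_xlabel("x value")
ax_scatter.set_ylabel("y value")
ax_scatter.set_title("Scatter Plot")

bars = ax_bar.bar(x, y)
ax_bar.set_xlabel("category")
ax_bar.set_ylabel("height")
ax_bar.set_title("Bar Plot")

fig.tight_layout()
# In an interactive session, plt.show() would display the figure.
```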
  • 34. DATA EXPLORATION: load data file(s)
  • 35. DATA EXPLORATION: load data file(s)
  • 36. DATA EXPLORATION: load data file(s)
  • 37. DATA EXPLORATION: convert a variable to a different data type
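A minimal sketch of type conversion in Pandas, assuming a small hypothetical DataFrame: astype() converts a whole column, and pd.to_numeric() with errors="coerce" turns unparseable entries into NaN instead of raising an error.

```python
import pandas as pd

# Hypothetical data: ages arrive as strings.
df = pd.DataFrame({"age": ["25", "31", "47"],
                   "score": [88.5, 92.0, 79.25]})

df["age"] = df["age"].astype(int)            # string -> integer
df["score_text"] = df["score"].astype(str)   # float -> string

# A column containing one bad entry: coerce it to NaN.
raw = pd.Series(["10", "n/a", "30"])
cleaned = pd.to_numeric(raw, errors="coerce")

print(df.dtypes)
print(cleaned)
```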
  • 38. DATA EXPLORATION: Transpose a data set or DataFrame
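A short sketch of transposing a DataFrame with the `.T` attribute, assuming a hypothetical two-day table: row labels become column labels and vice versa.

```python
import pandas as pd

# Hypothetical table with labeled rows.
df = pd.DataFrame({"calories": [420, 380], "duration": [50, 40]},
                  index=["day1", "day2"])

transposed = df.T   # rows and columns swapped

print(transposed)
```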
  • 39. DATA EXPLORATION: Sort a Pandas DataFrame
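A sketch of sorting with sort_values(), assuming a hypothetical employee table: by one column in descending order, then by two columns with a different direction for each.

```python
import pandas as pd

# Hypothetical employee data.
df = pd.DataFrame({"name": ["A", "B", "C", "D"],
                   "dept": ["x", "y", "x", "y"],
                   "salary": [50, 70, 60, 80]})

# Highest salary first.
by_salary = df.sort_values("salary", ascending=False)

# Department A-Z, and within each department highest salary first.
by_dept_then_salary = df.sort_values(["dept", "salary"],
                                     ascending=[True, False])

print(by_salary)
print(by_dept_then_salary)
```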
  • 40. DATA EXPLORATION: Histogram Plot
  • 41. DATA EXPLORATION: Histogram Plot
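A minimal histogram sketch with Matplotlib. The ages are randomly generated hypothetical data, and the Agg backend is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: 200 ages drawn from a normal distribution.
rng = np.random.default_rng(seed=42)
ages = rng.normal(loc=35, scale=8, size=200)

fig, ax = plt.subplots()
# hist() returns the count per bin, the bin edges, and the drawn bars.
counts, bin_edges, _ = ax.hist(ages, bins=10, edgecolor="black")
ax.set_xlabel("Age")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of age")
# In an interactive session, plt.show() would display the figure.
```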
  • 42. DATA EXPLORATION: Scatter Plot
  • 43. DATA EXPLORATION: Box Plot
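A minimal box plot sketch, assuming hypothetical test scores with one unusually high value. The quartiles are also computed directly with np.percentile to show what the box represents; the Agg backend is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical scores; 98 lies far above the rest.
scores = np.array([52, 55, 58, 60, 61, 63, 65, 68, 70, 98])

# The box spans the first to third quartile, with a line at the median.
q1, median, q3 = np.percentile(scores, [25, 50, 75])

fig, ax = plt.subplots()
box = ax.boxplot(scores)   # points beyond 1.5 * IQR are drawn as outliers
ax.set_ylabel("Score")
ax.set_title("Box plot of scores")

outliers = box["fliers"][0].get_ydata()
print(q1, median, q3, outliers)
```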
  • 44. DATA EXPLORATION: Generate frequency tables
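A sketch of two common ways to build frequency tables in Pandas, assuming a hypothetical survey table: value_counts() for one variable and pd.crosstab() for two.

```python
import pandas as pd

# Hypothetical survey responses.
df = pd.DataFrame({"gender": ["F", "M", "F", "F", "M"],
                   "city": ["Pune", "Pune", "Delhi", "Pune", "Delhi"]})

# One-way frequency table: how often each value occurs.
gender_counts = df["gender"].value_counts()

# Two-way frequency table: rows are cities, columns are genders.
city_by_gender = pd.crosstab(df["city"], df["gender"])

print(gender_counts)
print(city_by_gender)
```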
  • 45. DATA EXPLORATION: Sample Dataset
  • 46. DATA EXPLORATION: Remove duplicate values
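A sketch of removing duplicates with drop_duplicates(), assuming a hypothetical table: first dropping fully identical rows, then keeping only the first row for each id.

```python
import pandas as pd

# Hypothetical data: rows 1 and 2 are identical; id 3 appears twice
# with different cities.
df = pd.DataFrame({"id": [1, 2, 2, 3, 3],
                   "city": ["Pune", "Delhi", "Delhi", "Pune", "Goa"]})

exact_unique = df.drop_duplicates()                          # identical rows only
one_per_id = df.drop_duplicates(subset="id", keep="first")   # first row per id

print(exact_unique)
print(one_per_id)
```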
  • 47. DATA EXPLORATION: Group variables
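A sketch of grouping with groupby(), assuming a hypothetical salary table: rows are split by department, and an aggregate is computed for each group.

```python
import pandas as pd

# Hypothetical salary data.
df = pd.DataFrame({"dept": ["x", "y", "x", "y"],
                   "salary": [50, 70, 60, 80]})

# Mean salary per department.
mean_by_dept = df.groupby("dept")["salary"].mean()

# Several statistics per department at once.
summary = df.groupby("dept")["salary"].agg(["count", "min", "max"])

print(mean_by_dept)
print(summary)
```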
  • 48. DATA EXPLORATION: Treat missing values
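A sketch of common missing-value treatments in Pandas, assuming a hypothetical table: counting missing entries, dropping incomplete rows, and filling gaps with a mean or a placeholder. Which treatment is appropriate depends on the data set.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age and one missing city.
df = pd.DataFrame({"age": [25, np.nan, 47, 31],
                   "city": ["Pune", None, "Delhi", "Pune"]})

# Count missing values in each column.
missing_per_column = df.isna().sum()

# Option 1: drop every row that has a missing value.
dropped = df.dropna()

# Option 2: fill numeric gaps with the column mean and text gaps
# with a placeholder label.
filled = df.copy()
filled["age"] = filled["age"].fillna(filled["age"].mean())
filled["city"] = filled["city"].fillna("Unknown")

print(missing_per_column)
print(dropped)
print(filled)
```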