Key projects in Data Science
March 30, 2024
This is a summary of my five years of hands-on work towards achieving the required experience and skills in Data Science and Engineering. It includes key partner trainings from Google®.
Vijayananda Mohire
Key projects in Data Science
Each entry below lists the project number, project name, and a summary of the work / learning.
Project 1: Olympic Medal Analysis
This project uses the Olympics.csv dataset from Kaggle and provides various insights into the medals won by individual athletes and countries. Below is one example.
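For illustration, here is a minimal pandas sketch of this kind of medal analysis; the file name and the Team/Medal column names are assumptions and depend on the exact version of the Kaggle dataset:

```python
import pandas as pd

# Load the Kaggle Olympics dataset (file and column names assumed).
df = pd.read_csv("Olympics.csv")

# Rows without a medal are typically NaN in the Medal column; keep only medal winners.
medals = df.dropna(subset=["Medal"])

# Medal counts per country, split by medal type, sorted by gold medals.
medal_counts = (
    medals.groupby(["Team", "Medal"])
          .size()
          .unstack(fill_value=0)            # columns: Bronze, Gold, Silver
          .sort_values("Gold", ascending=False)
)
print(medal_counts.head(10))                # top 10 countries by gold medals
```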
Project 2: Migrating from Spark to BigQuery via Dataproc
Migrated the original Spark code to Dataproc (lift-and-shift) and analysed the Spark tasks: copying data to HDFS, reading the CSV files, and performing Spark analysis using DataFrames and Spark SQL.
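A minimal PySpark sketch of this analysis pattern; the HDFS path and the 'category' column are hypothetical placeholders, not the project's actual data:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dataproc-migration-demo").getOrCreate()

# Read a CSV previously copied to HDFS (path and schema assumed).
df = spark.read.csv("hdfs:///data/input/records.csv", header=True, inferSchema=True)

# DataFrame-style analysis: row counts per value of a hypothetical 'category' column.
df.groupBy("category").count().orderBy(F.desc("count")).show(10)

# The same analysis expressed in Spark SQL on a temporary view.
df.createOrReplaceTempView("records")
spark.sql(
    "SELECT category, COUNT(*) AS cnt FROM records GROUP BY category ORDER BY cnt DESC"
).show(10)
```

The point of the lift-and-shift step is that code like this runs on Dataproc largely unchanged; later migration stages swap the storage and query layers for Cloud Storage and BigQuery.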
Project 3: Flight Departure Delay Analysis
This project provides insights into departure delays using Google BigQuery, SQL queries, and DataFrame plots to visualize the analysis. The dataset used was Google's internal training dataset: cloud-training-demos.airline_ontime_data.flights
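A minimal sketch of this kind of delay analysis using the BigQuery Python client; the project id is a placeholder, and the departure_airport/departure_delay column names are assumed rather than confirmed from the table schema:

```python
from google.cloud import bigquery
import matplotlib.pyplot as plt

client = bigquery.Client(project="my-project")   # hypothetical project id

# Average departure delay per airport (column names assumed).
sql = """
SELECT departure_airport,
       AVG(departure_delay) AS avg_dep_delay,
       COUNT(*) AS num_flights
FROM `cloud-training-demos.airline_ontime_data.flights`
GROUP BY departure_airport
HAVING num_flights > 10000
ORDER BY avg_dep_delay DESC
LIMIT 10
"""
df = client.query(sql).to_dataframe()

# DataFrame plot of the result.
df.plot(kind="bar", x="departure_airport", y="avg_dep_delay", legend=False)
plt.show()
```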
Project 4: Exploratory Data Analysis using BigQuery
EDA using linear regression with Python and Scikit-Learn, plus correlation heatmaps, for predicting US house values and estimating taxi fares.
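As a minimal sketch of the regression-plus-heatmap pattern, here the California housing data bundled with Scikit-Learn stands in for the BigQuery exports used in the lab:

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Stand-in dataset; the lab pulled comparable housing and taxi data from BigQuery.
df = fetch_california_housing(as_frame=True).frame

# Correlation heatmap used during EDA.
sns.heatmap(df.corr(), cmap="coolwarm")
plt.show()

# Simple linear regression to predict house value.
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns=["MedHouseVal"]), df["MedHouseVal"], random_state=42)
model = LinearRegression().fit(X_train, y_train)
rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
print("RMSE:", rmse)
```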
Project 5: Exploring and Creating an Ecommerce Analytics Pipeline with Cloud Dataprep v1.5 (Data Wrangling / Cleansing)
Cloud Dataprep® by Trifacta® is an intelligent data service for visually exploring, cleaning, and preparing structured and unstructured data for analysis. In this lab we explore the Cloud Dataprep UI to build an ecommerce transformation pipeline that runs at a scheduled interval and outputs results back into BigQuery.
The dataset used is an ecommerce dataset with millions of Google® Analytics records for the Google® Merchandise Store, loaded into BigQuery.
In this lab, you learn how to perform these tasks:
• Connect BigQuery datasets to Cloud Dataprep
• Explore dataset quality with Cloud Dataprep
• Create a data transformation pipeline with Cloud Dataprep
• Schedule transformation jobs that output to BigQuery
(Figures: analytics pipeline; recipe book with rules; results with duplicate rows removed.)
Project 6: Advanced Visualizations with TensorFlow Data Validation
This lab illustrates how TensorFlow Data Validation (TFDV) can be used to investigate and visualize your dataset. That includes looking at descriptive statistics, inferring a schema, checking for and fixing anomalies, and checking for drift and skew in the dataset.
First we'll use `tfdv.generate_statistics_from_csv` to compute statistics for our training data. TFDV can compute descriptive statistics that provide a quick overview of the data in terms of the features that are present and the shapes of their value distributions. Now let's use `tfdv.infer_schema` to create a schema for our data.
Does our evaluation dataset match the schema from our training dataset? This is especially important for categorical
features, where we want to identify the range of acceptable values.
Drift detection is supported for categorical features and between consecutive spans of data (i.e., between span N and span N+1), such as between different days of training data. We express drift in terms of the L-infinity distance, and you can set the threshold distance so that you receive warnings when the drift is higher than is acceptable.
Adding skew and drift comparators lets us visualize issues and make corrections (a minimal sketch follows this list). A few of the uses are:
1. Validating new data for inference to make sure that we haven't suddenly started receiving bad features
2. Validating new data for inference to make sure that our model has trained on that part of the decision surface
3. Validating our data after we've transformed it and done feature engineering (probably using TensorFlow Transform) to make sure we haven't done something wrong
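A minimal TFDV sketch of the steps above; the CSV paths and the 'payment_type' feature are hypothetical placeholders:

```python
import tensorflow_data_validation as tfdv

# Compute descriptive statistics for the training and evaluation splits (paths assumed).
train_stats = tfdv.generate_statistics_from_csv(data_location="train.csv")
eval_stats = tfdv.generate_statistics_from_csv(data_location="eval.csv")

# Infer a schema from the training statistics and check the eval data against it.
schema = tfdv.infer_schema(statistics=train_stats)
anomalies = tfdv.validate_statistics(statistics=eval_stats, schema=schema)
tfdv.display_anomalies(anomalies)

# Add an L-infinity drift comparator for a hypothetical categorical feature,
# then compare consecutive spans of data.
tfdv.get_feature(schema, "payment_type").drift_comparator.infinity_norm.threshold = 0.01
drift_anomalies = tfdv.validate_statistics(
    statistics=eval_stats, schema=schema, previous_statistics=train_stats)
tfdv.display_anomalies(drift_anomalies)
```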
Project 7: TPU Speed Data Pipelines
TPUs are very fast, and the stream of training data must keep up with their training speed. In this lab, you learn how to load data from Cloud Storage with the tf.data.Dataset API to feed your TPU.
You will learn:
• To use the tf.data.Dataset API to load training data.
• To use the TFRecord format to load training data efficiently from Cloud Storage (see the sketch below).
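A minimal sketch of a TPU-friendly input pipeline; the Cloud Storage bucket, TFRecord feature spec, and image size are assumptions:

```python
import tensorflow as tf

# Hypothetical bucket; TFRecord shards are typically written ahead of time.
filenames = tf.io.gfile.glob("gs://my-bucket/training-data/*.tfrec")

def parse_example(serialized):
    # Assumed feature spec: a JPEG-encoded image and an integer label.
    features = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized, features)
    image = tf.image.decode_jpeg(parsed["image"], channels=3)
    image = tf.image.resize(image, [192, 192]) / 255.0
    return image, parsed["label"]

dataset = (
    tf.data.TFRecordDataset(filenames, num_parallel_reads=tf.data.AUTOTUNE)
    .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    .shuffle(2048)
    .batch(128, drop_remainder=True)   # TPUs prefer fixed batch shapes
    .prefetch(tf.data.AUTOTUNE)
)
```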
Project 8: Data Pipelines – Design and Deploy
A Kubeflow pipeline on Google Cloud. This project offers three components that produce different outputs, which are combined to provide a final response to the consumer.
https://github.com/vijaymohire/gcp/blob/main/MyPipeExample.ipynb
https://github.com/vijaymohire/gcp/blob/main/KubeflowpipelineRun.png
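A minimal sketch of the three-component pattern using the Kubeflow Pipelines SDK (KFP v2 syntax); the component logic here is placeholder arithmetic, not the project's actual components:

```python
from kfp import dsl, compiler

# Placeholder components; each produces a different output.
@dsl.component(base_image="python:3.10")
def add(a: float, b: float) -> float:
    return a + b

@dsl.component(base_image="python:3.10")
def multiply(a: float, b: float) -> float:
    return a * b

@dsl.component(base_image="python:3.10")
def summarize(total: float, product: float) -> str:
    return f"sum={total}, product={product}"

@dsl.pipeline(name="three-component-demo")
def demo_pipeline(a: float = 2.0, b: float = 3.0):
    s = add(a=a, b=b)
    p = multiply(a=a, b=b)
    summarize(total=s.output, product=p.output)   # combines both outputs into the final response

# Compile to a spec that can be uploaded to Kubeflow Pipelines / Vertex AI Pipelines.
compiler.Compiler().compile(demo_pipeline, "demo_pipeline.yaml")
```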
Project 9: Data Ingestion, ETL
Project name: Serverless Data Processing with Dataflow – Writing an ETL Pipeline using Apache Beam and Cloud Dataflow (Python)
In this lab, you will learn how to:
• Build a batch Extract-Transform-Load pipeline in Apache Beam that takes raw data from Google Cloud Storage and writes it to Google BigQuery.
• Run the Apache Beam pipeline on Cloud Dataflow.
• Parameterize the execution of the pipeline (see the sketch below).
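A minimal Apache Beam sketch of this batch ETL pattern; the bucket, table, project, and JSON input format are hypothetical placeholders:

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

INPUT = "gs://my-bucket/raw/events*.json"      # assumed input location
TABLE = "my-project:my_dataset.events"         # assumed destination table

def parse_json(line):
    # Transform: turn one raw JSON line into a BigQuery row (dict).
    return json.loads(line)

# These options parameterize the run; swap runner="DirectRunner" for local testing.
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
)

with beam.Pipeline(options=options) as p:
    (p
     | "Read" >> beam.io.ReadFromText(INPUT)
     | "Parse" >> beam.Map(parse_json)
     | "Load" >> beam.io.WriteToBigQuery(
           TABLE,
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
           create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER))  # table assumed to exist
```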
Project 10: Feature Engineering
Project: Predict Bike Trip Duration with a Regression Model in BQML 2.5
In this lab, you learn to perform the following tasks:
• Query and explore the London bicycles dataset for feature engineering
• Create a linear regression model in BigQuery ML (sketched below)
• Evaluate the performance of your machine learning model
• Extract your model weights
Impact of number of bicycles
A potential feature is the number of bikes in the station. Perhaps, we hypothesize, people keep bicycles longer if
there are fewer bicycles on rent at the station they rented from.
1. In the query editor window, paste the following query:

SELECT
  bikes_count,
  AVG(duration) AS duration
FROM
  `bigquery-public-data`.london_bicycles.cycle_hire
JOIN
  `bigquery-public-data`.london_bicycles.cycle_stations
ON
  cycle_hire.start_station_name = cycle_stations.name
GROUP BY
  bikes_count
2. Visualize your data in Looker Studio.
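Beyond this exploration, here is a minimal sketch of the model-training and evaluation steps, run through the BigQuery Python client; the project, dataset, and model names are assumptions, and the chosen features are illustrative rather than the lab's exact feature set:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")   # hypothetical project id

# Train a linear regression model in BigQuery ML (dataset/model names assumed).
create_model_sql = """
CREATE OR REPLACE MODEL `my_dataset.bike_duration_model`
OPTIONS (model_type = 'linear_reg', input_label_cols = ['duration']) AS
SELECT
  duration,
  start_station_name,
  CAST(EXTRACT(DAYOFWEEK FROM start_date) AS STRING) AS dayofweek,
  CAST(EXTRACT(HOUR FROM start_date) AS STRING) AS hourofday
FROM `bigquery-public-data`.london_bicycles.cycle_hire
"""
client.query(create_model_sql).result()

# Evaluate the model (mean absolute error, etc.) and inspect its weights.
print(client.query(
    "SELECT * FROM ML.EVALUATE(MODEL `my_dataset.bike_duration_model`)").to_dataframe())
print(client.query(
    "SELECT * FROM ML.WEIGHTS(MODEL `my_dataset.bike_duration_model`)").to_dataframe())
```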
Project 11: Data Quality
Project Name: Improving Data Quality
This notebook introduced a few concepts for improving data quality. We resolved missing values, converted the Date feature column to a datetime format, renamed feature columns, removed a value from a feature column, created one-hot encoded features, and converted temporal features to meaningful representations. By the end of the lab, we gained an understanding of why data should be "cleaned" and "pre-processed" before being input into a machine learning model. A minimal pandas sketch of these steps follows the list of issues below.
1. **Data Quality Issue #1**:
> **Missing Values**:
Each feature column has multiple missing values. In fact, we have a total of 18 missing values.
2. **Data Quality Issue #2**:
> **Date DataType**: Date is shown as an "object" datatype and should be a datetime. In addition, Date is in one
column. Our business requirement is to see the Date parsed out to year, month, and day.
3. **Data Quality Issue #3**:
> **Model Year**: We are only interested in years greater than 2006, not "<2006".
4. **Data Quality Issue #4**:
> **Categorical Columns**: The feature column "Light_Duty" is categorical and has a "Yes/No" choice. We cannot feed values like this into a machine learning model. In addition, we need to one-hot encode the remaining "string"/"object" columns.
5. **Data Quality Issue #5**:
> **Temporal Features**: How do we handle year, month, and day?
https://github.com/vijaymohire/datascience/blob/main/dataengg/improve_data_quality-Lab%206.ipynb
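The pandas sketch mentioned above, showing one way to address each issue; the file name, exact column spellings, and cleaning choices are assumptions for illustration:

```python
import pandas as pd

df = pd.read_csv("untidy_vehicle_data.csv")          # assumed file name

# Issue 1 - missing values: drop incomplete rows (imputation is the alternative).
df = df.dropna()

# Issues 2 and 5 - Date datatype / temporal features: parse Date, then split it out.
df["Date"] = pd.to_datetime(df["Date"])
df["year"] = df["Date"].dt.year
df["month"] = df["Date"].dt.month
df["day"] = df["Date"].dt.day
df = df.drop(columns=["Date"])

# Issue 3 - Model Year: drop the "<2006" bucket and treat the rest as integers.
df = df[df["Model_Year"] != "<2006"]
df["Model_Year"] = df["Model_Year"].astype(int)

# Issue 4 - categorical columns: one-hot encode remaining string/object columns,
# such as Light_Duty ("Yes"/"No").
object_cols = df.select_dtypes(include="object").columns.tolist()
df = pd.get_dummies(df, columns=object_cols)
```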
Project 12: Terraform Deployment
Use Terraform to deploy Google Cloud resources to regions in the US and EU. Create the required VPC network and security groups, and deploy resources such as VM instances and storage.
https://github.com/vijaymohire/gcp/blob/main/Terraform%20for%20GCP%20resources%20deployment%20Demo.pdf
Disclaimer:
We have sourced the content from various courses and partner trainings. All details and references are for educational purposes only.
• Google® is a trademark of Google LLC. All logos, trademarks, and brand names belong to their respective owners as specified. We have no intention to infringe any copyrights or alter related permissions set by the owners. Please refer to the source websites for any further details. This is for educational and informational purposes only.
For more details, contact:
Bhadale IT Pvt. Ltd; Email: vijaymohire@gmail.com