Generative AI
Dr.T.Abirami
Associate Professor
Department of Information Technology
Kongu Engineering College
abi.it@kongu.edu
9788654804
Artificial intelligence
• AI enables computers to understand, analyze
data, and make decisions without constant
human guidance.
• These intelligent machines use algorithms,
which are step-by-step instructions, to process
information and improve their performance
over time.
In short, AI is about creating machines that can simulate human intelligence.
Real-world Examples of AI Applications
• Voice assistants such as Siri and Alexa, the helpful chatbots you meet on websites, and generative AI tools such as ChatGPT and Google's Bard all use AI technology.
• They use AI to understand our questions and commands, and they can answer questions.
Understanding Machine Learning
• Machine learning focuses on learning from data.
Types of Machine Learning
• Supervised learning - To learn from labeled
data.
(Predicting house prices based on features like size and location
(labeled data))
• Unsupervised learning - To find patterns in
unlabeled data.
(Clustering customers into segments based on purchasing behavior
(no labels))
• Reinforcement learning - To learn by interacting
with an environment.
(Teaching a robot to navigate a maze by rewarding it for reaching the
goal and penalizing it for hitting walls)
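To make the supervised case concrete, here is a minimal, illustrative sketch using scikit-learn (a library not covered in these slides; the house sizes and prices below are made-up values for the example):

from sklearn.linear_model import LinearRegression

# Features: [size in sq. ft, number of bedrooms]; labels: house prices (made-up numbers)
X = [[1000, 2], [1500, 3], [2000, 3], [2500, 4]]
y = [150000, 210000, 260000, 320000]

model = LinearRegression()
model.fit(X, y)                       # learn from labeled examples

print(model.predict([[1800, 3]]))     # predict the price of an unseen house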
Comparison
• Supervised Learning is about learning from
known outputs to predict future outcomes.
• Unsupervised Learning focuses on finding
hidden structures in data without any
guidance from labels.
• Reinforcement Learning is about learning
through interaction, where actions are taken
based on feedback from the environment.
Real-life Examples of Supervised Learning
• Email Spam Filtering - classifying emails as spam or not spam based on their features
• Image Classification - classifying animals, recognizing handwritten digits, or detecting objects in self-driving cars
• Facial Recognition - security systems or unlocking devices
• Financial Fraud Detection - analyzing patterns and anomalies in financial data
• Speech Recognition - converting spoken language into text, as seen in voice assistants such as Siri or Google Assistant
In reinforcement learning there is no labeled data as in supervised learning; agents learn only from their experiences.
Basics of Deep Learning
Biological Neural Network in Human Brain
• A neuron is the human brain’s most
fundamental cell.
• A human brain has many billions of neurons,
which interact and communicate with one
another, forming a neural network.
• One Neuron = One Feature: In the input layer,
each neuron can represent a single feature of
the dataset.
• Multiple Neurons = Multiple Features: In a
neural network, having multiple neurons
allows the model to process multiple
features at once and learn from them.
Basics of Deep Learning
• A neuron is a basic unit that processes input data. Each neuron receives input, applies a mathematical function (often called an activation function), and produces an output.
• A neuron can represent one feature of a
dataset, meaning it processes one aspect of
the input data.
Basics of Deep Learning
Features
• A feature is a measurable property or
characteristic of the data.
• For example, in a dataset about houses,
features could include the size of the house,
the number of bedrooms, and the location.
• Each feature contributes to the information
that the model uses to make predictions.
Example in a Neural Network
• Input Layer: If you have a dataset with three
features (e.g., size, number of bedrooms, and
location), you would have three neurons in the
input layer, each corresponding to one feature.
• Hidden Layers: In the hidden layers, neurons can
combine these features in various ways to learn
complex patterns. Each neuron in these layers can
take inputs from multiple neurons from the previous
layer, allowing the network to learn interactions
between features.
Example: Recognizing a Panda
• Input Layer (Observation): receives the individual features of the image.
• Hidden Layers (Processing): collectively build a more comprehensive understanding of the panda's features.
• Output Layer (Recognition): decides whether the characteristics match those of a panda.
Key Components of Basic Neural Network
• Data Loading: MNIST dataset of handwritten digits, applying
transformations to convert images into tensors and normalize them.
• Neural Network Architecture:
– Input Layer: Takes in flattened images of size 28x28 (784 pixels).
– Hidden Layer: Contains 128 neurons with a ReLU activation function.
– Output Layer: Contains 10 neurons (for digits 0-9).
• Loss Function: We use CrossEntropyLoss, which is suitable for multi-class
classification problems.
• Optimizer: Adam optimizer is used to update the model's weights based on
the gradients.
• Training Loop: For each epoch, we perform forward and backward passes,
compute loss, and update weights.
• Evaluation: After training, we evaluate the model's accuracy on the test set.
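A compact sketch of this network in PyTorch is shown below. It is illustrative only: hyperparameters such as the batch size, learning rate, and number of epochs are assumptions, not values stated on the slide.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Data loading: convert images to tensors and normalize them
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_set = datasets.MNIST(root="data", train=True, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)

# Architecture: 784 -> 128 (ReLU) -> 10
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))

criterion = nn.CrossEntropyLoss()                          # loss for multi-class classification
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam updates the weights

for epoch in range(2):                                     # training loop (2 epochs for brevity)
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)            # forward pass + loss
        loss.backward()                                    # backward pass
        optimizer.step()                                   # weight update
    print(f"epoch {epoch + 1}: loss {loss.item():.4f}")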
Deep Neural Networks
• A deep neural network (DNN) is an artificial
neural network (ANN) with multiple layers
between the input and output layers.
Introduction to Generative AI
Learning from existing data patterns
What is Generative AI?
• Generative AI is a type of artificial intelligence
that can create new content, such as text,
images, and music, by learning patterns from
existing data.
• It uses advanced algorithms, like neural
networks, to generate outputs that resemble
human-created content.
• This technology is widely used in various fields,
including art, entertainment, and business.
Introduction to Generative AI
Understand the Basics of Generative AI
• Getting Trained on Data: generative models need to be trained on massive datasets of existing content. This data can come from almost anything, such as books, blogs, pictures, or images.
• Recognising Patterns: the algorithm then recognizes patterns and relationships across all of the retrieved training data.
• Creating Content: Once the model has a good grasp of
the patterns, it can use that knowledge to generate
entirely new content.
Generative Models
• Generative AI uses different types of machine
learning models, called Generative Models.
1. Variational Autoencoders (VAEs),
2. Generative Adversarial Networks (GANs)
3. Restricted Boltzmann Machines (RBMs)
4. Transformer-based Language Models
Generative Adversarial Networks (GANs)
• A powerful class of machine learning models used in generative AI.
• GANs consist of two neural networks, the Generator and the Discriminator, that work against each other to produce new data samples.
GAN is a generative model
• Generator: This network generates new data
samples. It takes random noise as input and
tries to create data that resembles the training
data.
• Discriminator: This network evaluates the
data. It takes both real data samples and the
generated samples and tries to classify them
as real or fake.
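As a rough illustration (not code from these slides), the two networks can be sketched in PyTorch as small fully connected models; all layer and noise sizes below are assumptions:

import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, noise_dim=100, data_dim=784):
        super().__init__()
        # Maps random noise to a fake data sample
        self.net = nn.Sequential(nn.Linear(noise_dim, 256), nn.ReLU(),
                                 nn.Linear(256, data_dim), nn.Tanh())

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    def __init__(self, data_dim=784):
        super().__init__()
        # Outputs the probability that a sample is real
        self.net = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
                                 nn.Linear(256, 1), nn.Sigmoid())

    def forward(self, x):
        return self.net(x)

# One fake sample: the generator turns noise into data, the discriminator scores it
fake = Generator()(torch.randn(1, 100))
print(Discriminator()(fake))   # near 0 means "fake", near 1 means "real"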
Variational Autoencoders (VAEs)
• VAEs are useful for tasks like image generation, representation learning, and data compression.
A VAE consists of two main components:
• Encoder: This network compresses the input data into
a smaller representation (latent space). Instead of
producing a single point, it outputs parameters of a
probability distribution (mean and variance).
• Decoder: This network takes samples from the latent
space and reconstructs the original data from that
compressed representation.
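A minimal PyTorch sketch of the encoder/decoder idea, with assumed layer and latent-space sizes, might look like this:

import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, data_dim=784, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(data_dim, 256), nn.ReLU())
        self.to_mean = nn.Linear(256, latent_dim)      # mean of the latent distribution
        self.to_logvar = nn.Linear(256, latent_dim)    # log-variance of the latent distribution
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, data_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)
        mean, logvar = self.to_mean(h), self.to_logvar(h)
        z = mean + torch.exp(0.5 * logvar) * torch.randn_like(mean)  # sample from the latent space
        return self.decoder(z), mean, logvar           # reconstruction plus distribution parameters

reconstruction, mean, logvar = VAE()(torch.rand(1, 784))
print(reconstruction.shape)   # torch.Size([1, 784])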
Examples of Latent Space
Image Generation:
• Example: In a GAN trained on faces, each point in latent space could
represent a different face with variations in attributes like age,
gender, or expression. Sampling different points produces new,
unique faces.
Text Generation:
• Example: In models like Variational Autoencoders for text, latent
space might encode various styles or themes. For instance, one
region could represent romantic poetry, while another represents
scientific articles.
Music Generation:
• Example: In a music VAE, latent space can represent different musical
styles. Points might correspond to variations in melody, rhythm, or
instrumentation, allowing the generation of new compositions.
What is Latent Space?
• Dimensionality Reduction: In datasets with high
dimensions (like images), latent space reduces the
number of dimensions while preserving important
information.
• Feature Representation: Each point in latent space
represents a unique combination of features. For
example, in a VAE or GAN, points in this space correspond
to different variations of the generated data.
• Sampling and Generation: By sampling points in latent
space, models can generate new data that resembles the
training data but is not identical.
Restricted Boltzmann Machines (RBMs)
• RBMs are a type of generative model used in machine learning and generative AI.
• They are particularly useful for feature
learning, dimensionality reduction, and
collaborative filtering.
• Example: Image Reconstruction
• Imagine using an RBM to reconstruct images
of handwritten digits
What are Restricted Boltzmann Machines?
An RBM consists of two layers:
• Visible Layer: This layer represents the input
data. Each node corresponds to an observable
feature of the data (e.g., pixels in an image).
• Hidden Layer: This layer captures the
underlying patterns or features in the data. The
nodes in this layer are not directly observed.
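As an illustration, scikit-learn provides a small RBM implementation (BernoulliRBM); the tiny binary dataset below is made up purely for demonstration:

import numpy as np
from sklearn.neural_network import BernoulliRBM

# Tiny made-up dataset: each row is a "visible layer" of 6 binary pixels
X = np.array([[1, 1, 1, 0, 0, 0],
              [1, 1, 0, 0, 0, 0],
              [0, 0, 0, 1, 1, 1],
              [0, 0, 1, 1, 1, 0]])

rbm = BernoulliRBM(n_components=2, learning_rate=0.1, n_iter=100, random_state=0)
rbm.fit(X)

# Hidden-layer activations: a 2-dimensional feature representation of each sample
print(rbm.transform(X))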
Transformer-based language models
• Transformer-based language models have revolutionized natural language processing (NLP) by enabling powerful and efficient text generation, understanding, and manipulation.
Types of Transformer-based Language Models
• BERT (Bidirectional Encoder Representations from Transformers):
– Focuses on understanding the context of words in both directions (left and right).
– Primarily used for tasks like sentiment analysis, question answering, and named
entity recognition.
• GPT (Generative Pre-trained Transformer):
– A unidirectional model that generates text by predicting the next word in a
sequence.
– Suitable for text generation, dialogue systems, and creative writing.
• T5 (Text-to-Text Transfer Transformer):
– Treats all NLP tasks as text-to-text tasks, converting inputs into a text format and
generating outputs in a text format.
– Used for translation, summarization, and question answering.
• XLNet:
– Combines ideas from BERT and autoregressive models to capture bidirectional
context while maintaining the ability to generate text.
– Effective for various NLP tasks, including sentiment analysis and language
understanding.
• RoBERTa (Robustly optimized BERT approach):
– An optimized version of BERT with improvements in training techniques and data
handling, enhancing performance on various benchmarks.
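With the Hugging Face pipeline API used later in these slides, the encoder and decoder families can be contrasted in a few lines; the checkpoints and prompts below are illustrative choices, not part of the original material:

from transformers import pipeline

# BERT-style model: fill in a masked word using context from both directions
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("Generative AI can create new [MASK].")[0]["token_str"])

# GPT-style model: generate text by predicting the next word repeatedly
generate = pipeline("text-generation", model="gpt2")
print(generate("Generative AI can create", max_length=20)[0]["generated_text"])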
Generative AI Tools
• ChatGPT: Content Generation
• Jukebox: Music Creation
• Point-E: 3D Modelling
• RunwayML: Video Creation and Editor
• G3D.ai: Game Development
• LaMDA: Chatbots
• Dall E: Image Creation
• GitHub Copilot: Code Generation
• Midjourney: Art Creation
• Murf AI: Voice Generation
Generative AI Tools
ChatGPT:
• A language model that can generate human-
like text based on prompts, useful for
customer support and content creation.
GPT (Generative Pre-trained Transformer)
• GPT is a transformer-based large language
model, developed by OpenAI. This is the
engine behind ChatGPT.
• The free version of ChatGPT is based on GPT-3.5, while the more advanced GPT-4 based version is provided to paid subscribers under the commercial name "ChatGPT Plus".
Generative AI Tools
DALL-E:
• An AI that creates images from textual
descriptions, revolutionizing design and
creative industries
Generative AI Tools
Google Bard (Gemini):
• Google Bard, now known as Gemini, is an advanced AI model developed by Google, designed to generate creative and coherent text based on user prompts. It leverages deep learning techniques to produce high-quality writing in various styles and formats, from poetry to technical writing.
How does it work?
• It uses machine learning models, especially
neural networks, to generate data similar to
its training inputs.
What are common applications?
• Applications include chatbots, content
creation, image generation, and code writing.
Natural Language Processing (NLP)
• NLP is a field of artificial intelligence (AI) that focuses on the interaction between computers and humans through natural language.
• It enables machines to understand, interpret, and generate human language.
• NLP deals with both the understanding and the generation of human language.
• In other words, NLP is one way for AI to interact with humans.
Example:
• A simple example of NLP is a text message
auto-complete feature on your phone, which
predicts what you want to type next based
on your previous messages.
NLP Models
• Rule-Based Models: Use predefined rules for
language processing.
– Example: Simple grammar checkers.
• Statistical Models: Use statistical methods to analyze
language.
– Example: Hidden Markov Models for part-of-speech
tagging.
• Deep Learning Models: Use neural networks for
complex language tasks.
– Example: Transformers, like BERT and GPT.
Applications
• Sentiment Analysis: Determining if a piece of text
expresses positive, negative, or neutral sentiment.
• Chatbots: Automated systems that can converse
with users to answer questions or provide support.
• Machine Translation: Automatically translating text
from one language to another, like Google Translate.
• Text Summarization: Creating concise summaries of
larger text documents.
NLP Tools
• NLTK (Natural Language Toolkit): A popular library for working
with human language data in Python.
– Example: Used for tasks like tokenization, stemming, and tagging.
• SpaCy: An efficient NLP library for advanced natural language
processing.
– Example: Great for named entity recognition and dependency parsing.
• Hugging Face Transformers: A library that provides pre-trained
models for NLP tasks.
– Example: Using BERT for text classification.
• OpenAI's GPT: A powerful language model that can generate text
based on prompts.
– Example: Creating conversational agents or writing assistance tools.
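A brief, illustrative taste of two of these tools (both assumed to be installed; the sentences are made up for the example):

# NLTK: tokenization of a sentence into words
import nltk
nltk.download("punkt", quiet=True)          # one-time download of the tokenizer data
from nltk.tokenize import word_tokenize
print(word_tokenize("Generative AI creates new content from patterns in data."))

# Hugging Face Transformers: sentiment analysis with a default pre-trained model
from transformers import pipeline
classifier = pipeline("sentiment-analysis")
print(classifier("I really enjoyed this lecture on Generative AI!"))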
Simple mini project using Generative AI titled "Text-based Story Generator"
Objective
• Create a program that generates a short story
based on user-provided prompts using a
simple generative AI model.
Step a.
• generator = pipeline('text-generation',
model='gpt2')
Explanation of the Components
pipeline:
• This function is a high-level API that allows users to quickly create a
processing pipeline for a specific task, such as text generation, sentiment
analysis, or translation.
'text-generation':
• This argument specifies the type of task the pipeline will perform. In this
case, it indicates that the pipeline is intended for generating text.
• The model will take an input prompt and generate a continuation or a
response based on that prompt.
model='gpt2':
• This specifies the pre-trained model to be used for the text generation task.
Here, it uses the GPT-2 model, which is a transformer-based model designed
for generating coherent and contextually relevant text.
• GPT-2 was developed by OpenAI and is known for its ability to produce high-
quality text based on the input it receives.
Model | Organization | Tools | Purpose | Applications
GPT | OpenAI | Transformers, OpenAI API | Text generation | Chatbots, content creation
BERT | Google | Transformers, TensorFlow, PyTorch | Context understanding | Sentiment analysis, QA
T5 | Google | Transformers, TensorFlow, PyTorch | Text-to-text tasks | Translation, summarization
XLNet | Google Brain, CMU | Transformers | Context understanding | Text classification, language modeling
Turing-NLG | Microsoft | Azure ML, custom frameworks | Large-scale text generation | Conversational AI
GPT-Neo/GPT-J | EleutherAI | Transformers | Open-source text generation | Chatbots, creative writing
LLaMA | Meta | PyTorch, Hugging Face | Efficient model training | NLP research, text generation
Claude | Anthropic | Custom frameworks, API | Alignment and safety | Conversational agents
Implementation
Step 1: Setting Up the Environment
• You need Python and the transformers library from Hugging Face.
• pip install transformers
def generate_story(prompt, max_length=100):
The function generate_story takes two
parameters:
• prompt: A string input that serves as the
starting point for the story.
• max_length: An optional integer that specifies
the maximum number of tokens (words or
parts of words) to generate. The default value
is set to 100.
story = generator(prompt, max_length=max_length, num_return_sequences=1)
• This line calls the generator, which is typically a
text generation model initialized earlier (e.g.,
using the Hugging Face Transformers pipeline).
• It generates text based on the prompt, with the
specified max_length. The
num_return_sequences=1 argument indicates
that only one story should be generated.
Hugging Face Transformers
• It is an open-source Python library that provides access to a vast collection of pre-trained models for various machine learning tasks, including natural language processing (NLP), computer vision, and audio processing.
return story[0]['generated_text']
• The function returns the generated story. The
output from the generator is usually a list of
dictionaries, where each dictionary contains a
key 'generated_text' with the generated text
as its value.
• The [0] index accesses the first (and only)
generated story since num_return_sequences
is set to 1.
IDE Tool for Python execution
• https://colab.research.google.com/
basic implementation of the story generator:
from transformers import pipeline

# Load the text generation model
generator = pipeline('text-generation', model='gpt2')

def generate_story(prompt, max_length=100):
    # Generate a story based on the prompt
    story = generator(prompt, max_length=max_length, num_return_sequences=1)
    return story[0]['generated_text']

if __name__ == "__main__":
    print("Welcome to the Text-based Story Generator!")
    user_prompt = input("Enter a prompt for your story: ")
    # Generate a story
    story = generate_story(user_prompt)
    print("\nHere is your generated story:\n")
    print(story)
Explanation of the Code
if __name__ == "__main__"::
• This line checks whether the Python script is being run as
the main program.
• When a Python file is executed, the special variable
__name__ is set to "__main__". If the file is imported as a
module in another file, __name__ is set to the module's
name.
• This conditional allows you to define code that should only
execute when the script is run directly (not when
imported).
from transformers import pipeline

# Initialize the text generation pipeline
generator = pipeline('text-generation', model='gpt2')

# Define the function to generate a story
def generate_story(prompt, max_length=100):
    # Generate a story based on the prompt
    story = generator(prompt, max_length=max_length, num_return_sequences=1)
    return story[0]['generated_text']

# Use the function to generate a story
prompt = "In a small village, there was a mysterious forest"
generated_story = generate_story(prompt)

# Print the generated story
print(generated_story)
Simple mini project using Generative AI
• To create a simple program that uses images as input prompts to generate responses, we'll use a pre-trained model from the Hugging Face Transformers library. This example demonstrates how to use an image captioning model, which generates textual descriptions based on the content of the image.
What You Will Learn
• How to use an image as input for a model.
• How to generate text responses based on the image content.
Explanation
• Image Input: The program takes an image as
input, which can be a URL or a local file.
• Model Processing: A pre-trained model
processes the image and generates a
descriptive caption.
• Output: The program outputs a natural
language description of the image.
Image Input: (photo of a cat on a couch with a pink pillow)
Output: Generated Caption: "a cat sitting on a couch with a pink pillow"
Prerequisites
• Make sure you have Python installed on your
computer. You will also need to install the
following libraries:
• transformers
• torch
• PIL (Python Imaging Library)
1. PIL (Pillow)
Image Loading and Basic Operations:
Loading Images:
• PIL is used to load images from files, and it provides a
convenient Image class for working with image data.
Basic Transformations:
• PIL can be used for basic image transformations like resizing,
cropping, and color adjustments.
Interoperability with PyTorch:
• PIL images can be easily converted to PyTorch tensors, which
are the standard format for numerical operations within
PyTorch.
2. torchvision.transforms
for Preprocessing and Augmentation
Transformations:
• The torchvision.transforms module provides a rich set of image
transformations for preprocessing and data augmentation, such as resizing,
normalization, random cropping, and flipping.
Functional Transforms:
• torchvision.transforms.functional offers fine-grained control over
transformations, allowing for more complex pipelines.
Tensor Input:
• torchvision.transforms can accept PIL images, tensors, or batches of tensors
as input.
Chaining Transforms:
• Transforms can be chained together using torchvision.transforms.Compose.
3. Hugging Face Transformers and Image
Processing:
Image Feature Extractors:
• Hugging Face Transformers provides image feature extractors
(e.g., ViTImageProcessor) that can be used to preprocess
images for specific models.
Model Input:
• These extractors typically take PIL images or tensors as input
and return a format suitable for the model's input.
Data Augmentation:
• You can combine torchvision.transforms with Hugging Face's
image processors to implement data augmentation
strategies.
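For example, a small augmentation pipeline could be applied to a PIL image before handing it to the BLIP processor used later in this project; the specific transforms and the file name below are assumptions for illustration:

from PIL import Image
from torchvision import transforms
from transformers import BlipProcessor

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),          # random flip
    transforms.RandomResizedCrop(224),          # random crop resized to 224x224
])

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("example.jpg")               # any local image file (placeholder name)
augmented = augment(image)                      # still a PIL image after the transforms
inputs = processor(augmented, return_tensors="pt")  # tensors ready for the BLIP model
print(inputs["pixel_values"].shape)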
Step 1 : You can install these libraries using
pip:
pip install transformers torch pillow
Step-by-Step Code
• Import Libraries Start by importing the
necessary libraries.
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image
import requests
Load the Pre-Trained Model
• Use the BLIP (Bootstrapping Language-Image Pre-training) model, which is designed for image captioning.
# Load the processor and model
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
Load an Image
• You can load an image from a URL or from your local directory. For this example, let's load an image from a URL.
# Load an image from a URL
url = "https://example.com/path/to/your/image.jpg"  # Replace with your image URL
image = Image.open(requests.get(url, stream=True).raw)
Here are a few sample image URLs you can use:
• A cat: https://images.unsplash.com/photo-1518791841217-8f162f1e1131
• A landscape: https://images.unsplash.com/photo-1506748686214-e9df14d4d9d0
• A cityscape: https://images.unsplash.com/photo-1521747116042-5a810fda9664
Process the Image
• The processor prepares the image for the
model.
# Process the image
inputs = processor(image, return_tensors="pt")
Generate a Caption
• Use the model to generate a caption based on
the processed image.
# Generate a caption
output = model.generate(**inputs)
caption = processor.decode(output[0], skip_special_tokens=True)
1. output = model.generate(**inputs)
Purpose: This line generates a response (or caption) based on the input image.
Components:
• model: This refers to the pre-trained image captioning model you loaded
earlier (e.g., BLIP).
• generate(): This is a method (or function) of the model that creates a
caption for the input image.
• **inputs: The double asterisk (**) is a way to unpack a dictionary in Python.
In this case, inputs contains the processed image data that the model needs
to generate a caption.
What Happens: When you call model.generate(**inputs), the model looks at
the image data provided in inputs and produces an output, which is a
sequence of numbers representing the generated caption in a format that the
model understands.
unpacking a dictionary
• In Python, "unpacking a dictionary" refers to the process of
extracting the key-value pairs from a dictionary and using
them as individual arguments in a function or method call.
# A hypothetical greet() function, defined here so the example runs
def greet(name, age, city):
    return f"Hello {name}! You are {age} years old and live in {city}."

person = {
    "name": "Alice",
    "age": 30,
    "city": "New York"
}

message = greet(**person)  # Unpacking the dictionary
print(message)
2. caption = processor.decode(output[0], skip_special_tokens=True)
Purpose: This line converts the output from the model (which is in numerical
format) into a human-readable string (the actual caption).
Components:
• output[0]: Since the model may return multiple outputs, output[0] refers to the
first (and usually the only) generated caption. It's a list of numbers representing
the caption.
• processor: This is the same processor you used earlier to prepare the image. It
also has a method for decoding the model's output.
• decode(): This method converts the numerical representation of the caption
back into plain text.
• skip_special_tokens=True: This option tells the decoder to ignore any special
tokens (like padding or end-of-sentence markers) that the model uses internally.
This way, you get a clean caption without extra characters.
What Happens: When you call processor.decode(output[0],
skip_special_tokens=True), it takes the numbers from output[0], translates them
into a human-readable caption, and stores that caption in the variable caption.
Print the Result
• Finally, print the generated caption.
# Print the generated caption
print("Generated Caption:", caption)
Complete Code
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image
import requests
# Load the processor and model
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
# Load an image from a URL
url = "https://example.com/path/to/your/image.jpg" # Replace with your image URL
image = Image.open(requests.get(url, stream=True).raw)
# Process the image
inputs = processor(image, return_tensors="pt")
# Generate a caption
output = model.generate(**inputs)
caption = processor.decode(output[0], skip_special_tokens=True)
# Print the generated caption
print("Generated Caption:", caption)
References
• https://platform.openai.com/docs/overview
• https://www.youtube.com/watch?v=IRrhpAXib-Y
• https://colab.research.google.com/drive/1tIIcs0qzWaNaQ03dGHKBqY7hLNo0xaRF