Generative AI
Dr.T.Abirami
Associate Professor
Department of Information Technology
Kongu Engineering College
abi.it@kongu.edu
9788654804
Artificial intelligence
• AI enables computers to understand, analyze
data, and make decisions without constant
human guidance.
• These intelligent machines use algorithms,
which are step-by-step instructions, to process
information and improve their performance
over time.
In short, AI is about creating machines that can simulate human intelligence.
Real-world Examples of AI Applications
• Voice assistants such as Siri and Alexa, the helpful chatbots you meet on websites, and generative AI tools such as ChatGPT and Google's Bard all use AI technology.
• They use AI to understand our questions and commands, and they can answer questions.
Understanding Machine Learning
• Machine learning focuses on learning from data.
Types of Machine Learning
• Supervised learning - To learn from labeled
data.
(Predicting house prices based on features like size and location
(labeled data))
• Unsupervised learning - To find patterns in
unlabeled data.
(Clustering customers into segments based on purchasing behavior
(no labels))
• Reinforcement learning - To learn by interacting
with an environment.
(Teaching a robot to navigate a maze by rewarding it for reaching the
goal and penalizing it for hitting walls)
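To make the supervised case concrete, here is a minimal, illustrative sketch using scikit-learn (a library not covered in these slides; the house sizes and prices below are made-up values for the example):

from sklearn.linear_model import LinearRegression

# Features: [size in sq. ft, number of bedrooms]; labels: house prices (made-up numbers)
X = [[1000, 2], [1500, 3], [2000, 3], [2500, 4]]
y = [150000, 210000, 260000, 320000]

model = LinearRegression()
model.fit(X, y)                       # learn from labeled examples

print(model.predict([[1800, 3]]))     # predict the price of an unseen house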
Comparison
• Supervised Learning is about learning from
known outputs to predict future outcomes.
• Unsupervised Learning focuses on finding
hidden structures in data without any
guidance from labels.
• Reinforcement Learning is about learning
through interaction, where actions are taken
based on feedback from the environment.
Real-life Examples of Supervised Learning
• Email Spam Filtering - classifying emails as spam or not spam based on their features
• Image Classification - classifying animals, recognizing handwritten digits, or detecting objects in self-driving cars
• Facial Recognition - security systems or unlocking devices
• Financial Fraud Detection - analyzing patterns and anomalies in financial data
• Speech Recognition - converting spoken language into text, as seen in voice assistants such as Siri or Google Assistant
In reinforcement learning there is no labeled data as in supervised learning; agents learn only from their experiences.
Basics of Deep Learning
Biological Neural Network in Human Brain
• A neuron is the human brain’s most
fundamental cell.
• A human brain has many billions of neurons,
which interact and communicate with one
another, forming a neural network.
• One Neuron = One Feature: In the input layer,
each neuron can represent a single feature of
the dataset.
• Multiple Neurons = Multiple Features: In a
neural network, having multiple neurons
allows the model to process multiple
features at once and learn from them.
Basics of Deep Learning
• A neuron is a basic unit that processes input data. Each neuron receives input, applies a mathematical function (often called an activation function), and produces an output.
• A neuron can represent one feature of a
dataset, meaning it processes one aspect of
the input data.
Basics of Deep Learning
Features
• A feature is a measurable property or
characteristic of the data.
• For example, in a dataset about houses,
features could include the size of the house,
the number of bedrooms, and the location.
• Each feature contributes to the information
that the model uses to make predictions.
Example in a Neural Network
• Input Layer: If you have a dataset with three
features (e.g., size, number of bedrooms, and
location), you would have three neurons in the
input layer, each corresponding to one feature.
• Hidden Layers: In the hidden layers, neurons can
combine these features in various ways to learn
complex patterns. Each neuron in these layers can
take inputs from multiple neurons from the previous
layer, allowing the network to learn interactions
between features.
Example: Recognizing a Panda
• Input Layer (Observation): receives the individual features of the image.
• Hidden Layers (Processing): collectively build a more comprehensive understanding of the panda's features.
• Output Layer (Recognition): decides whether the characteristics match those of a panda.
Key Components of Basic Neural Network
• Data Loading: MNIST dataset of handwritten digits, applying
transformations to convert images into tensors and normalize them.
• Neural Network Architecture:
– Input Layer: Takes in flattened images of size 28x28 (784 pixels).
– Hidden Layer: Contains 128 neurons with a ReLU activation function.
– Output Layer: Contains 10 neurons (for digits 0-9).
• Loss Function: We use CrossEntropyLoss, which is suitable for multi-class
classification problems.
• Optimizer: Adam optimizer is used to update the model's weights based on
the gradients.
• Training Loop: For each epoch, we perform forward and backward passes,
compute loss, and update weights.
• Evaluation: After training, we evaluate the model's accuracy on the test set.
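A compact sketch of this network in PyTorch is shown below. It is illustrative only: hyperparameters such as the batch size, learning rate, and number of epochs are assumptions, not values stated on the slide.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Data loading: convert images to tensors and normalize them
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_set = datasets.MNIST(root="data", train=True, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)

# Architecture: 784 -> 128 (ReLU) -> 10
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))

criterion = nn.CrossEntropyLoss()                          # loss for multi-class classification
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam updates the weights

for epoch in range(2):                                     # training loop (2 epochs for brevity)
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)            # forward pass + loss
        loss.backward()                                    # backward pass
        optimizer.step()                                   # weight update
    print(f"epoch {epoch + 1}: loss {loss.item():.4f}")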
Deep Neural Networks
• A deep neural network (DNN) is an artificial
neural network (ANN) with multiple layers
between the input and output layers.
Introduction to Generative AI
Learning from existing data patterns
What is Generative AI?
• Generative AI is a type of artificial intelligence
that can create new content, such as text,
images, and music, by learning patterns from
existing data.
• It uses advanced algorithms, like neural
networks, to generate outputs that resemble
human-created content.
• This technology is widely used in various fields,
including art, entertainment, and business.
Introduction to Generative AI
Understand the Basics of Generative AI
• Getting Trained on Data: generative models need to be trained on massive datasets of existing content. This data can come from almost anything, such as books, blogs, pictures, or images.
• Recognising Patterns: the algorithm then recognizes patterns and relationships across all of the retrieved training data.
• Creating Content: Once the model has a good grasp of
the patterns, it can use that knowledge to generate
entirely new content.
Generative Models
• Generative AI uses different types of machine
learning models, called Generative Models.
1. Variational Autoencoders (VAEs),
2. Generative Adversarial Networks (GANs)
3. Restricted Boltzmann Machines (RBMs)
4. Transformer-based Language Models
Generative Adversarial Networks (GANs)
• A powerful class of machine learning models used in generative AI.
• GANs consist of two neural networks, the Generator and the Discriminator, that work against each other to produce new data samples.
GAN is a generative model
• Generator: This network generates new data
samples. It takes random noise as input and
tries to create data that resembles the training
data.
• Discriminator: This network evaluates the
data. It takes both real data samples and the
generated samples and tries to classify them
as real or fake.
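As a rough illustration (not code from these slides), the two networks can be sketched in PyTorch as small fully connected models; all layer and noise sizes below are assumptions:

import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, noise_dim=100, data_dim=784):
        super().__init__()
        # Maps random noise to a fake data sample
        self.net = nn.Sequential(nn.Linear(noise_dim, 256), nn.ReLU(),
                                 nn.Linear(256, data_dim), nn.Tanh())

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    def __init__(self, data_dim=784):
        super().__init__()
        # Outputs the probability that a sample is real
        self.net = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
                                 nn.Linear(256, 1), nn.Sigmoid())

    def forward(self, x):
        return self.net(x)

# One fake sample: the generator turns noise into data, the discriminator scores it
fake = Generator()(torch.randn(1, 100))
print(Discriminator()(fake))   # near 0 means "fake", near 1 means "real"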
Variational Autoencoders (VAEs)
• VAEs are useful for tasks like image generation, representation learning, and data compression.
A VAE consists of two main components:
• Encoder: This network compresses the input data into
a smaller representation (latent space). Instead of
producing a single point, it outputs parameters of a
probability distribution (mean and variance).
• Decoder: This network takes samples from the latent
space and reconstructs the original data from that
compressed representation.
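A minimal PyTorch sketch of the encoder/decoder idea, with assumed layer and latent-space sizes, might look like this:

import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, data_dim=784, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(data_dim, 256), nn.ReLU())
        self.to_mean = nn.Linear(256, latent_dim)      # mean of the latent distribution
        self.to_logvar = nn.Linear(256, latent_dim)    # log-variance of the latent distribution
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, data_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)
        mean, logvar = self.to_mean(h), self.to_logvar(h)
        z = mean + torch.exp(0.5 * logvar) * torch.randn_like(mean)  # sample from the latent space
        return self.decoder(z), mean, logvar           # reconstruction plus distribution parameters

reconstruction, mean, logvar = VAE()(torch.rand(1, 784))
print(reconstruction.shape)   # torch.Size([1, 784])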
Examples of Latent Space
Image Generation:
• Example: In a GAN trained on faces, each point in latent space could
represent a different face with variations in attributes like age,
gender, or expression. Sampling different points produces new,
unique faces.
Text Generation:
• Example: In models like Variational Autoencoders for text, latent
space might encode various styles or themes. For instance, one
region could represent romantic poetry, while another represents
scientific articles.
Music Generation:
• Example: In a music VAE, latent space can represent different musical
styles. Points might correspond to variations in melody, rhythm, or
instrumentation, allowing the generation of new compositions.
What is Latent Space?
• Dimensionality Reduction: In datasets with high
dimensions (like images), latent space reduces the
number of dimensions while preserving important
information.
• Feature Representation: Each point in latent space
represents a unique combination of features. For
example, in a VAE or GAN, points in this space correspond
to different variations of the generated data.
• Sampling and Generation: By sampling points in latent
space, models can generate new data that resembles the
training data but is not identical.
Restricted Boltzmann Machines (RBMs)
• RBMs are a type of generative model used in machine learning and generative AI.
• They are particularly useful for feature
learning, dimensionality reduction, and
collaborative filtering.
• Example: Image Reconstruction
• Imagine using an RBM to reconstruct images
of handwritten digits
What are Restricted Boltzmann Machines?
An RBM consists of two layers:
• Visible Layer: This layer represents the input
data. Each node corresponds to an observable
feature of the data (e.g., pixels in an image).
• Hidden Layer: This layer captures the
underlying patterns or features in the data. The
nodes in this layer are not directly observed.
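As an illustration, scikit-learn provides a small RBM implementation (BernoulliRBM); the tiny binary dataset below is made up purely for demonstration:

import numpy as np
from sklearn.neural_network import BernoulliRBM

# Tiny made-up dataset: each row is a "visible layer" of 6 binary pixels
X = np.array([[1, 1, 1, 0, 0, 0],
              [1, 1, 0, 0, 0, 0],
              [0, 0, 0, 1, 1, 1],
              [0, 0, 1, 1, 1, 0]])

rbm = BernoulliRBM(n_components=2, learning_rate=0.1, n_iter=100, random_state=0)
rbm.fit(X)

# Hidden-layer activations: a 2-dimensional feature representation of each sample
print(rbm.transform(X))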
Transformer-based language models
• Transformer-based language models have revolutionized natural language processing (NLP) by enabling powerful and efficient text generation, understanding, and manipulation.
Types of Transformer-based Language Models
• BERT (Bidirectional Encoder Representations from Transformers):
– Focuses on understanding the context of words in both directions (left and right).
– Primarily used for tasks like sentiment analysis, question answering, and named
entity recognition.
• GPT (Generative Pre-trained Transformer):
– A unidirectional model that generates text by predicting the next word in a
sequence.
– Suitable for text generation, dialogue systems, and creative writing.
• T5 (Text-to-Text Transfer Transformer):
– Treats all NLP tasks as text-to-text tasks, converting inputs into a text format and
generating outputs in a text format.
– Used for translation, summarization, and question answering.
• XLNet:
– Combines ideas from BERT and autoregressive models to capture bidirectional
context while maintaining the ability to generate text.
– Effective for various NLP tasks, including sentiment analysis and language
understanding.
• RoBERTa (Robustly optimized BERT approach):
– An optimized version of BERT with improvements in training techniques and data
handling, enhancing performance on various benchmarks.
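With the Hugging Face pipeline API used later in these slides, the encoder and decoder families can be contrasted in a few lines; the checkpoints and prompts below are illustrative choices, not part of the original material:

from transformers import pipeline

# BERT-style model: fill in a masked word using context from both directions
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("Generative AI can create new [MASK].")[0]["token_str"])

# GPT-style model: generate text by predicting the next word repeatedly
generate = pipeline("text-generation", model="gpt2")
print(generate("Generative AI can create", max_length=20)[0]["generated_text"])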
Generative AI Tools
• ChatGPT: Content Generation
• Jukebox: Music Creation
• Point-E: 3D Modelling
• RunwayML: Video Creation and Editor
• G3D.ai: Game Development
• LaMDA: Chatbots
• Dall E: Image Creation
• GitHub Copilot: Code Generation
• Midjourney: Art Creation
• Murf AI: Voice Generation
Generative AI Tools
ChatGPT:
• A language model that can generate human-
like text based on prompts, useful for
customer support and content creation.
GPT (Generative Pre-trained Transformer)
• GPT is a transformer-based large language
model, developed by OpenAI. This is the
engine behind ChatGPT.
• The free version of ChatGPT is based on GPT-3.5, while the more advanced GPT-4 based version is provided to paid subscribers under the commercial name "ChatGPT Plus".
Generative AI Tools
DALL-E:
• An AI that creates images from textual
descriptions, revolutionizing design and
creative industries
Generative AI Tools
Google Bard (Gemini):
• Google Bard, now known as Gemini, is an advanced AI model developed by Google, designed to generate creative and coherent text based on user prompts. It leverages deep learning techniques to produce high-quality writing in various styles and formats, from poetry to technical writing.
How does it work?
• It uses machine learning models, especially
neural networks, to generate data similar to
its training inputs.
What are common applications?
• Applications include chatbots, content
creation, image generation, and code writing.
Natural Language Processing (NLP)
• NLP is a field of artificial intelligence (AI) that focuses on the interaction between computers and humans through natural language.
• It enables machines to understand, interpret, and generate human language.
• NLP deals with both the understanding and the generation of human language.
• In other words, NLP is one way for AI to interact with humans.
Example:
• A simple example of NLP is a text message
auto-complete feature on your phone, which
predicts what you want to type next based
on your previous messages.
NLP Models
• Rule-Based Models: Use predefined rules for
language processing.
– Example: Simple grammar checkers.
• Statistical Models: Use statistical methods to analyze
language.
– Example: Hidden Markov Models for part-of-speech
tagging.
• Deep Learning Models: Use neural networks for
complex language tasks.
– Example: Transformers, like BERT and GPT.
Applications
• Sentiment Analysis: Determining if a piece of text
expresses positive, negative, or neutral sentiment.
• Chatbots: Automated systems that can converse
with users to answer questions or provide support.
• Machine Translation: Automatically translating text
from one language to another, like Google Translate.
• Text Summarization: Creating concise summaries of
larger text documents.
NLP Tools
• NLTK (Natural Language Toolkit): A popular library for working
with human language data in Python.
– Example: Used for tasks like tokenization, stemming, and tagging.
• SpaCy: An efficient NLP library for advanced natural language
processing.
– Example: Great for named entity recognition and dependency parsing.
• Hugging Face Transformers: A library that provides pre-trained
models for NLP tasks.
– Example: Using BERT for text classification.
• OpenAI's GPT: A powerful language model that can generate text
based on prompts.
– Example: Creating conversational agents or writing assistance tools.
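A brief, illustrative taste of two of these tools (both assumed to be installed; the sentences are made up for the example):

# NLTK: tokenization of a sentence into words
import nltk
nltk.download("punkt", quiet=True)          # one-time download of the tokenizer data
from nltk.tokenize import word_tokenize
print(word_tokenize("Generative AI creates new content from patterns in data."))

# Hugging Face Transformers: sentiment analysis with a default pre-trained model
from transformers import pipeline
classifier = pipeline("sentiment-analysis")
print(classifier("I really enjoyed this lecture on Generative AI!"))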
Simple mini project using Generative AI titled "Text-based Story Generator"
Objective
• Create a program that generates a short story
based on user-provided prompts using a
simple generative AI model.
Step a.
• generator = pipeline('text-generation',
model='gpt2')
Explanation of the Components
pipeline:
• This function is a high-level API that allows users to quickly create a
processing pipeline for a specific task, such as text generation, sentiment
analysis, or translation.
'text-generation':
• This argument specifies the type of task the pipeline will perform. In this
case, it indicates that the pipeline is intended for generating text.
• The model will take an input prompt and generate a continuation or a
response based on that prompt.
model='gpt2':
• This specifies the pre-trained model to be used for the text generation task.
Here, it uses the GPT-2 model, which is a transformer-based model designed
for generating coherent and contextually relevant text.
• GPT-2 was developed by OpenAI and is known for its ability to produce high-
quality text based on the input it receives.
Model | Organization | Tools | Purpose | Applications
GPT | OpenAI | Transformers, OpenAI API | Text generation | Chatbots, content creation
BERT | Google | Transformers, TensorFlow, PyTorch | Context understanding | Sentiment analysis, QA
T5 | Google | Transformers, TensorFlow, PyTorch | Text-to-text tasks | Translation, summarization
XLNet | Google Brain, CMU | Transformers | Context understanding | Text classification, language modeling
Turing-NLG | Microsoft | Azure ML, custom frameworks | Large-scale text generation | Conversational AI
GPT-Neo/GPT-J | EleutherAI | Transformers | Open-source text generation | Chatbots, creative writing
LLaMA | Meta | PyTorch, Hugging Face | Efficient model training | NLP research, text generation
Claude | Anthropic | Custom frameworks, API | Alignment and safety | Conversational agents
Implementation
Step 1: Setting Up the Environment
• You need Python and the transformers library from Hugging Face.
• pip install transformers
def generate_story(prompt, max_length=100):
The function generate_story takes two
parameters:
• prompt: A string input that serves as the
starting point for the story.
• max_length: An optional integer that specifies
the maximum number of tokens (words or
parts of words) to generate. The default value
is set to 100.
story = generator(prompt, max_length=max_length, num_return_sequences=1)
• This line calls the generator, which is typically a
text generation model initialized earlier (e.g.,
using the Hugging Face Transformers pipeline).
• It generates text based on the prompt, with the
specified max_length. The
num_return_sequences=1 argument indicates
that only one story should be generated.
Hugging Face Transformers
• It is an open-source Python library that provides access to a vast collection of pre-trained models for various machine learning tasks, including natural language processing (NLP), computer vision, and audio processing.
return story[0]['generated_text']
• The function returns the generated story. The
output from the generator is usually a list of
dictionaries, where each dictionary contains a
key 'generated_text' with the generated text
as its value.
• The [0] index accesses the first (and only)
generated story since num_return_sequences
is set to 1.
IDE Tool for Python execution
• https://colab.research.google.com/
basic implementation of the story generator:
from transformers import pipeline

# Load the text generation model
generator = pipeline('text-generation', model='gpt2')

def generate_story(prompt, max_length=100):
    # Generate a story based on the prompt
    story = generator(prompt, max_length=max_length, num_return_sequences=1)
    return story[0]['generated_text']

if __name__ == "__main__":
    print("Welcome to the Text-based Story Generator!")
    user_prompt = input("Enter a prompt for your story: ")
    # Generate a story
    story = generate_story(user_prompt)
    print("\nHere is your generated story:\n")
    print(story)
Explanation of the Code
if __name__ == "__main__"::
• This line checks whether the Python script is being run as
the main program.
• When a Python file is executed, the special variable
__name__ is set to "__main__". If the file is imported as a
module in another file, __name__ is set to the module's
name.
• This conditional allows you to define code that should only
execute when the script is run directly (not when
imported).
from transformers import pipeline

# Initialize the text generation pipeline
generator = pipeline('text-generation', model='gpt2')

# Define the function to generate a story
def generate_story(prompt, max_length=100):
    # Generate a story based on the prompt
    story = generator(prompt, max_length=max_length, num_return_sequences=1)
    return story[0]['generated_text']

# Use the function to generate a story
prompt = "In a small village, there was a mysterious forest"
generated_story = generate_story(prompt)

# Print the generated story
print(generated_story)
Simple mini project using Generative AI
• To create a simple program that uses images as input prompts to generate responses, we'll use a pre-trained model from the Hugging Face Transformers library. This example demonstrates how to use an image captioning model, which generates textual descriptions based on the content of the image.
What You Will Learn
• How to use an image as input for a model.
• How to generate text responses based on the image content.
Explanation
• Image Input: The program takes an image as
input, which can be a URL or a local file.
• Model Processing: A pre-trained model
processes the image and generates a
descriptive caption.
• Output: The program outputs a natural
language description of the image.
Image Input: (photo of a cat on a couch with a pink pillow)
Output: Generated Caption: "a cat sitting on a couch with a pink pillow"
Prerequisites
• Make sure you have Python installed on your
computer. You will also need to install the
following libraries:
• transformers
• torch
• PIL (Python Imaging Library)
1. PIL (Pillow)
Image Loading and Basic Operations:
Loading Images:
• PIL is used to load images from files, and it provides a
convenient Image class for working with image data.
Basic Transformations:
• PIL can be used for basic image transformations like resizing,
cropping, and color adjustments.
Interoperability with PyTorch:
• PIL images can be easily converted to PyTorch tensors, which
are the standard format for numerical operations within
PyTorch.
2. torchvision.transforms
for Preprocessing and Augmentation
Transformations:
• The torchvision.transforms module provides a rich set of image
transformations for preprocessing and data augmentation, such as resizing,
normalization, random cropping, and flipping.
Functional Transforms:
• torchvision.transforms.functional offers fine-grained control over
transformations, allowing for more complex pipelines.
Tensor Input:
• torchvision.transforms can accept PIL images, tensors, or batches of tensors
as input.
Chaining Transforms:
• Transforms can be chained together using torchvision.transforms.Compose.
3. Hugging Face Transformers and Image
Processing:
Image Feature Extractors:
• Hugging Face Transformers provides image feature extractors
(e.g., ViTImageProcessor) that can be used to preprocess
images for specific models.
Model Input:
• These extractors typically take PIL images or tensors as input
and return a format suitable for the model's input.
Data Augmentation:
• You can combine torchvision.transforms with Hugging Face's
image processors to implement data augmentation
strategies.
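For example, a small augmentation pipeline could be applied to a PIL image before handing it to the BLIP processor used later in this project; the specific transforms and the file name below are assumptions for illustration:

from PIL import Image
from torchvision import transforms
from transformers import BlipProcessor

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),          # random flip
    transforms.RandomResizedCrop(224),          # random crop resized to 224x224
])

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("example.jpg")               # any local image file (placeholder name)
augmented = augment(image)                      # still a PIL image after the transforms
inputs = processor(augmented, return_tensors="pt")  # tensors ready for the BLIP model
print(inputs["pixel_values"].shape)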
Step 1 : You can install these libraries using
pip:
pip install transformers torch pillow
Step-by-Step Code
• Import Libraries Start by importing the
necessary libraries.
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image
import requests
Load the Pre-Trained Model
• Use the BLIP (Bootstrapping Language-Image Pre-training) model, which is designed for image captioning.
# Load the processor and model
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
Load an Image
• You can load an image from a URL or from your local directory. For this example, let's load an image from a URL.
# Load an image from a URL
url = "https://example.com/path/to/your/image.jpg"  # Replace with your image URL
image = Image.open(requests.get(url, stream=True).raw)
Here are a few sample image URLs you can use:
• A cat: https://images.unsplash.com/photo-1518791841217-8f162f1e1131
• A landscape: https://images.unsplash.com/photo-1506748686214-e9df14d4d9d0
• A cityscape: https://images.unsplash.com/photo-1521747116042-5a810fda9664
Process the Image
• The processor prepares the image for the
model.
# Process the image
inputs = processor(image, return_tensors="pt")
Generate a Caption
• Use the model to generate a caption based on
the processed image.
# Generate a caption
output = model.generate(**inputs)
caption = processor.decode(output[0], skip_special_tokens=True)
1. output = model.generate(**inputs)
Purpose: This line generates a response (or caption) based on the input image.
Components:
• model: This refers to the pre-trained image captioning model you loaded
earlier (e.g., BLIP).
• generate(): This is a method (or function) of the model that creates a
caption for the input image.
• **inputs: The double asterisk (**) is a way to unpack a dictionary in Python.
In this case, inputs contains the processed image data that the model needs
to generate a caption.
What Happens: When you call model.generate(**inputs), the model looks at
the image data provided in inputs and produces an output, which is a
sequence of numbers representing the generated caption in a format that the
model understands.
unpacking a dictionary
• In Python, "unpacking a dictionary" refers to the process of
extracting the key-value pairs from a dictionary and using
them as individual arguments in a function or method call.
# A hypothetical greet() function, defined here so the example runs
def greet(name, age, city):
    return f"Hello {name}! You are {age} years old and live in {city}."

person = {
    "name": "Alice",
    "age": 30,
    "city": "New York"
}

message = greet(**person)  # Unpacking the dictionary
print(message)
2. caption = processor.decode(output[0], skip_special_tokens=True)
Purpose: This line converts the output from the model (which is in numerical
format) into a human-readable string (the actual caption).
Components:
• output[0]: Since the model may return multiple outputs, output[0] refers to the
first (and usually the only) generated caption. It's a list of numbers representing
the caption.
• processor: This is the same processor you used earlier to prepare the image. It
also has a method for decoding the model's output.
• decode(): This method converts the numerical representation of the caption
back into plain text.
• skip_special_tokens=True: This option tells the decoder to ignore any special
tokens (like padding or end-of-sentence markers) that the model uses internally.
This way, you get a clean caption without extra characters.
What Happens: When you call processor.decode(output[0],
skip_special_tokens=True), it takes the numbers from output[0], translates them
into a human-readable caption, and stores that caption in the variable caption.
Print the Result
• Finally, print the generated caption.
# Print the generated caption
print("Generated Caption:", caption)
Complete Code
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image
import requests
# Load the processor and model
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
# Load an image from a URL
url = "https://example.com/path/to/your/image.jpg" # Replace with your image URL
image = Image.open(requests.get(url, stream=True).raw)
# Process the image
inputs = processor(image, return_tensors="pt")
# Generate a caption
output = model.generate(**inputs)
caption = processor.decode(output[0], skip_special_tokens=True)
# Print the generated caption
print("Generated Caption:", caption)
References
• https://platform.openai.com/docs/overview
• https://www.youtube.com/watch?v=IRrhpAXib-Y
• https://colab.research.google.com/drive/1tIIcs0qzWaNaQ03dGHKBqY7hLNo0xaRF