Project Report G-16
1. Introduction
Today there are nearly 2.2 billion people in the world who are visually impaired. Most everyday
tasks rely on visual information, so visually impaired people are at a disadvantage: crucial
information about their surroundings is unavailable to them. A blind person often needs an aid
or a companion to support them with daily tasks. Visual impairment can
have significant effects on an individual's life, including physical, emotional, and social
impacts. Physically, visual impairment can make it challenging to carry out daily tasks such
as reading, writing, and navigating the environment. It can also lead to reduced mobility,
falls, and injuries. Several solutions are available to help individuals with visual impairment
overcome these challenges, including assistive technology,
rehabilitation services, and education. Assistive technologies such as screen readers, location
detection, and image detection can help individuals with visual impairment access information,
communicate with others, and perform daily tasks. Much of the assistance that visually
impaired people need can now be provided by modern technology.
This virtual assistant is intended to help visually impaired people. The application
contains an OCR (Optical Character Recognition) reader and a location detector. The app
can read a given text aloud to the user and tell them their current
location so that they can learn about their surroundings.
Our application, “Drishti”, can help improve the quality of life of visually impaired people.
Since it is not always possible for someone to be with a blind person 24 hours a day,
this app can be a very useful tool for those who are blind. It makes reading
simpler for them, whether they prefer to read fiction, newspapers, or school textbooks.
There are no barriers to entry: a person of any age who has a smartphone can use it.
Motivation:
The major goal of this project is to build an Android app that serves as a virtual assistant
for visually impaired individuals. By providing a tool that enables
individuals with visual impairments to navigate the world more independently, we can
empower them to live more fulfilling lives. Such an app can offer features like voice
commands, text-to-speech, and location tracking, making it easier for users to perform daily
tasks, communicate with others, and access information. Developing this app is a way to
leverage technology for the greater good and make a positive difference in people's lives.
Sight is one of the most important human senses, and its
absence has a profound effect on the decisions a person
makes throughout their life. Visually impaired people frequently experience discrimination
in social structures and at their place of employment because they are not expected to
advance in their profession as far as a sighted person. By organizing
campaigns and offering education with new tools and technologies, the government and
civil society can play a significant role in making the lives of visually impaired people
easier and safer.
There are various apps available on the internet for assisting visually impaired people,
like:
Be My Eyes: a free app that connects sighted volunteers with blind or visually impaired
people so they can help them in their daily lives. Using OpenAI's GPT-4, the app
has created the first-ever virtual volunteer, enabling users to send images and receive
thorough descriptions and instructions for a variety of tasks.
OneStep Reader App: With the touch of a button on the iPhone, the OneStep Reader
converts printed text into clear speech to offer precise, quick, and effective access to both
single-page and multi-page documents.
TapTapSee: TapTapSee assists blind and visually impaired people in
recognizing objects in their daily environment. By double-tapping the screen,
users can take a picture of anything from any angle, and the app then speaks the
identification to the user.
Cash Reader: People who have visual impairments can quickly and easily identify and
count bills with the help of the Cash Reader app, which instantly recognizes the
currency and speaks the denomination.
Despite the availability of so many apps, a sizable number of visually impaired people are
still unable to benefit from them. This may be because of a lack of awareness, because some
apps are not free to use, or because some work only on iPhones.
The basic data flow diagram of a virtual assistant app is described below:
The user enters their choice, and the application takes in the input. The
input is then used to perform an action. The provided information is checked
against a database. If matching information is found, it is returned to the user
as output or feedback.
System Architecture
The system provides the following features:
1. OCR Reader: Users can listen to the text of a PDF
by giving voice commands.
2. Location: Users can ask for their location by voice command, and
the app then speaks their present location.
3. Object Detection: Users can learn about the objects
present in their surroundings.
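The routing from a spoken command to one of the three features can be sketched in plain Java as follows. This is a minimal illustration, not the app's actual code: the `Feature` enum, the `CommandRouter` class, and the keywords are all hypothetical, and the input is assumed to be text already produced by a speech recognizer.

```java
import java.util.Locale;
import java.util.Optional;

/** The three features listed above. */
enum Feature { OCR_READER, LOCATION, OBJECT_DETECTION }

/** Hypothetical keyword-based router from a transcribed voice command to a feature. */
final class CommandRouter {
    private CommandRouter() {}

    /** Returns the feature matching the command, or empty if nothing matches. */
    static Optional<Feature> route(String spokenText) {
        if (spokenText == null) {
            return Optional.empty();
        }
        String text = spokenText.toLowerCase(Locale.ROOT);
        // The keywords below are illustrative assumptions, not the app's vocabulary.
        if (text.contains("read") || text.contains("pdf")) {
            return Optional.of(Feature.OCR_READER);
        }
        if (text.contains("location") || text.contains("where am i")) {
            return Optional.of(Feature.LOCATION);
        }
        if (text.contains("object") || text.contains("detect")) {
            return Optional.of(Feature.OBJECT_DETECTION);
        }
        // Unrecognized command: the caller could ask the user to repeat it.
        return Optional.empty();
    }
}
```

For example, `CommandRouter.route("Where am I?")` yields `Feature.LOCATION`, while an unrelated phrase yields an empty result so that the assistant can prompt the user again.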
Design of Location Tracker
Design of Object Detector
1. JAVA
Java is a high-level, object-oriented programming language
initially developed by Sun Microsystems and now owned by Oracle
Corporation. It was designed to be platform-independent and portable, meaning that once a
Java program is written, it can run on any computer or device with a Java Virtual Machine
(JVM) installed.
One of the main features of Java is its "write once, run anywhere" philosophy, which allows
developers to create a single codebase that can be used on multiple platforms without the
need for modification. This makes it a popular choice for developing applications that can run
on a variety of operating systems, including Windows, macOS, Linux, and mobile devices
such as Android.
Java also provides a wide range of libraries and tools for developers, making it easier to build
complex applications. It is commonly used in enterprise applications, web development,
mobile app development, and game development.
2. TENSORFLOW
TensorFlow is an open-source software library, created by the Google Brain team, for dataflow
and differentiable programming across a variety of workloads. It is a powerful tool for
machine learning and deep learning, especially for building and training neural networks.
TensorFlow allows users to define, optimize, and evaluate mathematical expressions
involving multi-dimensional arrays, also known as tensors. It provides a comprehensive set of
tools and libraries for building and deploying machine learning models, including data
preprocessing, model design, model training, and inference.
TensorFlow has a vast community of developers, researchers, and enthusiasts who contribute
to its development and expansion. It supports multiple programming languages, including
Python, C++, Java, and more, making it accessible to a broad range of developers and
researchers.
TensorFlow is an important tool for anyone interested in machine learning and deep learning,
whether they are working on research projects or developing commercial applications.
3. ML LIBRARIES
ML: Machine learning (ML) is a subfield of artificial intelligence (AI) that involves the
use of algorithms and statistical models to enable computer systems to learn from data,
identify patterns, and make predictions without being explicitly programmed. It has a
wide range of applications, including image and speech recognition, natural language
processing, recommendation systems, and predictive analytics. ML algorithms can be
supervised, unsupervised, or semi-supervised, and they require large amounts of data to
be trained effectively. As the field continues to grow, new algorithms and techniques are
constantly being developed, making ML an exciting and dynamic area of research and
innovation.
Machine learning (ML) libraries are software tools that enable developers and data
scientists to build and train machine learning models. These libraries provide a set of pre-
built algorithms, functions, and tools that make it easy for developers to implement
complex ML models without having to write extensive code from scratch. Popular ML
libraries include TensorFlow, PyTorch, and Scikit-learn, among others. Each library has
its unique features, advantages, and disadvantages that suit different use cases.
ML Kit Vision APIs, developed by Google, offer a versatile platform for integrating
computer vision capabilities into mobile applications. One notable feature is the Optical
Character Recognition (OCR) module, which provides text recognition
functionality. These APIs are designed to simplify complex computer vision tasks, allowing
developers to use pre-trained models without extensive machine learning expertise.
In an OCR Reader project, developers can integrate ML Kit's
OCR capabilities by adding the necessary dependencies to the project and
initializing the text recognizer, which gives access to a robust toolset for extracting
textual information from images. This includes support for multiple languages. Text can
be processed on the device, which favors real-time responsiveness, and cloud-based
recognition is available through Firebase ML.
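Before recognized text is spoken, it has to be in a sensible reading order. The sketch below shows one simple way to arrange recognized blocks top to bottom and left to right. It is plain Java and does not call ML Kit; `OcrBlock` is a hypothetical stand-in for a recognized block and the top-left corner of its bounding box, and the line tolerance is an assumed tuning value.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for one recognized text block and its top-left corner. */
final class OcrBlock {
    final String text;
    final int left;
    final int top;

    OcrBlock(String text, int left, int top) {
        this.text = text;
        this.left = left;
        this.top = top;
    }
}

final class ReadingOrder {
    private ReadingOrder() {}

    /**
     * Groups blocks into visual lines (tops within lineTolerancePx of the
     * first block of the line), orders each line left to right, and joins
     * everything with single spaces.
     */
    static String joinInReadingOrder(List<OcrBlock> blocks, int lineTolerancePx) {
        List<OcrBlock> sorted = new ArrayList<>(blocks);
        sorted.sort(Comparator.comparingInt((OcrBlock b) -> b.top));
        StringBuilder out = new StringBuilder();
        List<OcrBlock> line = new ArrayList<>();
        for (OcrBlock block : sorted) {
            // A block far enough below the current line starts a new line.
            if (!line.isEmpty() && block.top - line.get(0).top > lineTolerancePx) {
                appendLine(out, line);
                line.clear();
            }
            line.add(block);
        }
        appendLine(out, line);
        return out.toString();
    }

    private static void appendLine(StringBuilder out, List<OcrBlock> line) {
        line.sort(Comparator.comparingInt((OcrBlock b) -> b.left));
        for (OcrBlock block : line) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(block.text);
        }
    }
}
```

Grouping into lines first, rather than sorting with a single tolerance-based comparator, keeps the ordering well defined even when blocks on the same line have slightly different top coordinates.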
TensorFlow Lite Object Detection API is a framework that enables efficient
deployment of object detection models on mobile and edge devices. It builds on
TensorFlow Lite and is designed specifically for on-device object detection tasks. The API
allows developers to integrate and run pre-trained models for real-time object detection in
applications, balancing accuracy and speed for resource-constrained environments.
Single Shot MultiBox Detector (SSD): SSD is a popular object detection framework
that enables the simultaneous prediction of multiple bounding boxes and class scores in a
single forward pass. It operates at different scales to capture objects of varying sizes.
Feature Extractor: MobileNet V1 serves as the feature extractor for the COCO SSD
model. It transforms input images into a set of feature maps, capturing hierarchical
features at different spatial resolutions.
Anchor Boxes: SSD employs anchor boxes at different scales and aspect ratios to
predict bounding boxes efficiently. These anchor boxes serve as reference boxes for
predicting object locations.
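The idea of anchor boxes can be illustrated with a small sketch that lays out reference boxes over one square feature map. The class name, the single `scale` parameter, and the normalized coordinate convention are assumptions made for illustration; the actual model uses several feature maps and its own scale schedule.

```java
import java.util.ArrayList;
import java.util.List;

final class AnchorBoxes {
    private AnchorBoxes() {}

    /**
     * Generates anchors for a gridSize x gridSize feature map.
     * Each anchor is {centerX, centerY, width, height} in normalized
     * image coordinates, where the image spans 0.0 to 1.0 on both axes.
     */
    static List<double[]> generate(int gridSize, double scale, double[] aspectRatios) {
        List<double[]> anchors = new ArrayList<>();
        for (int row = 0; row < gridSize; row++) {
            for (int col = 0; col < gridSize; col++) {
                // Anchors are centered on each feature map cell.
                double centerX = (col + 0.5) / gridSize;
                double centerY = (row + 0.5) / gridSize;
                for (double ratio : aspectRatios) {
                    // The ratio is width / height; the area stays scale * scale.
                    double width = scale * Math.sqrt(ratio);
                    double height = scale / Math.sqrt(ratio);
                    anchors.add(new double[] {centerX, centerY, width, height});
                }
            }
        }
        return anchors;
    }
}
```

A coarse grid with a large scale covers large objects, while a fine grid with a small scale covers small ones. The network then predicts offsets relative to these reference boxes instead of absolute coordinates.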
Output Layers: The model's output layers provide predictions for bounding box
coordinates and associated class scores. The SSD architecture generates predictions
across multiple scales, contributing to its versatility in detecting objects of various sizes.
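Raw SSD predictions usually contain many low-scoring and overlapping boxes, so they are filtered before being announced to the user. The sketch below shows a common approach, a score threshold followed by non-maximum suppression based on intersection over union (IoU). The source does not state which post-processing the app uses, and many exported TensorFlow Lite SSD models may already perform this step internally, so the `Detection` and `DetectionFilter` classes and both thresholds are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical container for one predicted box with its class label and score. */
final class Detection {
    final String label;
    final float score;
    final float left, top, right, bottom;

    Detection(String label, float score, float left, float top, float right, float bottom) {
        this.label = label;
        this.score = score;
        this.left = left;
        this.top = top;
        this.right = right;
        this.bottom = bottom;
    }
}

final class DetectionFilter {
    private DetectionFilter() {}

    /** Intersection over union of two boxes; 0 when they do not overlap. */
    static float iou(Detection a, Detection b) {
        float interW = Math.max(0f, Math.min(a.right, b.right) - Math.max(a.left, b.left));
        float interH = Math.max(0f, Math.min(a.bottom, b.bottom) - Math.max(a.top, b.top));
        float inter = interW * interH;
        float areaA = (a.right - a.left) * (a.bottom - a.top);
        float areaB = (b.right - b.left) * (b.bottom - b.top);
        float union = areaA + areaB - inter;
        return union <= 0f ? 0f : inter / union;
    }

    /** Keeps boxes scoring at least minScore, dropping same-label boxes that overlap a better one. */
    static List<Detection> select(List<Detection> raw, float minScore, float maxIou) {
        List<Detection> candidates = new ArrayList<>();
        for (Detection d : raw) {
            if (d.score >= minScore) {
                candidates.add(d);
            }
        }
        // Highest score first, so the best box of each cluster is kept.
        candidates.sort(Comparator.comparingDouble((Detection d) -> d.score).reversed());
        List<Detection> kept = new ArrayList<>();
        for (Detection candidate : candidates) {
            boolean suppressed = false;
            for (Detection better : kept) {
                if (better.label.equals(candidate.label) && iou(better, candidate) > maxIou) {
                    suppressed = true;
                    break;
                }
            }
            if (!suppressed) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

For a voice assistant, a stricter score threshold may be preferable, since announcing a wrong object is likely more confusing to the user than announcing nothing.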
COCO (Common Objects in Context) Dataset: The COCO dataset is a widely used
benchmark in computer vision that encompasses a diverse range of object categories. The
COCO SSD MobileNet V1 model is trained on this dataset, enabling it to recognize and
classify a broad spectrum of objects.
2. System Design
System design is the process of designing the architecture, components, and interfaces of a
system to meet the requirements of the end user. System design is also an important part of
technical interviews. Almost every IT giant, including Facebook, Amazon, Google, and Apple,
asks a variety of interview questions based on system design concepts such as
scalability, load balancing, caching, and more.
It is a broad field of engineering study that includes a variety of concepts and
principles to help design scalable systems. These concepts are frequently asked about in
interviews for SDE 2 and SDE 3 positions at technology companies. These senior
roles require a better understanding of how to solve specific design challenges, how to respond
when the system is expected to have more traffic, how to design the system's database, and so
on. All of these decisions must be made carefully, taking into account scalability, reliability,
availability, and maintainability.
Approaching a Design Problem
Breaking Down the Problem: Given a design task, start by breaking it down into
smaller components. These components can be services or functions that the system
must implement. At first, the system may appear to need a lot of features,
and you don't need to design everything if it's an interview. Ask the interviewer which
features they want you to implement in the system.
Communicating your Ideas: Communicate well with the interviewer. Keep them up
to date as you develop your system. Talk through the process out
loud. Visualize your designs on the board using flowcharts and diagrams. Explain to
the interviewer your ideas, how you would solve scalability problems, how you would design
databases, etc.
Assumptions that make sense: Make some reasonable assumptions when designing
your system. For example, you may need to estimate the number of queries your system
will handle per day, the number of database hits per month, or the efficiency of
your caching system. These are some of the numbers to consider when designing. Keep these
numbers as reasonable as possible, and back up your estimates with supporting facts
and figures.
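Such estimates are simple arithmetic, as the following sketch shows. The class name, the peak factor, and the cache hit ratio are assumptions chosen for illustration; real figures would come from measurements or stated requirements.

```java
final class CapacityEstimate {
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    private CapacityEstimate() {}

    /** Average queries per second for a given daily request volume. */
    static double averageQps(long requestsPerDay) {
        return (double) requestsPerDay / SECONDS_PER_DAY;
    }

    /** Peak load, assuming traffic peaks at peakFactor times the average. */
    static double peakQps(long requestsPerDay, double peakFactor) {
        return averageQps(requestsPerDay) * peakFactor;
    }

    /** Queries that still reach the database when a fraction is served from cache. */
    static double databaseQps(double qps, double cacheHitRatio) {
        return qps * (1.0 - cacheHitRatio);
    }
}
```

For instance, 86.4 million requests per day is an average of 1,000 queries per second. With an assumed peak factor of 2 the system must handle about 2,000 queries per second, and with an assumed 80% cache hit ratio roughly 200 of every 1,000 queries reach the database.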
There are three main features of a system design:
Reliability in System Design –
A system is reliable if it can meet the needs of its end users. When designing
a system, you plan a set of functions and services for it to implement.
The system can be considered reliable if it performs all these functions
without degrading. A fault-tolerant system is one that can continue
to function reliably in the event of a fault. A fault is an error that occurs in one
component of the system. A fault does not necessarily cause the system to fail. A failure
is a condition in which the system cannot function properly and can no longer
provide certain services to end users.
Availability in System Design –
The required degree of availability varies from system to system. If you're developing a social
networking application, you don't really need very high availability. A delay
of several seconds is acceptable: seeing your
favourite celebrity's Instagram posts 5-10 seconds late does little harm. However, if you
are developing a system for a hospital, data center, or banking
institution, you must ensure that the system is highly available, because
service delays can lead to huge losses.
There are various principles you should follow to ensure the availability of
your system:
There should be no single point of failure in the system. Essentially, the system
should not rely on a single service to handle all requests, because if
that service is interrupted, the entire system may be compromised and eventually
become unusable. Detect faults as they occur and eliminate them.
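The effect of removing a single point of failure can be quantified with a short calculation. The sketch below uses the standard formulas for components in series and for redundant replicas. It assumes that components fail independently, which real systems only approximate, and the class and method names are illustrative.

```java
final class Availability {
    private Availability() {}

    /** Availability of a chain in which every component must be up. */
    static double inSeries(double... componentAvailabilities) {
        double result = 1.0;
        for (double a : componentAvailabilities) {
            result *= a;
        }
        return result;
    }

    /** Availability of a service that is up if at least one of its replicas is up. */
    static double redundant(double singleAvailability, int replicas) {
        return 1.0 - Math.pow(1.0 - singleAvailability, replicas);
    }
}
```

Two services that are each available 99% of the time give about 98.01% when chained in series, whereas two redundant replicas of one such service give about 99.99%. This is why depending on a single service lowers availability and adding replicas raises it.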
Various kinds of boxes can be used to create flowcharts. Arrow lines link the
boxes to one another and show the control flow. Let's
explore each box in brief.
1. Terminal
This oval-shaped box signals the beginning or end of the program. Every
flowchart has two oval shapes, one to represent the beginning of an algorithm
and the other to represent its end.
2. Data
Inputs and outputs are written in a parallelogram-shaped box. It depicts the information
entering the system or algorithm and the information leaving it.
3. Process
The main logic of the algorithm or the major body of the program is written inside this
rectangular box. The primary processing steps go inside this
box, making it the most important part of the flowchart.
4. Decision
This is a rhombus-shaped box that contains control statements such as if, with conditions
such as a > 0. Two paths lead out of it: one for "yes" and the other for "no",
just as any decision has only two outcomes.
5. Flow
The flow of the algorithm or process is depicted by an arrow line, which indicates the
direction of the process flow. Arrows connect each stage to the next to show how
the program flows, which makes the flowchart easier to read.
6. Delay
3. Software Details
1. Android Studio
Android Studio is the official integrated development environment (IDE) for Android app
development. It is developed by Google and is based on the popular IntelliJ IDEA software.
Android Studio provides a comprehensive suite of tools for developing Android apps,
including a code editor, visual layout editor, debugger, and performance analysis tools. It also
includes a variety of templates and sample code to help developers get started with their
projects quickly.
Some key features of Android Studio include:
A Gradle-based build system that automates the building and packaging of app code
and resources.
A layout editor that allows developers to drag and drop UI components and preview
the design of their app in real-time.
A rich code editor that supports features like code completion, refactoring, and
debugging.
Integration with Google Play services and other libraries, allowing developers to
easily add features like Google Maps, Firebase, and AdMob to their apps.
Support for multiple programming languages, including Kotlin and Java.
Required RAM: 8 GB or more
2. Firebase
Firebase is a mobile and web application development platform owned by Google. It provides
a wide range of services that help developers build, test, and deploy apps more quickly and
easily.
Firebase includes a number of different features, such as real-time database, cloud storage,
authentication, hosting, analytics, and more. These features are designed to work seamlessly
together, allowing developers to create complex applications with ease.
One of the key advantages of Firebase is that it is a serverless platform, meaning that
developers don't have to worry about managing servers or infrastructure. Instead, Firebase
takes care of all the backend services, allowing developers to focus on building the frontend
and user experience of their applications.
Firebase also has a strong community of developers and resources available, including
documentation, code samples, and support forums. This makes it easier for developers to get
started with Firebase and troubleshoot any issues that may arise.
Overall, Firebase is a powerful platform that enables developers to build high-quality mobile
and web applications quickly and easily.
Text-to-speech (TTS) software is a type of computer software that converts written text into
spoken words. It uses natural language processing (NLP) and speech synthesis technology to
convert written text into audio output, which can then be played through speakers or
headphones.
TTS software can be useful for individuals with visual impairments or reading difficulties, as
it allows them to listen to text rather than reading it. It can also be helpful for language
learning or for individuals who prefer listening to reading.
TTS software has come a long way in recent years, with advancements in NLP and machine
learning making it more accurate and natural-sounding. Some TTS software even allows
users to customize the voice and speed of the spoken output, and some can even generate
multiple voices and accents. Overall, TTS software has many practical applications and has
the potential to make information more accessible to a wider range of individuals.
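Long passages, such as a page of recognized text, are usually handed to a speech engine in smaller pieces. Android's TextToSpeech class, for instance, exposes a maximum input length through getMaxSpeechInputLength(). The sketch below splits text at sentence boundaries using only the standard Java library. The `SpeechChunker` class and the `maxChars` parameter are hypothetical, and the sketch does not call any speech engine.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class SpeechChunker {
    private SpeechChunker() {}

    /**
     * Packs whole sentences into chunks of at most maxChars characters.
     * A single sentence longer than maxChars is kept as its own chunk
     * rather than being cut in the middle.
     */
    static List<String> chunk(String text, int maxChars) {
        List<String> chunks = new ArrayList<>();
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        sentences.setText(text);
        StringBuilder current = new StringBuilder();
        int start = sentences.first();
        for (int end = sentences.next(); end != BreakIterator.DONE;
                start = end, end = sentences.next()) {
            String sentence = text.substring(start, end).trim();
            if (sentence.isEmpty()) {
                continue;
            }
            // Start a new chunk when adding this sentence would exceed the limit.
            if (current.length() > 0 && current.length() + 1 + sentence.length() > maxChars) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(sentence);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }
}
```

Splitting at sentence boundaries keeps the pauses between chunks natural, and it also lets the user stop or repeat the reading one chunk at a time.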
Chapter 4: Implementation Work Details
4. Work Details
Details are important in the workplace because they make a lasting impression on colleagues,
customers, and managers.
Attention to detail shows that you are organized and attentive to your responsibilities. Also,
accurate and thorough work is a great way to earn trust and respect. People look for
attentive employees for good reason.
Sensory perception is the ability to perceive information through the senses. Paying
attention to detail is a skill everyone needs from time to time. When attention to detail
becomes part of your nature, it helps you develop your sensory perception.
It is important to develop sensory perception at work and in life, as inattention to detail has
negative consequences.
If you don't pay attention to the details, you won't know what needs to be
fixed or improved. Attention to detail develops sensory skills, helping you deal better with
distractions and stay focused.
A virtual assistant for visually impaired individuals can be a game-changer in many real-life
applications. For example, in the workplace, a virtual assistant can assist visually impaired
employees by reading out important documents, emails, and messages. This can help them
stay on top of their work and reduce the need for assistance from others. Additionally, a
virtual assistant can track the location of important objects, such as office supplies, and guide
visually impaired individuals to them. This can help improve their efficiency and
independence at work.
In daily life, a virtual assistant can also be incredibly useful for visually impaired individuals.
It can read out labels on food items, medication, and household products, helping them to
identify what they are using or consuming. A virtual assistant can also provide information
about the location of objects within their home, such as keys, wallets, and phones, reducing
the amount of time spent searching for them. Furthermore, a virtual assistant can guide
visually impaired individuals through unfamiliar environments, such as public transportation
systems or airports, ensuring they arrive at their destination safely and on time.
A virtual assistant can also help visually impaired individuals with
everyday tasks such as shopping and running errands. The assistant can read product labels
and scan barcodes, making it easier for individuals to identify items they need. The virtual
assistant can also track the location of items in stores, making it easier for individuals to
navigate and find what they need. Overall, a virtual assistant can significantly improve the
quality of life for visually impaired individuals, enabling them to be more independent and
self-sufficient.
Chapter 5: Source Code
5. Source Code
In computing, source code is any set of code, with or without comments, written in a
human-readable programming language, usually as plain text. A program's source code is
designed to facilitate the work of computer programmers, who
write it to specify what the computer should do.
AndroidManifest.xml:
Build.gradle:
MODEL AND DATASET:
Chapter 6: Input/ Output Screens
Connecting a physical device through USB and launching the application on the phone:
ICON:
The assistant announces that there are three options. We grant the required Camera,
Audio, and Microphone permissions.
Reader:
Text is read aloud by the assistant.
Location Tracker:
Object Detector:
Chapter 7: Conclusion
9.1 Limitations
Delayed updates due to differences in time zones.
Due to linguistic and cultural barriers, the briefing may be insufficient, and the quality
of the output may be reduced.
For individuals who are not tech-savvy and are unfamiliar with smartphones, there
may be obstacles.