2020 International Conference on Intelligent Engineering and Management (ICIEM)
Vision Maker: An Audio-Visual and Navigation Aid for Visually Impaired Persons
Sagor Saha
Electrical and Electronic Engineering
American International University Bangladesh
Dhaka, Bangladesh
[email protected]

Farhan Hossain Shakal
Electrical and Electronic Engineering
American International University Bangladesh
Dhaka, Bangladesh
[email protected]

Ahmed Mortuza Saleque
Assistant Professor
American International University Bangladesh
Dhaka, Bangladesh
[email protected]

Jerin Jahan Trisha
Electrical and Electronic Engineering
American International University Bangladesh
Dhaka, Bangladesh
[email protected]
Abstract— People with low vision or a complete loss of vision face challenging tasks in meeting their daily demands, and the barrier of low vision hinders their participation in society. The evolution of computer vision, artificial intelligence and machine learning has proved an effective tool in revitalizing the situation of the blind. The propounded design represents the implementation of an assistive device that can aid them in recognizing objects. The low-cost device runs a pre-trained model (ssdlite_mobilenet_v2_coco) and can identify up to 80 classes. The user can read any English text from images. Ultrasonic sensors have been used in the custom-made device to continuously alert the user to obstacles in all directions. Additionally, the device can navigate the user outdoors using the Google Map API, and the user can read the news, listen to music and mail a member of choice. Commands are passed and received through an earphone connected to the audio jack of the Raspberry Pi.

Keywords— Object recognition, human-computer interaction, outdoor navigation, computer vision, wearable device.

I. INTRODUCTION

Human eyes perform the rudimentary role of obtaining visual information about the surroundings, and lack of vision creates difficult situations for visually impaired people. The primary causes of vision impairment are glaucoma (2%), unoperated cataract (33%) and uncorrected refractive errors such as hyperopia, myopia or astigmatism (43%). Globally, around 80% of all visual disabilities can be prevented or cured [1]. According to the World Health Organization (WHO), there are 285 million visually impaired people worldwide; among them, 246 million suffer from low or poor vision and 39 million are completely blind [2]. The number is expected to double by 2020 [2]. Sensory disabilities affect 5.3% of the world population in the form of audition disability and 9.3% in the form of vision loss, as stated by the WHO [3]. A survey of data from 188 countries indicates that no fewer than 200 million people live with severe vision impairment, and the figure is expected to rise to more than 550 million by the year 2050 [4]. To ease their lives, effective solutions to a few problems should be considered:

1) Is there any obstacle ahead of the person? This is a vital issue, as it concerns the safety of the person.
2) What objects is the user facing? This includes the recognition of automobiles, household items and utilities.
3) Where does the person want to go? The Google Map API can guide the user to the desired destination.

A lot of research is still being carried out on blind vision, and a few papers have been critically analyzed in terms of features and approach. K. Patil et al. [2] proposed a system that is worn as a shoe by the visually impaired; the shoe contains ultrasonic sensors on all sides, a vibration sensor, a liquid detector and a step-down sensor. A. Kumar et al. [4] devised a blind navigation system using artificial intelligence, in which detected objects are converted into text and fed to the ear. R. Kasthuri et al. [5] proposed a smart device for visually impaired people, with an Android application for outdoor navigation and a speech recognition engine to collect weather reports, news and the MTC website. M. Maiti et al. [6] designed an intelligent electronic eye that implements a range finder and a CCD camera module for obstacle detection; a piezoelectric device and a solar panel were used to charge the device. K. Vasanth et al. [7] proposed a self-assistive IoT system for blind and deaf persons, in which a request protocol and the Google speech API were implemented to convert speech to text and establish the connection. Prototype designs containing deep-learning-based object recognition along with obstacle avoidance were proposed in [8, 9]. However, none of these implementations demonstrates continuous obstacle warning, object recognition and text identification with optical character recognition together. Hence, we designed and implemented a device to overcome these issues; the implemented device is also capable of providing outdoor navigation and a music system, and can keep the user updated with the latest news. The wearable device is shaped like spectacles made of acrylic material. A Raspberry Pi 3 is used as the processor, with a 720p Logitech camera, to run the device. To power the system, a 3-cell lithium polymer battery of 2200 mAh is used, with a buck converter stepping down the voltage.
II. DESIGN AND IMPLEMENTATION

Fig. 1. System block diagram.

The device runs on voice commands, and the obstacle-sensing sensors are always kept active so that the user is notified of surrounding obstacles. The device requires a constant internet connection to access all of its features. Fig. 1 shows the system block diagram of the device. As input, voice is passed through the microphone of the earphone. The camera connected to the processor captures pictures and processes them according to the command. The output result (text) is converted to audio and fed to the user via the earphone. The implementation of the proposed device is shown in Fig. 2.

Fig. 2. Hardware implementation of the design.

The electrical setup is shown in Fig. 3. The propounded design requires the following components:

A. Hardware

1) Raspberry Pi 3: This single-board computer was developed in the United Kingdom. It uses a Broadcom BCM2837 SoC with a 1.2 GHz 64-bit quad-core ARM Cortex-A53 processor and 512 KB of shared L2 cache. The model is equipped with 1 GB of RAM, Bluetooth 4.1 (24 Mbit/s) and 2.4 GHz 802.11n Wi-Fi (150 Mbit/s). It also has a 3.5 mm audio jack and an HDMI port.
2) Camera: The Logitech C270 was picked for its low price; it offers a video resolution of 1280x720 and an image resolution of 640x480, with a fixed focus, standard lens technology and a built-in microphone.
3) Lithium polymer battery with buck converter: For the input power, a rechargeable 3-cell lithium polymer battery of 2200 mAh was used; its 12 V output is converted down to 5 V at 2 A.
4) Ultrasonic sensor: The ultrasonic sensor has a transmitter and a receiver; the duration of the reflected wave is measured and the distance is calculated from it. The operating voltage and current are 5 V and 15 mA, the ultrasonic burst frequency is 40 kHz, and the sensor can measure distances from 2 cm to 400 cm. A minimal reading sketch is given after Fig. 3.
Fig. 3. Electrical setup of the device.
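As a concrete illustration of how the ultrasonic readings in item 4 could be taken, here is a minimal sketch for the Raspberry Pi; the RPi.GPIO package and the pin assignments are assumptions, since the paper does not specify them.

```python
# Hedged sketch: read one HC-SR04-style ultrasonic sensor on the Pi.
# The RPi.GPIO package and the BCM pin numbers are assumptions.
import time
import RPi.GPIO as GPIO

TRIG, ECHO = 23, 24  # hypothetical trigger/echo pins for one sensor

GPIO.setmode(GPIO.BCM)
GPIO.setup(TRIG, GPIO.OUT)
GPIO.setup(ECHO, GPIO.IN)

def read_distance_cm():
    # A 10 us pulse on TRIG fires the 40 kHz ultrasonic burst
    GPIO.output(TRIG, True)
    time.sleep(0.00001)
    GPIO.output(TRIG, False)

    # Time how long ECHO stays high: that is the round-trip duration
    start = end = time.time()
    while GPIO.input(ECHO) == 0:
        start = time.time()
    while GPIO.input(ECHO) == 1:
        end = time.time()

    # Sound travels ~34300 cm/s; halve for the one-way distance
    return (end - start) * 34300 / 2

if __name__ == "__main__":
    print("distance: %.1f cm" % read_distance_cm())
    GPIO.cleanup()
```

In the actual device this reading would run continuously for all three sensors and feed the indoor navigation logic of Table II in Section II-B.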
B. Software
The project runs on the Python platform with the TensorFlow object detection API. As the model for object recognition we selected ssdlite_mobilenet_v2_coco, and Google's open-source Tesseract optical character recognition (OCR) engine is used for extracting text from images.

1. Object recognition: The pre-trained ssdlite_mobilenet_v2_coco model was picked because it maintains a balance between speed and accuracy: among the models in Table I it offers one of the highest speeds (27 ms) together with a competitive mAP of 22, and being lightweight it best suits the device (a minimal inference sketch follows Table I). The pre-trained model can classify up to 80 classes, ranging from household items to automobiles. The SSD architecture is a well-known convolutional neural network (CNN) model with two components: a bounding box predictor and a feature extractor. The base network, the feature extractor, is generally a truncated VGG-16 classification network. The bounding box predictor is a combination of small convolutional filters used to predict the category scores and box offsets for a set of default bounding boxes. ssdlite_mobilenet_v2_coco is a mobile-friendly variant of the Single Shot Detector in which the regular convolutions in the bounding box predictor are supplanted by depthwise convolutions [10].
TABLE I. COMMON OBJECTS IN CONTEXT (COCO) TRAINED MODELS

Model name                          Speed (ms)   COCO mAP
ssd_mobilenet_v1_0.75_depth_coco    26           18
ssd_mobilenet_v1_coco               30           21
ssd_mobilenet_v1_ppn_coco           26           20
ssd_mobilenet_v2_coco               31           22
ssd_mobilenet_v1_fpn_coco           56           32
ssd_inception_v2_coco               42           24
ssdlite_mobilenet_v2_coco           27           22

mAP = mean Average Precision
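To make the recognition step concrete, the following is a hedged sketch of single-image inference on the exported ssdlite_mobilenet_v2_coco frozen graph with the TF1-era Object Detection API. The graph path, image file and 0.5 score threshold are assumptions; the tensor names are the standard ones that API exports.

```python
# Sketch: run the ssdlite_mobilenet_v2_coco frozen graph on one frame
# (TensorFlow 1.x graph API; file paths are illustrative).
import cv2
import numpy as np
import tensorflow as tf

GRAPH_PATH = "ssdlite_mobilenet_v2_coco/frozen_inference_graph.pb"

graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(GRAPH_PATH, "rb") as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name="")

with tf.Session(graph=graph) as sess:
    frame = cv2.imread("capture.jpg")              # frame from the camera
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    boxes, scores, classes = sess.run(
        ["detection_boxes:0", "detection_scores:0", "detection_classes:0"],
        feed_dict={"image_tensor:0": np.expand_dims(rgb, 0)})
    for cls, score in zip(classes[0], scores[0]):
        if score > 0.5:                            # assumed threshold
            print("COCO class id %d, confidence %.2f" % (int(cls), score))
```

The printed class ids would then be mapped to COCO label names and passed to the text-to-speech stage described below.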
2. Optical character recognition: OCR allows the user to read any text written in an image. The open-source Tesseract OCR engine can recognize more than 100 languages out of the box and is widely deployed; Google, for example, has used it for image spam detection in Gmail. A minimal sketch of the OCR step follows.
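The sketch below assumes the pytesseract wrapper around Tesseract and gTTS for the speech output; neither package is named explicitly in the paper.

```python
# Minimal sketch of the read-aloud path: OCR with pytesseract, then
# text-to-speech with gTTS. Both package choices are assumptions.
from PIL import Image
import pytesseract
from gtts import gTTS

# Extract English text from a captured frame (file name is illustrative)
text = pytesseract.image_to_string(Image.open("capture.jpg"), lang="eng")
print(text)

# Convert the recognized text to speech for the earphone
if text.strip():
    gTTS(text=text, lang="en").save("speech.mp3")
```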
3. Virtual communication: A command received by the microphone is transferred to the Google API server, which transforms the voice command into text. The voice command is first converted into an electrical signal; with the help of a request procedure protocol, the signal enters the server, the encoded MP3 audio is sent to the Google API for conversion into text, and, using the same request procedure protocol, the text goes back to the processor, as shown in Fig. 4. A client-side sketch follows.

Fig. 4. Command signal processing.
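A hedged sketch of this command path, assuming the speech_recognition package as the client for the Google speech API (the paper describes the protocol but not the exact client library):

```python
# Sketch: capture a voice command and send it to the Google speech
# API for transcription. The speech_recognition package is an
# assumption standing in for the request protocol described above.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as mic:          # earphone microphone on the Pi
    recognizer.adjust_for_ambient_noise(mic)
    audio = recognizer.listen(mic)    # encoded audio sent to the server

try:
    command = recognizer.recognize_google(audio)  # text comes back
    print("command:", command)
except sr.UnknownValueError:
    print("could not understand the command")
```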
4. Indoor path navigation using ultrasonic sensors: Three ultrasonic sensors are placed on the device: one on the left, one on the right, and one attached at the center looking forward. The device gives voice feedback on the direction to take if there is any obstacle within 10 cm. Table II lists the cases for which the indoor navigation logic was created; for example, if there are obstacles on the left and right, the device guides the user forward through voice feedback (Case 1 in Table II). The sketch after the table shows this decision logic directly.

TABLE II. DESIGNED METHODOLOGY FOR INDOOR NAVIGATION

Condition   Left Sonar   Right Sonar   Forward Sonar   Direction
Case 1      1            1             0               Forward
Case 2      1            0             1               Right
Case 3      0            1             1               Left
Case 4      1            0             0               Right or Forward
Case 5      0            1             0               Left or Forward
Case 6      0            0             1               Left or Right
Case 7      1            1             1               Back

1 = obstacle, 0 = no obstacle
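The decision logic of Table II is small enough to show directly; a minimal sketch, with 1 meaning an obstacle within the 10 cm threshold:

```python
# Sketch: map the three sonar readings to a spoken direction,
# exactly as enumerated in Table II (1 = obstacle, 0 = clear).
def advise(left, right, forward):
    table = {
        (1, 1, 0): "move forward",           # Case 1
        (1, 0, 1): "move right",             # Case 2
        (0, 1, 1): "move left",              # Case 3
        (1, 0, 0): "move right or forward",  # Case 4
        (0, 1, 0): "move left or forward",   # Case 5
        (0, 0, 1): "move left or right",     # Case 6
        (1, 1, 1): "move back",              # Case 7
    }
    return table.get((left, right, forward), "path clear")

# Example: obstacles on the left and right, forward clear (Case 1)
print(advise(1, 1, 0))  # -> "move forward"
```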
III. PROJECT IMPLEMENTATION AND COST ANALYSIS

The custom-made device, with a Raspberry Pi as its processor, receives commands from the user and turns the corresponding features on. If no command is passed, the ultrasonic sensors placed at three different locations keep warning the user about obstacles. For the other features, such as mailing, the music system, outdoor navigation and reading the newspaper, the device requires a constant internet connection. Fig. 5 demonstrates the implementation of the device on a human, and the flow chart in Fig. 6 represents the total working process of the device; a control-loop sketch follows.

Fig. 5. Implementation of the device on human.
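A minimal sketch of this top-level behavior, with placeholder stubs for the feature routines and the speech front end (all names here are illustrative, not from the paper):

```python
# Sketch of the Fig. 6 control loop: obstacle warnings run by default,
# and a recognized keyword switches on the matching feature. All
# handler names are hypothetical placeholders.
import time

def stub(name):
    return lambda: print("running feature:", name)

FEATURES = {
    "detect": stub("object recognition"),
    "read": stub("optical character recognition"),
    "navigate": stub("Google Map navigation"),
    "news": stub("news reader"),
    "music": stub("music player"),
    "mail": stub("mailing"),
}

def listen_for_command():
    # Placeholder for the Google speech API path shown in Fig. 4
    return input("command> ").strip() or None

while True:
    command = listen_for_command()
    if command is None:
        print("no command: ultrasonic obstacle warning stays active")
        continue
    for keyword, handler in FEATURES.items():
        if keyword in command.lower():
            handler()
            break
    time.sleep(0.1)
```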
The components altogether cost $96. Although more expensive devices are available to those who can afford them, people below the poverty line, especially in Bangladesh, may not be able to use them. At this price, people can afford to buy the device and gain a pictorial view of their surroundings. The results for each of the features are given in Section IV.
Fig. 6. Flow chart of the system, representing the working process of the device.

TABLE III. TABULATION OF THE COST OF COMPONENTS USED

Sl. no.   Component name                        Quantity   Cost in USD
1         Raspberry Pi                          1          41.22
2         Logitech camera                       1          23.56
3         Ultrasonic sensor                     3          2.47
4         Buck converter                        1          1.77
5         Lithium polymer battery (2200 mAh)    1          21.20
6         Wires                                 18         0.12
7         Acrylic board                                    5.89
Total                                                      96.23
IV. RESULT AND ANALYSIS
The results of all the features are given below with critical analysis. Object recognition using the pre-trained model can identify objects in minute detail, and text written in any image is captured and converted to speech with consistently promising results. The ultrasonic sensors placed at three different positions provide wide awareness of the surroundings; when tested, this feature gave an excellent sense of indoor navigation. For outdoor navigation, Google Map was used, activated by voice commands with a given location. The user can also get updated with the latest news and play a song of his or her choice by command.

1. Result from the object recognition section

Fig. 7. Object recognition with few items.

Fig. 7 and Fig. 8 represent the recognition of scenes with few and many objects, respectively. This feature is run by voice command; the recognized objects are converted into speech and announced to the user through the earphone. The pre-trained model implemented in the device takes 3 to 4 seconds to pass a captured input image through the neural network for recognition, whereas the other models listed in Table I consume more time than ssdlite_mobilenet_v2_coco.

Fig. 8. Object recognition with minute details.
2. Result from optical character recognition

Pytesseract, an open-source Python wrapper for the Tesseract engine, was installed to read text from images. The software works well on computer-typed images. Fig. 9 and Fig. 10 represent detection from a typed image and a casual image, respectively; the results from both are satisfactory.

Fig. 9. Recognition from computer-typed images.

Fig. 10. Recognition from casual images.
3. Result from newspaper and Google Map

Fig. 11 demonstrates current news headlines being read out to the user from the News API; a minimal sketch of fetching such headlines follows.

Fig. 11. Result from the current newspaper.
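A hedged sketch of the headline fetch, assuming the newsapi.org REST endpoint (the paper says only "News API"); the API key is a placeholder:

```python
# Sketch: fetch top headlines over HTTP and print the titles that
# would be passed on to text-to-speech. Endpoint and key are assumptions.
import requests

resp = requests.get(
    "https://newsapi.org/v2/top-headlines",
    params={"country": "us", "apiKey": "YOUR_API_KEY"})
for article in resp.json().get("articles", [])[:5]:
    print(article["title"])
```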
Fig. 12 portrays the guidance given to the user while Google Map runs; a specific command along with the destination has to be passed as input to the system, as sketched below.

Fig. 12. Result of Google Map guiding the user.
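A hedged sketch of the navigation request, assuming the googlemaps Python client for the Google Map API; the API key and the origin/destination strings are placeholders:

```python
# Sketch: fetch walking directions and print each step so it can be
# spoken to the user. Key and place names are illustrative.
import googlemaps

gmaps = googlemaps.Client(key="YOUR_API_KEY")
route = gmaps.directions("AIUB, Dhaka", "Dhanmondi, Dhaka",
                         mode="walking")
for step in route[0]["legs"][0]["steps"]:
    # html_instructions would be stripped of tags before speech
    print(step["html_instructions"], "-", step["distance"]["text"])
```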
4. Indoor path navigation using ultrasonic sensors

Fig. 13. Display showing the direction to move right.

Fig. 14. Display showing the direction to move left or right.
V. CONCLUSION

The device was successfully executed and the results were promising. The cost of the device was reduced while offering better features compared to other solutions, and the propounded device fared well in terms of features, accuracy and mobility.
The combination of these features has not been presented in any other work, and the recognitions were almost always accurate. With a constant internet connection, the human-computer interaction works smoothly along with the warning of obstacles from all sides. The mechanical structure makes the device a bit bulky; this could be optimized by designing a 3D-printed body with lightweight material. The device, with all its features, can be expected to uplift the position of visually impaired people in society. People with low vision can successfully move around by jointly using their senses of speech and hearing and the proposed device, and the problems of both indoor and outdoor navigation can be served by this device.
REFERENCES
[1] T. V. Mataró et al., "An assistive mobile system supporting blind and visually impaired people when outdoors," 3rd International Forum on Research and Technologies for Society and Industry (RTSI), Modena, 2017, pp. 1-6.
[2] K. Patil, Q. Jawadwala and F. C. Shu, "Design and Construction of Electronic Aid for Visually Impaired People," IEEE Transactions on Human-Machine Systems, vol. 48, no. 2, pp. 172-182, April 2018.
[3] F. Sorgini, R. Caliò, M. C. Carrozza and C. M. Oddo, "Haptic-assistive technologies for audition and vision sensory disabilities," Disability and Rehabilitation: Assistive Technology, vol. 13, no. 4, pp. 394-421, Oct. 2017. Accessed on: Feb. 9, 2020. [Online]. doi: 10.1080/17483107.2017.1385100.
[4] A. Kumar and A. Chourasia, "Blind Navigation System Using Artificial Intelligence," International Research Journal of Engineering and Technology (IRJET), India, vol. 5, 2018.
[5] R. Kasthuri, B. Nivetha, S. Shabana, M. Veluchamy and S. Sivakumar, "Smart device for visually impaired people," Third International Conference on Science Technology Engineering & Management (ICONSTEM), Chennai, 2017, pp. 54-59.
[6] M. Maiti, P. Mallick, M. Bagchi, A. Nayek, T. K. Rana and S. Pramanik, "Intelligent electronic eye for visually impaired people," 8th Annual Industrial Automation and Electromechanical Engineering Conference (IEMECON), Bangkok, 2017, pp. 39-42.
[7] K. Vasanth, M. Mounika and R. Varatharajan, "A Self Assistive Device for Deaf & Blind People Using IoT," Journal of Medical Systems, pp. 1-8, Mar. 2019.
[8] R. Rajalakshmi, K. Vishnupriya, M. S. Sathyapriya and G. R. Vishvaardhini, "Smart Navigation System for the Visually Impaired Using Tensorflow," International Journal of Advance Research and Innovative Ideas in Education, vol. 4, pp. 1-14, Feb. 2018.
[9] B. Kim, H. Seo and J.-D. Kim, "Design and Implementation of a Wearable Device for the Blind by Using Deep Learning Based Object Recognition," International Conference on Computer Science and its Applications, Singapore, 2017, pp. 1008-1013.
[10] H. Fan et al., "A Real-Time Object Detection Accelerator with Compressed SSDLite on FPGA," International Conference on Field-Programmable Technology (FPT), Naha, Okinawa, Japan, 2018, pp. 14-21.