International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 04 | Apr 2022 www.irjet.net p-ISSN: 2395-0072
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 422
Temporal Reasoning Graph for Activity Recognition
ABSTRACT: In this work, we aim to address the problem of
human interaction recognition in videos by exploring the
long-term inter-related dynamics among multiple persons.
Recently, Long Short-Term Memory (LSTM) has become a
popular choice to model individual dynamics for
single-person action recognition because of its ability to
capture temporal motion information over a range.
However, most existing LSTM-based methods focus only on
capturing the dynamics of human interaction by simply
combining the dynamics of all individuals or modeling them
as a whole. Such methods neglect the inter-related dynamics
of how human interactions change over time. To this end, we
propose a novel Hierarchical Long Short-Term Concurrent
Memory (H-LSTCM) to model the long-term inter-related
dynamics among a group of persons for recognizing human
interactions. Specifically, we first feed each person's
static features into a Single-Person LSTM to model the
single-person dynamics. Subsequently, at one time step, the
outputs of all Single-Person LSTM units are fed into a novel
Concurrent LSTM (Co-LSTM) unit, which mainly consists of
multiple sub-memory units, a new cell gate, and a new
co-memory cell. In the Co-LSTM unit, each sub-memory unit
stores individual motion information, while the Co-LSTM unit
selectively integrates and stores the inter-related motion
information between multiple interacting persons from the
multiple sub-memory units via the cell gate and the
co-memory cell, respectively. Extensive experiments on
several public datasets validate the effectiveness of the
proposed H-LSTCM by comparing against baseline and
state-of-the-art methods.
KEYWORDS: Convolutional neural networks (CNNs),
LSTM, Human activity
I. INTRODUCTION
1.1 Background
Human interactions (e.g., handshaking and talking) are
typical human activities that occur in public places and are
attracting considerable attention from researchers. A human
interaction usually involves at least two individual
dynamics from different persons, who are concurrently
inter-related with each other (e.g., some people are talking
together, some people are shaking hands with each other). In
most cases of human interaction, the concurrent
inter-related dynamics between different persons are
strongly interacting (e.g., person A kicks person B, while
person B retreats). It has been shown that the concurrent
inter-related dynamics among different persons, rather than
single-person dynamics, can contribute discriminative
information for recognizing human interactions.
Having described the available data and the
possible difficulties, we can now renew the questions to
investigate in this project. First and foremost, we are
interested in seeing if we can predict the criminal
incidents, perhaps for a specific type of crime, for a small
time frame and geographic region. Second, we are
interested in learning which features have the most
predictive power with respect to crime. Having an
understanding of driving factors, cities can better work to
mitigate the risk factors for crime.
In human interactions, activities have an underlying purpose.
This purpose can be to accomplish a goal or to respond to
some stimulus. Both of these factors are governed by the
environment of the individuals, which dictates the
contextual elements in the scene. Since this environment is
shared by all people present in the scene, it is often the
case that the activities of individuals are interdependent
and some coherency between these activities may exist. We
call such activities "collective". Examples of collective
activities are: Crossing the street, Talking, Waiting,
Queuing, Walking, Dancing and Jogging.
1.2 Motivation
• In human interactions, activities have an underlying
purpose. This purpose can be to accomplish a goal or to
respond to some stimulus.
• Both of these factors are governed by the environment of
the individuals, which dictates the contextual elements in
the scene.
Sonali Chandole, Sajay Pingat
Department of Computer Engineering, Smt. Kashibai Navale College of Engineering, Pune, India
----------------------------------------------------------------------------***---------------------------------------------------------------------
• Activity recognition in videos has attracted
increasing interest recently. An activity can be
defined as a certain spatial and temporal pattern
involving the movements of a single or multiple
actors.
1.3 Objectives
• To develop a system that identifies the appropriate
human activity for its users from lengthy videos.
• The system processes the user's video by taking
various inputs and generating its output accordingly.
1] To collect datasets of activity videos.
2] To implement the CNN algorithm.
3] To validate the proposed algorithm.
II. LITERATURE SURVEY:
In the paper by Y. Kong, Y. Jia et al. [1], the authors note
that, to understand human-to-human interactions accurately,
human interaction recognition (HIR) systems require robust
feature extraction and selection methods based on vision
sensors. The paper proposes the WHITE STAG model to
intelligently track human interactions using space-time
methods as well as shape-based angular-geometric sequential
approaches over full-body silhouettes and skeleton joints,
respectively. After feature extraction, the feature space is
reduced by using codebook generation and linear discriminant
analysis (LDA). Finally, a kernel sliding perceptron is used
to recognize multiple classes of human interactions. The
proposed WHITE STAG model is validated using two publicly
available RGB datasets and one self-annotated intensity
interactive dataset as a novelty. For evaluation, four
experiments are performed using leave-one-out and
cross-validation testing schemes. The WHITE STAG model and
kernel sliding perceptron outperform well-known statistical
state-of-the-art methods by achieving a weighted average
recognition rate of 87.48% on UT-Interaction, 87.5% on
BIT-Interaction and 85.7% on the proposed
IM-IntensityInteractive7 dataset. The proposed system should
be applicable to various multimedia content and security
applications such as surveillance systems, video-based
learning, medical futurists, service robots, and interactive
gaming.
X. Chang, W.-S. Zheng et al. [2] present a
framework for the recognition of collective human
activities. A collective activity is defined or reinforced by
the existence of coherent behavior of individuals in time
and space. We call such coherent behavior ‘Crowd
Context’. Examples of collective activities are “queuing in a
line” or “talking”. Following [7], we propose to recognize
collective activities using the crowd context and introduce
a new scheme for learning it automatically. Our scheme is
constructed upon a Random Forest structure which
randomly samples variable volume spatiotemporal regions
to pick the most discriminating attributes for
classification. Unlike previous approaches, our algorithm
automatically finds the optimal configuration of
spatiotemporal bins, over which to sample the evidence,
by randomization. This enables a methodology for
modelling crowd context. We employ a 3D Markov
Random Field to regularize the classification and localize
collective activities in the scene. We demonstrate the
flexibility and scalability of the proposed framework in a
number of experiments and show that our method
outperforms state-of-the-art action classification
techniques.
Y. Kong and Y. Fu et al. [3] proposed that the Author Index
contains the primary entry for each item, listed under the
first author’s name. The primary entry includes the co-
authors’ names, the title of the paper or other item, and its
location, specified by the publication abbreviation, year,
month, and inclusive pagination. The Subject Index
contains entries describing the item under all appropriate
subject headings, plus the first author’s name, the
publication abbreviation, month, and year, and inclusive
pages. Note that the item title is found only under the
primary entry in the Author Index.
Y. Zhang, X. Liu et al. [4] state that local feature-based
approaches have become popular for activity
recognition. A local feature captures the local movement
and appearance of a local region in a video, and thus can
be ambiguous; e.g., it cannot tell whether a movement is
from a person’s hand or foot, when the camera is far
away from the person. To better distinguish different
types of activities, people have proposed using the
combination of local features to encode the relationships
of local movements. Due to the computation limit,
previous work only creates a combination from
neighboring features in space and/or time. In this paper,
we propose an approach that efficiently identifies both
local and long-range motion interactions; taking the
“push” activity as an example, our approach can capture
the combination of the hand movement of one person
and the foot response of another person, the local
features of which are both spatially and temporally far
away from each other. Our computational complexity is
linear in the number of local features in a video.
The extensive experiments show that our approach is
generically effective for recognizing a wide variety of
activities and activities spanning a long term, compared
to a number of state-of-the-art methods.
J. Donahue, L. Anne Hendricks et al. [5] note that models
based on deep convolutional networks have
dominated recent image interpretation tasks; we
investigate whether models which are also recurrent, or
"temporally deep", are effective for tasks involving
sequences, visual and otherwise. We develop a novel
recurrent convolutional architecture suitable for large-
scale visual learning which is end-to-end trainable, and
demonstrate the value of these models on benchmark
video recognition tasks, image description and retrieval
problems, and video narration challenges. In contrast to
current models which assume a fixed spatio-temporal
receptive field or simple temporal averaging for sequential
processing, recurrent convolutional models are "doubly
deep" in that they can be compositional in spatial and
temporal "layers". Such models may have advantages
when target concepts are complex and/or training data are
limited. Learning long-term dependencies is possible when
nonlinearities are incorporated into the network state
updates. Long-term RNN models are appealing in that they
directly can map variable-length inputs (e.g., video frames)
to variable length outputs (e.g., natural language text) and
can model complex temporal dynamics; yet they can be
optimized with back propagation. Our recurrent long-term
models are directly connected to modern visual convnet
models and can be jointly trained to simultaneously learn
temporal dynamics and convolutional perceptual
representations. Our results show such models have
distinct advantages over state-of-the-art models for
recognition or generation which are separately defined
and/or optimized.
Q. Ke, M. Bennamoun, S. An, et al. [6] state that human
interaction prediction, i.e., the recognition of an ongoing
interaction activity before it is completely executed, has a
wide range of applications such as human-robot
interaction and the prevention of dangerous events. Due to
the large variations in appearance and the evolution of
scenes, interaction prediction at an early stage is a
challenging task. In this paper, a novel structural feature is
exploited as a complement together with the spatial and
temporal information to improve the performance of
interaction prediction. The proposed structural feature is
captured by Long Short Term Memory (LSTM) networks,
which process the global and local features associated to
each frame and each optical flow image. A new ranking
score fusion method is then introduced to combine the
spatial, temporal and structural models. Experimental
results demonstrate that the proposed method
outperforms state-of-the-art methods for human
interaction prediction on the BIT-Interaction, the TV
Human Interaction and the UT-Interaction datasets.
X. Shu, J. Tang, G.-J. Qi, et al. [7] observe that, recently,
Long Short-Term Memory (LSTM) has become a popular
choice to model individual dynamics for single-person
action recognition due to its ability of modeling the
temporal information in various ranges of dynamic
contexts. However, existing RNN models only focus on
capturing the temporal dynamics of the person-person
interactions by naively combining the activity dynamics of
individuals or modeling them as a whole. This neglects the
inter-related dynamics of how person-person interactions
change over time. To this end, we propose a novel
Concurrence-Aware Long Short-Term Sub-Memories (Co-
LSTSM) to model the long-term inter-related dynamics
between two interacting people on the bounding boxes
covering people. Specifically, for each frame, two sub-
memory units store individual motion information, while a
concurrent LSTM unit selectively integrates and stores
inter-related motion information between interacting
people from these two sub-memory units via a new co-
memory cell. Experimental results on the BIT and UT
datasets show the superiority of Co-LSTSM compared with
the state-of-the-art methods.
III. PROPOSED SYSTEM ARCHITECTURE:
1. Methodology:
In the proposed system, we conduct an experiment
on human activity classification.
Fig. Proposed Architecture
Human activity recognition has been a very active research
topic in the past two decades for its applications in various
fields such as health, remote monitoring, gaming, security
and surveillance, and human-computer interaction.
Activity recognition can be defined as the ability to
recognize/detect current activity on the basis of
information received from different sensors. These
sensors can be cameras, wearable sensors, or sensors
attached to objects of the daily use or deployed in the
environment. With the advancements in technology and
the reduction in device costs, the logging of daily activities
has become very popular and practical. People are logging
their daily life activities, such as cooking, eating, sleeping,
or watching TV. To capture these activities, different
approaches have been used.
In this work we present a neural network model
that combines convolutional neural networks and
background subtraction. We first evaluate the effect of the
convolutional network used for understanding static
frames on action recognition. Our method uses a CNN to
automatically detect the actions in order to maximize its
activity recognition accuracy.
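The background-subtraction stage mentioned above can be illustrated with a minimal sketch. It assumes grayscale frames stored as nested lists and a simple running-average background model; the function names, the `alpha` blending rate, and the `threshold` value are illustrative assumptions and are not taken from the paper's implementation.

```python
from typing import List

Frame = List[List[float]]  # grayscale frame: rows of pixel intensities (0-255)


def update_background(background: Frame, frame: Frame, alpha: float = 0.5) -> Frame:
    """Blend the new frame into the running-average background model."""
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_mask(background: Frame, frame: Frame, threshold: float = 30.0) -> List[List[int]]:
    """Mark a pixel as foreground (1) when it differs strongly from the background."""
    return [
        [1 if abs(f - b) > threshold else 0 for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


# Toy 3x3 example: a static dark scene in which one bright "actor" pixel appears.
background = [[10.0, 10.0, 10.0], [10.0, 10.0, 10.0], [10.0, 10.0, 10.0]]
frame = [[10.0, 10.0, 10.0], [10.0, 200.0, 10.0], [10.0, 10.0, 10.0]]

mask = foreground_mask(background, frame)           # computed against the old background
background = update_background(background, frame)   # then the model absorbs the new frame
```

In a pipeline of this kind, only the foreground pixels selected by the mask would typically be passed on to the CNN, so that static scenery contributes less to the learned features.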
The steps of the proposed system are listed below:
Step 1: Data Acquisition and Preprocessing
• Removing missing values
• Handling categorical variables
• Feature reduction
Step 2: Video Action Dataset
• Contains different videos of humans
• The videos are related to human activity
Step 3: Background Subtraction
• After getting data from the dataset, frames (images)
are extracted from the videos.
• Generally, two frame-rate standards are used:
• One model contains 26 frames per second of video
• The second model contains 30 frames per second of
video
Step 4: Training and Testing Set (CNN)
• Splitting the dataset into training and testing sets
• Fitting the model with the dependent and independent
variables
• Backward elimination
Step 5: Model testing
• Applying the testing dataset to the predictive model
• Evaluating the model by comparing the predicted values
with the actual values
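Steps 4 and 5 above can be sketched as follows. This is a minimal illustration under stated assumptions: the sample names, labels, the 80/20 split ratio, and the helper functions `train_test_split` and `accuracy` are hypothetical, and the predictions are placeholders standing in for the output of a trained CNN.

```python
import random
from typing import Sequence, Tuple


def train_test_split(items: Sequence, test_ratio: float = 0.2, seed: int = 0) -> Tuple[list, list]:
    """Shuffle the samples reproducibly and split them into training and testing sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical (video_id, label) pairs standing in for the video action dataset.
samples = [(f"video_{i}", "walking" if i % 2 else "talking") for i in range(10)]
train_set, test_set = train_test_split(samples, test_ratio=0.2)

# Placeholder predictions; in the described system these would come from the CNN.
actual = [label for _, label in test_set]
predicted = list(actual)
score = accuracy(predicted, actual)
```

Fixing the shuffle seed keeps the split reproducible, which makes accuracy figures comparable across runs.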
2. Algorithms
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (also called CNNs or
ConvNets) are a kind of artificial neural network known to
be highly effective in identification as well as image
classification.
The four main operations in a Convolutional Neural
Network are as follows:
Fig. CNN Architecture
(i) Convolution
The main purpose of the convolution operation in a CNN is
to extract relevant features from the image that acts as the
input to the first layer. Convolution preserves the spatial
relationship between pixels; this is done by learning image
features using small squares of the image.
Convolution equation.
Every image can be viewed as a matrix of pixels, each having
its own value. A pixel is the smallest unit in this image
matrix. For better understanding, let us take a 5 by 5 (5*5)
matrix whose values are binary (i.e., 0 or 1). Note that
images are generally RGB, with pixel values ranging from
0 to 255, i.e., 256 possible values.
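To make the convolution step concrete, the following minimal sketch slides a 3 by 3 filter over a 5 by 5 binary image with stride 1 and no padding. The specific image and filter values are illustrative assumptions, and the function computes the unflipped sliding product commonly used in CNN layers.

```python
from typing import List

Matrix = List[List[int]]


def convolve2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image (stride 1, no padding) and sum the element-wise products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            row.append(total)
        output.append(row)
    return output


# Illustrative 5x5 binary image and 3x3 filter (values chosen for the example).
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = convolve2d(image, kernel)
# feature_map == [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

A 5 by 5 input and a 3 by 3 filter yield a 3 by 3 feature map, since the filter fits in three positions along each axis.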
(ii) ReLU
ReLU acts at an elementary level. In other words, it is an
operation applied per pixel that replaces all non-positive
values in the feature map with zero.
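A minimal sketch of this per-pixel operation, assuming the feature map is stored as a nested list; the input values are made up for illustration.

```python
from typing import List


def relu(feature_map: List[List[float]]) -> List[List[float]]:
    """Replace every negative value in the feature map with zero, element by element."""
    return [[max(0, value) for value in row] for row in feature_map]


# Illustrative feature map containing negative, zero, and positive responses.
activated = relu([[-3, 2], [0, -1]])
# activated == [[0, 2], [0, 0]]
```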
(iii) Pooling or sub-sampling
Spatial pooling, also called subsampling or downsampling,
reduces the dimensions of each feature map while retaining
the most important information of the map. After pooling is
done, the 3D feature map is eventually converted to a
one-dimensional feature vector.
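The pooling and flattening described above can be sketched as follows. The sketch assumes max pooling with a 2 by 2 window and a stride equal to the window size, which is one common choice; the source does not state which pooling variant is used, and the input values are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def max_pool(feature_map: Matrix, size: int = 2) -> Matrix:
    """Keep the largest value in each non-overlapping size x size window."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]


def flatten(feature_maps: List[Matrix]) -> List[float]:
    """Convert a 3D stack of feature maps (channels x rows x cols) into a 1D vector."""
    return [value for fmap in feature_maps for row in fmap for value in row]


# Illustrative 4x4 feature map; each 2x2 block is reduced to its maximum.
fmap = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 0, 8],
]
pooled = max_pool(fmap)      # [[6, 4], [7, 9]]
vector = flatten([pooled])   # [6, 4, 7, 9]
```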
(iv) Fully Connected layer
The output from the convolution and pooling operations
provides prominent features extracted from the image.
These features are then used by the Fully Connected layer
to assign the input image to various classes based on the
training dataset.
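As a rough illustration of this classification step, the sketch below applies one dense layer followed by a softmax to a flattened feature vector. The weights, biases, feature values, and class names are hypothetical; in a trained network the weights would be learned from the training dataset.

```python
import math
from typing import List


def dense(features: List[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """Compute one score per class as a weighted sum of the features plus a bias."""
    return [
        sum(w * x for w, x in zip(weight_row, features)) + bias
        for weight_row, bias in zip(weights, biases)
    ]


def softmax(scores: List[float]) -> List[float]:
    """Turn raw class scores into probabilities that sum to 1."""
    highest = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical two-class setup with hand-picked (not learned) parameters.
classes = ["talking", "walking"]
features = [1.0, 2.0]
weights = [[0.5, -0.5], [1.0, 1.0]]
biases = [0.0, 0.0]

probabilities = softmax(dense(features, weights, biases))
predicted_class = classes[probabilities.index(max(probabilities))]
```

The class with the highest probability is taken as the predicted activity for the input frame.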
IV. CONCLUSION:
In this work, a wide range of strategies in machine learning
and artificial intelligence tailored for human behavior
needed to be investigated. In this report, we have proposed
a new model based on CNN for human activity identification.
First, we extract the most important features and use a
linear combination of these features to identify the
important feature vector. In this step, we choose a new
feature extraction method based on the characteristics of
news video using a neural network. With this system we
provide a user-friendly application that covers aspects like
the name of the newspaper, using the challenging database
from which the newspapers are taken. At the front end, i.e.,
the User Interface, the input will be the newspaper provided
by the user. We observe that the image representation from
the CNN significantly outperforms the traditional method.
This shows that a good representation of static images is
essential for good video classification. From related works
we can conclude that a CNN provides a good motion feature
and captures temporal information that enables action
recognition.
REFERENCES:
[1] Y. Kong, Y. Jia, and Y. Fu, “Interactive phrases: Semantic
descriptions for human interaction recognition,”
[2] X. Chang, W.-S. Zheng, and J. Zhang, “Learning person–person
interaction in collective activity recognition,”
[3] Y. Kong, Y. Jia, and Y. Fu, “Learning human interaction
by interactive phrases,”
[4] Y. Zhang, X. Liu, M. Chang, W. Ge, and T. Chen, “Spatio-
temporal phrases for activity recognition,” in ECCV, 2012.
[5] J. Donahue, L. Anne Hendricks, S. Guadarrama, M.
Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell, “Long-
term recurrent convolutional networks for visual
recognition and description,”
[6] B. Clarkson, A. Pentland, and K. Mase, “Recognizing
user context via wearable sensors,” in Int’l Symp. on
Wearable Computers (ISWC 2000), 2000, pp. 69–75.
[7] T. Maekawa, Y. Yanagisawa, Y. Kishino, K. Ishiguro, K.
Kamei, Y. Sakurai, and T. Okadome, “Object based activity
recognition with heterogeneous sensors on wrist,” in
Pervasive 2010, 2010, pp. 246–264.
[8] H. Pirsiavash and D. Ramanan, “Detecting activities of
daily living in first-person camera views,” in CVPR 2012,
2012, pp. 2847–2854.
[9] M. Ma, H. Fan, and K. M. Kitani, “Going deeper into first-
person activity recognition,” in Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition,
2016, pp. 1894–1903.
[10] S. Singh, C. Arora, and C. Jawahar, “First person action
recognition using deep learned descriptors,” in The IEEE
Conference on Computer Vision and Pattern Recognition
(CVPR), 2016, pp. 2620–2628.
[11] M. Vieira, D. R. Faria, and U. Nunes, “Real-time
application for monitoring human daily activity and risk
situations in robot-assisted living,” in Robot 2015: Second
Iberian Robotics Conference. Springer, 2016, pp. 449–461.

More Related Content

Similar to Temporal Reasoning Graph for Activity Recognition (20)

PPTX
Human activity recognition updated 1 - Copy.pptx
BhaveshKhuje
 
PDF
A contextual bandit algorithm for mobile context-aware recommender system
Bouneffouf Djallel
 
PDF
Exploration exploitation trade off in mobile context-aware recommender systems
Bouneffouf Djallel
 
PDF
IRJET- Survey on Detection of Crime
IRJET Journal
 
PDF
Crowd Recognition System Based on Optical Flow Along with SVM classifier
IJECEIAES
 
PDF
50120140502009
IAEME Publication
 
PDF
Novel Approach to Use HU Moments with Image Processing Techniques for Real Ti...
CSCJournals
 
PDF
An ontology for semantic modelling of virtual world
ijaia
 
PDF
A Review on Human Activity Recognition System
vivatechijri
 
PDF
Ontology-Based Resource Interoperability in Socio-Cyber-Physical Systems
ITIIIndustries
 
PDF
IRJET- Recurrent Neural Network for Human Action Recognition using Star S...
IRJET Journal
 
PDF
Ponziani_Tinaburri_FEMTC2024 fire engineering model
angelovenuti1
 
PDF
A Review on Human Activity Recognition System
vivatechijri
 
PDF
ACTIVITY RECOGNITION USING HISTOGRAM OF ORIENTED GRADIENT PATTERN HISTORY
ijcseit
 
PDF
ACTIVITY RECOGNITION USING HISTOGRAM OF ORIENTED GRADIENT PATTERN HISTORY
ijcseit
 
PDF
Ergonomics-for-One in a Robotic Shopping Cart for the Blind
Vladimir Kulyukin
 
PDF
Hybrid-e-greedy for mobile context-aware recommender system
Bouneffouf Djallel
 
PDF
A Hybrid Architecture for Tracking People in Real-Time Using a Video Surveill...
sipij
 
PDF
Continuous hand gesture segmentation and acknowledgement of hand gesture path...
International Journal of Reconfigurable and Embedded Systems
 
PDF
Natural User Interfaces as a powerful tool for courseware design in Physical ...
Alejandra Delgado
 
Human activity recognition updated 1 - Copy.pptx
BhaveshKhuje
 
A contextual bandit algorithm for mobile context-aware recommender system
Bouneffouf Djallel
 
Exploration exploitation trade off in mobile context-aware recommender systems
Bouneffouf Djallel
 
IRJET- Survey on Detection of Crime
IRJET Journal
 
Crowd Recognition System Based on Optical Flow Along with SVM classifier
IJECEIAES
 
50120140502009
IAEME Publication
 
Novel Approach to Use HU Moments with Image Processing Techniques for Real Ti...
CSCJournals
 
An ontology for semantic modelling of virtual world
ijaia
 
A Review on Human Activity Recognition System
vivatechijri
 
Ontology-Based Resource Interoperability in Socio-Cyber-Physical Systems
ITIIIndustries
 
IRJET- Recurrent Neural Network for Human Action Recognition using Star S...
IRJET Journal
 
Ponziani_Tinaburri_FEMTC2024 fire engineering model
angelovenuti1
 
A Review on Human Activity Recognition System
vivatechijri
 
ACTIVITY RECOGNITION USING HISTOGRAM OF ORIENTED GRADIENT PATTERN HISTORY
ijcseit
 
ACTIVITY RECOGNITION USING HISTOGRAM OF ORIENTED GRADIENT PATTERN HISTORY
ijcseit
 
Ergonomics-for-One in a Robotic Shopping Cart for the Blind
Vladimir Kulyukin
 
Hybrid-e-greedy for mobile context-aware recommender system
Bouneffouf Djallel
 
A Hybrid Architecture for Tracking People in Real-Time Using a Video Surveill...
sipij
 
Continuous hand gesture segmentation and acknowledgement of hand gesture path...
International Journal of Reconfigurable and Embedded Systems
 
Natural User Interfaces as a powerful tool for courseware design in Physical ...
Alejandra Delgado
 

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
IRJET Journal
 
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
IRJET Journal
 
PDF
Kiona – A Smart Society Automation Project
IRJET Journal
 
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
IRJET Journal
 
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
IRJET Journal
 
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
IRJET Journal
 
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
IRJET Journal
 
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
IRJET Journal
 
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
IRJET Journal
 
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
IRJET Journal
 
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
IRJET Journal
 
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
IRJET Journal
 
PDF
Breast Cancer Detection using Computer Vision
IRJET Journal
 
PDF
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
PDF
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
IRJET Journal
 
PDF
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
PDF
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
IRJET Journal
 
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
IRJET Journal
 
Kiona – A Smart Society Automation Project
IRJET Journal
 
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
IRJET Journal
 
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
IRJET Journal
 
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
IRJET Journal
 
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
IRJET Journal
 
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
IRJET Journal
 
BRAIN TUMOUR DETECTION AND CLASSIFICATION
IRJET Journal
 
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
IRJET Journal
 
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
IRJET Journal
 
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
IRJET Journal
 
Breast Cancer Detection using Computer Vision
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 

Recently uploaded (20)

PPTX
Online Cab Booking and Management System.pptx
diptipaneri80
 
PPTX
Ground improvement techniques-DEWATERING
DivakarSai4
 
PPTX
Water resources Engineering GIS KRT.pptx
Krunal Thanki
 
PDF
SG1-ALM-MS-EL-30-0008 (00) MS - Isolators and disconnecting switches.pdf
djiceramil
 
PPTX
ENSA_Module_7.pptx_wide_area_network_concepts
RanaMukherjee24
 
PDF
Construction of a Thermal Vacuum Chamber for Environment Test of Triple CubeS...
2208441
 
PDF
Advanced LangChain & RAG: Building a Financial AI Assistant with Real-Time Data
Soufiane Sejjari
 
PDF
STUDY OF NOVEL CHANNEL MATERIALS USING III-V COMPOUNDS WITH VARIOUS GATE DIEL...
ijoejnl
 
PDF
EVS+PRESENTATIONS EVS+PRESENTATIONS like
saiyedaqib429
 
PPTX
Information Retrieval and Extraction - Module 7
premSankar19
 
DOCX
SAR - EEEfdfdsdasdsdasdasdasdasdasdasdasda.docx
Kanimozhi676285
 
PDF
勉強会資料_An Image is Worth More Than 16x16 Patches
NABLAS株式会社
 
PPTX
Precedence and Associativity in C prog. language
Mahendra Dheer
 
PDF
Machine Learning All topics Covers In This Single Slides
AmritTiwari19
 
PPTX
filteration _ pre.pptx 11111110001.pptx
awasthivaibhav825
 
PPTX
cybersecurityandthe importance of the that
JayachanduHNJc
 
PPTX
quantum computing transition from classical mechanics.pptx
gvlbcy
 
PPTX
IoT_Smart_Agriculture_Presentations.pptx
poojakumari696707
 
PPTX
22PCOAM21 Session 1 Data Management.pptx
Guru Nanak Technical Institutions
 
PPTX
Chapter_Seven_Construction_Reliability_Elective_III_Msc CM
SubashKumarBhattarai
 
Online Cab Booking and Management System.pptx
diptipaneri80
 
Ground improvement techniques-DEWATERING
DivakarSai4
 
Water resources Engineering GIS KRT.pptx
Krunal Thanki
 
SG1-ALM-MS-EL-30-0008 (00) MS - Isolators and disconnecting switches.pdf
djiceramil
 
ENSA_Module_7.pptx_wide_area_network_concepts
RanaMukherjee24
 
Construction of a Thermal Vacuum Chamber for Environment Test of Triple CubeS...
2208441
 
Advanced LangChain & RAG: Building a Financial AI Assistant with Real-Time Data
Soufiane Sejjari
 
STUDY OF NOVEL CHANNEL MATERIALS USING III-V COMPOUNDS WITH VARIOUS GATE DIEL...
ijoejnl
 
EVS+PRESENTATIONS EVS+PRESENTATIONS like
saiyedaqib429
 
Information Retrieval and Extraction - Module 7
premSankar19
 
SAR - EEEfdfdsdasdsdasdasdasdasdasdasdasda.docx
Kanimozhi676285
 
勉強会資料_An Image is Worth More Than 16x16 Patches
NABLAS株式会社
 
Precedence and Associativity in C prog. language
Mahendra Dheer
 
Machine Learning All topics Covers In This Single Slides
AmritTiwari19
 
filteration _ pre.pptx 11111110001.pptx
awasthivaibhav825
 
cybersecurityandthe importance of the that
JayachanduHNJc
 
quantum computing transition from classical mechanics.pptx
gvlbcy
 
IoT_Smart_Agriculture_Presentations.pptx
poojakumari696707
 
22PCOAM21 Session 1 Data Management.pptx
Guru Nanak Technical Institutions
 
Chapter_Seven_Construction_Reliability_Elective_III_Msc CM
SubashKumarBhattarai
 

Temporal Reasoning Graph for Activity Recognition

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 09 Issue: 04 | Apr 2022 www.irjet.net p-ISSN: 2395-0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 422 Temporal Reasoning Graph for Activity Recognition ABSTRACT: In this work, we expect to resolve the issue of human cooperation acknowledgment in recordings by investigating the long haul between related elements among various people. As of late, Long Short-Term Memory (LSTM) has gotten a well-known decision to display singular dynamic for single-individual activity acknowledgment because of its capacity to catch the transient movement data in a reach. In any case, most existing LSTM-based techniques center just on catching the elements of human collaboration by basically consolidating all elements of people or demonstrating them in general. Such strategies disregard the between related elements of how human cooperation’s change after some time. To this end, we propose a novel various leveled Long Short-Term Concurrent Memory (H- LSTCM) to model the drawn out between related elements among a gathering of people for perceiving human connections. In particular, we first feed every individual's .Static highlights into a Single-Person LSTM to show the single-individual dynamic. Consequently, at one time step, the yields of all Single-Person LSTM units are taken care of into a novel Concurrent LSTM (Co-LSTM) unit, which predominantly comprises of numerous sub-memory units, another cell door, furthermore, another co-memory cell. In the Co-LSTM unit, each sub-memory unit stores singular movement data, while this Co-LSTM unit specifically coordinates and stores between related movement data between different communicating people from various sub- memory units by means of the cell door and co-memory cell, individually. 
Extensive experiments on several public datasets validate the effectiveness of the proposed H-LSTCM by comparing it against baseline and state-of-the-art methods.
KEYWORDS: Convolutional neural networks (CNNs), LSTM, Human activity

I. INTRODUCTION
1.1 Background
Human interactions (e.g., hand-shaking and talking) are typical human activities that occur in public places and are attracting considerable attention from researchers. A human interaction generally involves at least two individual dynamics from different persons, who are concurrently inter-related with each other (e.g., some people are talking together, some people are shaking hands with each other). In most cases of human interaction, the concurrent inter-related dynamics between different persons interact strongly (e.g., person A kicks person B, while person B retreats). It has been shown that the concurrent inter-related dynamics among different persons, rather than single-person dynamics, can contribute discriminative information for recognizing human interactions.
Having described the available data and the possible difficulties, we can now restate the questions to investigate in this project. First and foremost, we are interested in seeing whether we can predict criminal incidents, perhaps for a specific type of crime, for a small time frame and geographic region. Second, we are interested in learning which features have the most predictive power with respect to crime. With an understanding of the driving factors, cities can better work to mitigate the risk factors for crime.
In human interactions, activities have an underlying purpose. This purpose can be to accomplish a goal or to respond to some stimulus. Both of these parameters are governed by the environment of the individuals, which dictates the contextual elements in the scene.
Since this environment is shared by all people present in the scene, the activities of individuals are often interdependent, and some coherency between these activities may exist. We call such activities "collective". Examples of collective activities are: Crossing the street, Talking, Waiting, Queuing, Walking, Dancing and Jogging.

1.2 Motivation
- In human interactions, activities have an underlying purpose. This purpose can be to accomplish a goal or to respond to some stimulus.
- Both of these parameters are governed by the environment of the individuals, which dictates the contextual elements in the scene.

Sonali Chandole, Sajay Pingat
Department of Computer Engineering, Smt. Kashibai Navale College of Engineering, Pune, India
  • 2.
- Activity recognition in videos has attracted increasing interest recently. An activity can be defined as a certain spatial and temporal pattern involving the movements of a single actor or multiple actors.

1.3 Objectives
- To develop a system that identifies the appropriate human activity for its users from lengthy videos.
- The system handles the user's video by taking various inputs and generating the menu accordingly.
1] To collect datasets of activity videos.
2] Implementation of the CNN algorithm.
3] Validation of the proposed algorithm.

II. LITERATURE SURVEY:
Y. Kong, Y. Jia et al. [1]: To understand human-to-human interaction accurately, human interaction recognition (HIR) systems require robust feature extraction and selection methods based on vision sensors. The authors propose the WHITE STAG model to track human interactions using space-time methods as well as shape-based angular-geometric sequential approaches over full-body silhouettes and skeleton joints, respectively. After feature extraction, the feature space is reduced using codebook generation and linear discriminant analysis (LDA). Finally, a kernel sliding perceptron is used to recognize multiple classes of human interactions. The proposed WHITE STAG model is validated on two publicly available RGB datasets and one self-annotated intensity interactive dataset, introduced as a novelty. For evaluation, four experiments are performed using leave-one-out and cross-validation testing schemes.
The WHITE STAG model with the kernel sliding perceptron outperforms existing well-known statistical state-of-the-art methods, achieving a weighted average recognition rate of 87.48% on UT-Interaction, 87.5% on BIT-Interaction and 85.7% on the proposed IM-IntensityInteractive7 dataset. The proposed system should be applicable to various multimedia content and security applications such as surveillance systems, video-based learning, medical futurists, service robots, and interactive gaming.
X. Chang, W.-S. Zheng et al. [2] present a framework for the recognition of collective human activities. A collective activity is defined or reinforced by the existence of coherent behavior of individuals in time and space. The authors call such coherent behavior "Crowd Context". Examples of collective activities are "queuing in a line" or "talking". Following [7], they propose to recognize collective activities using the crowd context and introduce a new scheme for learning it automatically. The scheme is built upon a Random Forest structure that randomly samples variable-volume spatiotemporal regions to pick the most discriminating attributes for classification. Unlike previous approaches, the algorithm automatically finds the optimal configuration of spatiotemporal bins over which to sample the evidence, by randomization. This enables a methodology for modelling crowd context. A 3D Markov Random Field is employed to regularize the classification and localize collective activities in the scene. The authors demonstrate the flexibility and scalability of the framework in a number of experiments and show that the method outperforms state-of-the-art action classification techniques.
Y. Kong and Y. Fu et al. [3]: The Author Index contains the primary entry for each item, listed under the first author's name.
The primary entry includes the co-authors' names, the title of the paper or other item, and its location, specified by the publication abbreviation, year, month, and inclusive pagination. The Subject Index contains entries describing the item under all appropriate subject headings, plus the first author's name, the publication abbreviation, month, and year, and inclusive pages. Note that the item title is found only under the primary entry in the Author Index.
Y. Zhang, X. Liu et al. [4] state that local-feature-based approaches have become popular for activity recognition. A local feature captures the movement and appearance of a local region in a video and can therefore be ambiguous; e.g., it cannot tell whether a movement comes from a person's hand or foot when the camera is far away from the person. To better distinguish different types of activities, combinations of local features have been proposed to encode the relationships between local movements. Because of computational limits, previous work creates combinations only from features that neighbor each other in space and/or time. The authors propose an approach that efficiently identifies both local and long-range motion interactions; taking the "push" activity as an example, the approach can capture the combination of the hand movement of one person and the foot response of another person, whose local features are both spatially and temporally far away from each other. The computational complexity is
  • 3. linear in the number of local features in a video. Extensive experiments show that the approach is generically effective for recognizing a wide variety of activities, including activities spanning a long term, compared with a number of state-of-the-art methods.
J. Donahue, L. Anne Hendricks et al. [5] note that models based on deep convolutional networks have dominated recent image interpretation tasks, and investigate whether models that are also recurrent, or "temporally deep", are effective for tasks involving sequences, visual and otherwise. They develop a novel recurrent convolutional architecture suitable for large-scale visual learning that is end-to-end trainable, and demonstrate the value of these models on benchmark video recognition tasks, image description and retrieval problems, and video narration challenges. In contrast to current models, which assume a fixed spatio-temporal receptive field or simple temporal averaging for sequential processing, recurrent convolutional models are "doubly deep" in that they can be compositional in spatial and temporal "layers". Such models may have advantages when target concepts are complex and/or training data are limited. Learning long-term dependencies is possible when nonlinearities are incorporated into the network state updates. Long-term RNN models are appealing in that they can directly map variable-length inputs (e.g., video frames) to variable-length outputs (e.g., natural language text) and can model complex temporal dynamics; yet they can be optimized with backpropagation.
The recurrent long-term models are directly connected to modern visual ConvNet models and can be jointly trained to simultaneously learn temporal dynamics and convolutional perceptual representations. The results show that such models have distinct advantages over state-of-the-art models for recognition or generation that are separately defined and/or optimized.
Q. Ke, M. Bennamoun, S. An, et al. [6] state that human interaction prediction, i.e., the recognition of an ongoing interaction activity before it is completely executed, has a wide range of applications such as human-robot interaction and the prevention of dangerous events. Because of the large variations in appearance and the evolution of scenes, interaction prediction at an early stage is a challenging task. The paper exploits a novel structural feature as a complement to the spatial and temporal information to improve the performance of interaction prediction. The proposed structural feature is captured by Long Short-Term Memory (LSTM) networks, which process the global and local features associated with each frame and each optical flow image. A new ranking score fusion method is then introduced to combine the spatial, temporal and structural models. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods for human interaction prediction on the BIT-Interaction, TV Human Interaction and UT-Interaction datasets.
X. Shu, J. Tang, G.-J. Qi, et al. [7] observe that Long Short-Term Memory (LSTM) has recently become a popular choice for modeling individual dynamics in single-person action recognition because of its ability to model temporal information in various ranges of dynamic contexts. However, existing RNN models capture the temporal dynamics of person-person interactions only by naively combining the activity dynamics of individuals or by modeling them as a whole.
This neglects the inter-related dynamics of how person-person interactions change over time. To this end, the authors propose a novel Concurrence-Aware Long Short-Term Sub-Memories (Co-LSTSM) model to capture the long-term inter-related dynamics between two interacting people on the bounding boxes covering them. Specifically, for each frame, two sub-memory units store individual motion information, while a concurrent LSTM unit selectively integrates and stores the inter-related motion information between the interacting people from these two sub-memory units via a new co-memory cell. Experimental results on the BIT and UT datasets show the superiority of Co-LSTSM over state-of-the-art methods.

III. PROPOSED SYSTEM ARCHITECTURE:
1. Methodology:
In the proposed system, we conduct an experiment on human activity classification.
Fig. Proposed Architecture
  • 4. Human activity recognition has been a very active research topic over the past two decades because of its applications in fields such as health, remote monitoring, gaming, security and surveillance, and human-computer interaction. Activity recognition can be defined as the ability to recognize or detect the current activity on the basis of information received from different sensors. These sensors can be cameras, wearable sensors, or sensors attached to objects of daily use or deployed in the environment. With advances in technology and reductions in device cost, logging daily activities has become popular and practical. People log their daily activities, such as cooking, eating, sleeping, or watching TV, and different approaches have been used to capture these activities. In this work we present a neural network model that combines convolutional neural networks and background subtraction. We first evaluate the effect on action recognition of the convolutional network used for understanding static frames. Our method uses a CNN to automatically detect actions in order to maximize activity recognition accuracy.
The steps of the proposed system are listed below:
Step 1: Data Acquisition and Preprocessing
- Removing missing values
- Handling categorical variables
- Feature reduction
Step 2: Video Action Dataset
- Contains different videos of humans
- Related to human activity
Step 3: Background Subtraction
- After getting data from the dataset, extracting frames or images from the videos
- Generally there are two types of standards
- One model contains 26 frames per second of video
- The second model contains 30 frames per second of video
Step 4: Training and Testing Set (CNN)
- Splitting the dataset into training and testing sets
- Fitting the testing set with dependent and independent variables
- Backward elimination
Step 5: Model Testing
- Apply the testing dataset to the predictive model.
- Evaluate the model by comparing predicted values with exact values.

2. Algorithms
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (also called CNNs or ConvNets) are a kind of artificial neural network known to be highly effective in identification tasks as well as image classification. The four main operations in a Convolutional Neural Network are as follows:
Fig. CNN Architecture
(i) Convolution
The main purpose of the convolution operation in a CNN is to extract appropriate features from the image that acts as the input to the first layer. Convolution preserves the spatial relationship between pixels by learning image features from small squares of the image.
Convolution equation. Every image is viewed as a matrix of pixels, each having its own value. The pixel is the smallest unit in this image matrix. For easier understanding, consider a 5 by 5 (5*5) matrix whose values are binary only (i.e. 0 or 1). Note that images are generally RGB, with pixel values ranging from 0 to 255, i.e. 256 possible values per channel.
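The convolution step described above can be illustrated with a minimal sketch in plain Python. The 5 by 5 binary image and the 3 by 3 filter below are illustrative values that do not come from the paper, and `convolve2d_valid` is a hypothetical helper name. The sketch assumes a stride of 1 and no padding, and, as in most CNN libraries, it slides the filter without flipping it (strictly speaking a cross-correlation).

```python
def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    Each output value is the sum of element-wise products between the
    kernel and the image patch it currently covers.
    """
    image_h, image_w = len(image), len(image[0])
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    feature_map = []
    for row in range(image_h - kernel_h + 1):
        out_row = []
        for col in range(image_w - kernel_w + 1):
            total = 0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        feature_map.append(out_row)
    return feature_map


# Assumed 5x5 binary image (values are 0 or 1), for illustration only.
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]

# Assumed 3x3 filter; in a trained CNN these weights are learned.
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = convolve2d_valid(image, kernel)
for out_row in feature_map:
    print(out_row)
# [4, 3, 4]
# [2, 4, 3]
# [2, 3, 4]
```

A 5 by 5 input and a 3 by 3 filter give a 3 by 3 feature map, because the filter fits in three positions along each axis. Each entry counts how many of the filter's nonzero positions coincide with a 1 in the covered patch.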
  • 5. (ii) ReLU
ReLU acts element-wise. In other words, it is an operation applied per pixel that replaces all non-positive values in the feature map with zero.
(iii) Pooling or sub-sampling
Spatial pooling, also called subsampling or downsampling, reduces the dimensions of each feature map while retaining the most important information in the map. After pooling is done, the 3D feature map is eventually converted into a one-dimensional feature vector.
(iv) Fully Connected layer
The output of the convolution and pooling operations provides the prominent features extracted from the image. These features are then used by the Fully Connected layer to assign the input image to one of several classes based on the training dataset.

IV. CONCLUSION:
A wide range of strategies in machine learning and artificial intelligence tailored to human behavior remains to be investigated. In this report, we propose a new CNN-based model for human activity identification. First, the most important features are extracted, and a linear combination of these features is used to identify the important feature vector. In this step, we choose a new feature extraction method based on the characteristics of news video using a neural network. With this system we provide a user-friendly application that covers aspects such as the name of the newspaper, using a challenging database from which the newspapers are taken. At the front end, i.e. the user interface, the input is the newspaper supplied by the user. We observe that the image representation from the CNN significantly outperforms the traditional method.
This shows that a good representation of static images is essential for good video classification. From related work we conclude that CNN features are good motion features and capture temporal information that enables action recognition.

REFERENCES:
[1] Y. Kong, Y. Jia, and Y. Fu, “Interactive phrases: Semantic descriptions for human interaction recognition.”
[2] X. Chang, W.-S. Zheng, and J. Zhang, “Learning person–person interaction in collective activity recognition.”
[3] Y. Kong, Y. Jia, and Y. Fu, “Learning human interaction by interactive phrases.”
[4] Y. Zhang, X. Liu, M. Chang, W. Ge, and T. Chen, “Spatio-temporal phrases for activity recognition,” in ECCV, 2012.
[5] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell, “Long-term recurrent convolutional networks for visual recognition and description.”
[6] B. Clarkson, A. Pentland, and K. Mase, “Recognizing user context via wearable sensors,” in Int’l Symp. on Wearable Computers (ISWC 2000), 2000, pp. 69–75.
[7] T. Maekawa, Y. Yanagisawa, Y. Kishino, K. Ishiguro, K. Kamei, Y. Sakurai, and T. Okadome, “Object based activity recognition with heterogeneous sensors on wrist,” in Pervasive 2010, 2010, pp. 246–264.
[8] H. Pirsiavash and D. Ramanan, “Detecting activities of daily living in first-person camera views,” in CVPR 2012, 2012, pp. 2847–2854.
[9] M. Ma, H. Fan, and K. M. Kitani, “Going deeper into first-person activity recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1894–1903.
[10] S. Singh, C. Arora, and C. Jawahar, “First person action recognition using deep learned descriptors,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2620–2628.
[11] M. Vieira, D. R. Faria, and U. Nunes, “Real-time application for monitoring human daily activity and risk situations in robot-assisted living,” in Robot 2015: Second Iberian Robotics Conference. Springer, 2016, pp. 449–461.