Disease Prediction of Adiposity Using ML

The document presents a minor project report titled 'Adaptive Health Monitoring and Disease Prediction' submitted by students for their Bachelor of Technology in Computer Science & Engineering. It outlines the development of a web-based platform that utilizes machine learning algorithms to predict health risks for diseases like Parkinson's, diabetes, and adiposity based on clinical indicators. The project emphasizes accessibility, preventive care, and user empowerment through real-time health assessments and personalized recommendations.

Uploaded by

sivar0337
BATCH NO: MI2207

ADAPTIVE HEALTH MONITORING AND DISEASE PREDICTION

Minor project-I report submitted
in partial fulfillment of the requirement for award of the degree of

Bachelor of Technology
in
Computer Science & Engineering

By

G V V SATYANARAYANA (22UECM0086) (VTU 21584)
Y VENKATA SUDHEER (22UECM0292) (VTU 21775)
M SAI TEJA (22UECM0153) (VTU 22390)

Under the guidance of
Dr. R. LOTUS, M.Tech., Ph.D.,
ASSOCIATE PROFESSOR

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
SCHOOL OF COMPUTING

VEL TECH RANGARAJAN DR. SAGUNTHALA R&D INSTITUTE OF SCIENCE AND TECHNOLOGY
(Deemed to be University Estd u/s 3 of UGC Act, 1956)
Accredited by NAAC with A++ Grade
CHENNAI 600 062, TAMILNADU, INDIA

May, 2025
CERTIFICATE
It is certified that the work contained in the project report titled "ADAPTIVE HEALTH
MONITORING AND DISEASE PREDICTION" by G V V SATYANARAYANA (22UECM0086),
Y VENKATA SUDHEER (22UECM0292), M SAI TEJA (22UECM0153) has been carried out under
my supervision and that this work has not been submitted elsewhere for a degree.

Signature of Supervisor
Dr. R.LOTUS
Associate Professor
Computer Science & Engineering
School of Computing
Vel Tech Rangarajan Dr. Sagunthala R&D
Institute of Science and Technology
May, 2025

Signature of Head/Assistant Head of the Department
Dr. N. Vijayaraj / Dr. M. S. Murali Dhar
Professor & Head / Assoc. Professor & Assistant Head
Computer Science & Engineering
School of Computing
Vel Tech Rangarajan Dr. Sagunthala R&D
Institute of Science and Technology
May, 2025

Signature of the Dean
Dr. S P. Chokkalingam
Professor & Dean
School of Computing
Vel Tech Rangarajan Dr. Sagunthala R&D
Institute of Science and Technology
May, 2025

DECLARATION

We declare that this written submission represents our ideas in our own words and where others’
ideas or words have been included, we have adequately cited and referenced the original sources. We
also declare that we have adhered to all principles of academic honesty and integrity and have not
misrepresented or fabricated or falsified any idea/data/fact/source in our submission. We understand
that any violation of the above will be cause for disciplinary action by the Institute and can also
evoke penal action from the sources which have thus not been properly cited or from whom proper
permission has not been taken when needed.

(Signature)
G V V SATYANARAYANA
Date: / /

(Signature)
Y VENKATA SUDHEER
Date: / /

(Signature)
M SAI TEJA
Date: / /

APPROVAL SHEET

This project report entitled ADAPTIVE HEALTH MONITORING AND DISEASE PREDICTION
by G V V SATYANARAYANA (22UECM0086), Y VENKATA SUDHEER (22UECM0292), M SAI
TEJA (22UECM0153) is approved for the degree of B.Tech in Computer Science & Engineering.

Examiners Supervisor

Dr. R.LOTUS, M.Tech., Ph. D.,

Date: / /
Place:

ACKNOWLEDGEMENT

We express our deepest gratitude to our Honorable Founder Chancellor and President Col.
Prof. Dr. R. RANGARAJAN B.E. (Electrical), B.E. (Mechanical), M.S (Automobile), D.Sc., and
Foundress President Dr. R. SAGUNTHALA RANGARAJAN M.B.B.S., Vel Tech Rangarajan Dr.
Sagunthala R&D Institute of Science and Technology, for their blessings.

We express our sincere thanks to our respected Chairperson and Managing Trustee
Mrs. RANGARAJAN MAHALAKSHMI KISHORE, B.E., Vel Tech Rangarajan Dr. Sagunthala R&D
Institute of Science and Technology, for her blessings.

We are very much grateful to our beloved Vice Chancellor Prof. Dr. RAJAT GUPTA, for
providing us with an environment to complete our project successfully.

We record indebtedness to our Professor & Dean, School of Computing, Dr. S P.
CHOKKALINGAM, M.Tech., Ph.D., & Associate Dean, School of Computing, Dr. V. DHILIP
KUMAR, M.E., Ph.D., for immense care and encouragement towards us throughout the course
of this project.

We are thankful to our Professor & Head, Department of Computer Science & Engineering,
Dr. N. VIJAYARAJ, M.E., Ph.D., and Associate Professor & Assistant Head, Department of
Computer Science & Engineering, Dr. M. S. MURALI DHAR, M.E., Ph.D., for providing
immense support in all our endeavors.

We also take this opportunity to express a deep sense of gratitude to our Internal Supervisor
Dr. R. LOTUS, M.Tech., Ph.D., whose cordial support, valuable information and guidance
helped us complete this project through its various stages.

A special thanks to our Project Coordinators Dr. SADISH SENDIL MURUGARAJ, Professor,
Dr. S. KARTHIYAYINI, M.E., Ph.D., Mr. V. ASHOK KUMAR, B.E., M.Tech., for their valuable
guidance and support throughout the course of the project.

We thank our department faculty, supporting staff and friends for their help and guidance
to complete this project.

G V V SATYANARAYANA (22UECM0086)
Y VENKATA SUDHEER (22UECM0292)
M SAI TEJA (22UECM0153)

ABSTRACT

This project introduces an intelligent web-based platform for predicting Parkinson’s
disease, diabetes, and adiposity from clinical indicators such as blood pressure,
calcium levels, blood sugar, and maximum heart rate. By analyzing these inputs, the
system lets users assess their health risks without requiring direct medical
consultation. Each disease module is designed independently, so the interface
delivers disorder-specific evaluations. The application employs machine learning
algorithms such as Random Forest and Gradient Boosting, which improve accuracy
through pattern recognition and ensemble learning. The training data is sourced
from verified medical datasets to support reliable, evidence-based outcomes. Unlike
conventional diagnostics, the platform offers immediate feedback and personalized
suggestions. It runs on both mobile and desktop devices, which improves accessibility
for users in rural or underserved areas. Through adaptive learning, the models
refine their predictions as more user data becomes available. Preventive care is
promoted through actionable recommendations tailored to the individual’s health
profile. The intuitive interface requires no technical expertise, making it suitable
for a broad user base. Because it is cost-effective and scalable, the system can be
deployed in community health campaigns. Users gain insight into their conditions in
real time, which supports awareness and education. By simplifying early disease
identification, the platform aims to reduce treatment delays and foster better
outcomes. Its integration of data analysis with health management addresses gaps in
existing healthcare systems. Overall, this tool empowers users, promotes proactive
well-being, and makes critical health intelligence easily accessible to all.

Keywords:
Random Forest, Gradient Boosting, Parkinson’s Disease, Diabetes, Adiposity,
Symptom Analyzer, Disease Prediction, Health Assessment, Machine Learning,
Early Detection, Risk Estimation, Medical Interface, Preventive Healthcare,
Accessibility, Health Intelligence.

LIST OF FIGURES

4.1 General Architecture
4.2 Data Flow
4.3 Use Case Diagram
4.4 Class Diagram
4.5 Sequence Diagram
4.6 Activity Diagram

5.1 Unit Testing Results
5.2 Test Results

6.1 Output 1
6.2 Output 3
6.3 Output 2
6.4 Output 4
LIST OF TABLES

6.1 Model Performance Evaluation
LIST OF ACRONYMS AND ABBREVIATIONS

Abbreviation    Full Form

AI              Artificial Intelligence
BP              Blood Pressure
CT              Computed Tomography
DRA             Disease Risk Assessment
DT              Decision Tree
GB              Gradient Boosting
HR              Heart Rate
ML              Machine Learning
MRI             Magnetic Resonance Imaging
RF              Random Forest
X-ray           X-radiation
TABLE OF CONTENTS

ABSTRACT

LIST OF FIGURES

LIST OF TABLES

LIST OF ACRONYMS AND ABBREVIATIONS

1 INTRODUCTION
    1.1 Introduction
    1.2 Aim of the project
    1.3 Project Domain
    1.4 Scope of the Project

2 LITERATURE REVIEW
    2.1 Literature Review
    2.2 Gap Identification

3 PROJECT DESCRIPTION
    3.1 Existing System
    3.2 Problem statement
    3.3 System Specification
        3.3.1 Hardware Specification
        3.3.2 Software Specification
        3.3.3 Standards and Policies

4 METHODOLOGY
    4.1 Proposed System
    4.2 General Architecture
    4.3 Design Phase
        4.3.1 Data Flow Diagram
        4.3.2 Use Case Diagram
        4.3.3 Class Diagram
        4.3.4 Sequence Diagram
        4.3.5 Activity Diagram
    4.4 Algorithm & Pseudo Code
        4.4.1 Algorithm
        4.4.2 Pseudo Code
        4.4.3 Data Set
    4.5 Module Description
        4.5.1 Module 1: Diabetes Disease Prediction
        4.5.2 Module 2: Parkinson’s Disease Prediction
        4.5.3 Module 3: Adiposity Risk Prediction
        4.5.4 Module 4: Symptom Analyzer (Random Forest)

5 IMPLEMENTATION AND TESTING
    5.1 Input and Output
        5.1.1 Input Design
        5.1.2 Output Design
    5.2 Types of Testing
        5.2.1 Unit testing
        5.2.2 System testing

6 RESULTS AND DISCUSSIONS
    6.1 Efficiency of the Proposed System
    6.2 Comparison of Existing and Proposed System

7 CONCLUSION AND FUTURE ENHANCEMENTS
    7.1 Conclusion
    7.2 Future Enhancements

8 PLAGIARISM REPORT

9 Source Code

References
Chapter 1

INTRODUCTION

1.1 Introduction

The ML-Based Disease Prediction System is designed to predict the risks of
Parkinson’s disease, Adiposity, and Diabetes using Gradient Boosting and Random
Forest algorithms. Clinical data, such as blood pressure, blood sugar, and heart rate,
is collected from reliable medical sources and preprocessed to ensure consistency.
This data is divided into training and testing sets for robust model evaluation. Gra-
dient Boosting is employed to classify diseases by identifying complex patterns in
the data, while Random Forest enhances accuracy in symptom mapping for disease
prediction. These models are trained to optimize metrics like precision, recall, and
F1-score, ensuring high prediction reliability. A key feature of the system is the
Symptom Analyzer, which allows users to input symptoms for immediate analysis,
linking them to potential diseases based on machine learning-driven insights. The
combination of prediction and symptom-based analysis provides a comprehensive
health assessment tool, enabling early detection of health risks. This tool promotes
timely intervention, encourages preventive healthcare, and supports better decision-
making by individuals and healthcare providers.
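
The evaluation metrics named above can be illustrated with a minimal sketch. It assumes a binary setting in which 1 means "at risk" and 0 means "not at risk"; the function name `precision_recall_f1` and the label lists are hypothetical and are not taken from the project's source code.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for a single positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical test-set labels and model outputs (1 = at risk, 0 = not at risk).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here three of the four predicted positives are correct and three of the four actual positives are found, so all three metrics come out at 0.75.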

1.2 Aim of the project

The aim of the ML-Based Disease Prediction and Symptom Analyzer project
is to develop an intelligent system that predicts the risk of Parkinson’s disease, Adi-
posity, and Diabetes using Gradient Boosting and Random Forest algorithms. By
leveraging clinical data such as blood pressure, blood sugar, calcium levels, and heart
rate, this system provides accurate health predictions to help individuals assess their
risk of developing these diseases. The goal is to make the technology accessible to
users without requiring medical expertise, enabling them to understand their health
status and take preventive actions early. The system collects data from reliable medical
sources, cleans and normalizes it for consistency, and then divides it into training
and testing datasets for effective model evaluation. The project aims to optimize the
models for high accuracy in prediction, focusing on metrics such as precision, recall,
and F1-score. Another key objective is to integrate a Symptom Analyzer module that
uses the Random Forest algorithm to map reported symptoms to possible diseases,
offering immediate insights for users. This feature enhances the system’s usability
by allowing individuals to self-assess and detect potential health risks. Ultimately,
the project aims to empower users with early detection tools, encourage preventive
healthcare measures, and provide timely medical insights to reduce risks associated
with chronic diseases.
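
The normalization and train/test division described in this section could look roughly like the following sketch. It assumes min-max scaling and an 80/20 split, neither of which is stated in the report; the function names, the seed, and the sample records are hypothetical, and the values are made up for illustration rather than drawn from any medical dataset.

```python
import random


def min_max_normalize(rows):
    """Scale each column of a list of numeric rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def train_test_split(rows, labels, test_ratio=0.2, seed=42):
    """Shuffle indices reproducibly and hold out a fraction for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(rows) * test_ratio))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [rows[i] for i in train_idx], [rows[i] for i in test_idx],
        [labels[i] for i in train_idx], [labels[i] for i in test_idx],
    )


# Made-up records: [blood pressure, blood sugar, calcium, heart rate].
records = [
    [118, 92, 9.4, 70], [135, 140, 8.9, 82], [122, 101, 9.8, 76],
    [150, 180, 9.1, 90], [110, 85, 9.6, 64], [142, 160, 8.7, 88],
    [128, 110, 9.3, 74], [138, 150, 9.0, 85], [115, 95, 9.7, 68],
    [145, 170, 8.8, 92],
]
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]  # hypothetical: 1 = at risk

scaled = min_max_normalize(records)
X_train, X_test, y_train, y_test = train_test_split(scaled, labels)
print(len(X_train), "training rows,", len(X_test), "testing rows")
```

The sketch scales before splitting, following the order given in the text. In a stricter evaluation, the scaling bounds would be computed on the training portion only and then applied to the test portion, so that no information from the test set influences preprocessing.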

1.3 Project Domain

The project domain of the ML-Based Disease Prediction and Symptom Analyzer
lies at the intersection of Machine Learning, Healthcare, and Data Analytics. The
primary objective of this system is to harness the power of advanced machine learn-
ing algorithms, specifically Gradient Boosting and Random Forest, to predict the
risk of developing chronic diseases such as Parkinson’s disease, Adiposity, and Di-
abetes. By analyzing clinical data like blood pressure, blood sugar levels, calcium
concentrations, and heart rate, the system provides users with accurate and reliable
predictions based on data-driven insights. The domain encompasses the collection
and preprocessing of medical data from trusted sources, followed by its segmentation
into training and testing datasets to train and evaluate the machine learning models.
The Symptom Analyzer feature, powered by Random Forest, allows users to input
symptoms and receive immediate feedback on potential conditions. This integra-
tion of symptom analysis with predictive modeling ensures a comprehensive health
assessment, making it an invaluable tool for early disease detection and risk manage-
ment. The system aims to offer a user-friendly interface, enabling both individuals
and healthcare professionals to make informed decisions. The ultimate goal is to
provide accessible healthcare technology that bridges gaps in early diagnosis, pro-
motes preventive care, and empowers users to take proactive steps toward improving
their overall health.
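
The idea of mapping reported symptoms to candidate conditions by ensemble voting can be sketched as follows. This is a heavily simplified stand-in for a Random Forest: each "tree" is a one-feature stump trained on a bootstrap sample with a randomly chosen feature, and the ensemble predicts by majority vote. The symptom vocabulary, the training cases, and all function names are hypothetical toy values with no clinical validity.

```python
import random
from collections import Counter

# Hypothetical symptom vocabulary; the real system's inputs are not listed here.
SYMPTOMS = ["tremor", "excessive_thirst", "fatigue", "weight_gain"]


def encode(symptoms):
    """Turn a list of symptom names into a binary feature vector."""
    present = set(symptoms)
    return [1 if s in present else 0 for s in SYMPTOMS]


def fit_stump(X, y, feature):
    """One-feature 'tree': majority label for each value of the chosen feature."""
    overall = Counter(y).most_common(1)[0][0]
    rule = {}
    for value in (0, 1):
        matching = [label for row, label in zip(X, y) if row[feature] == value]
        rule[value] = Counter(matching).most_common(1)[0][0] if matching else overall
    return feature, rule


def fit_forest(X, y, n_trees=25, seed=0):
    """Train stumps on bootstrap samples, each with a randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # sample rows with replacement
        feature = rng.randrange(len(X[0]))        # random feature for this stump
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(rule[row[feature]] for feature, rule in forest)
    return votes.most_common(1)[0][0]


# Toy training cases (illustrative only).
cases = [
    (["tremor", "fatigue"], "Parkinson's"),
    (["tremor"], "Parkinson's"),
    (["excessive_thirst", "fatigue"], "Diabetes"),
    (["excessive_thirst"], "Diabetes"),
    (["weight_gain", "fatigue"], "Adiposity"),
    (["weight_gain"], "Adiposity"),
]
X = [encode(symptoms) for symptoms, _ in cases]
y = [label for _, label in cases]

forest = fit_forest(X, y)
print(predict(forest, encode(["tremor", "fatigue"])))
```

A full Random Forest grows deeper trees and considers a random subset of features at every split; the stump version keeps only the bootstrap sampling and voting steps so that the aggregation logic stays visible.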

1.4 Scope of the Project

The scope of the project focuses on developing a robust ML-Based Disease Pre-
diction System that targets the prediction of Parkinson’s disease, Adiposity, and Di-
abetes using Gradient Boosting and Random Forest algorithms. The system will
process and analyze clinical data such as blood pressure, blood sugar, calcium lev-
els, and heart rate to predict the likelihood of these conditions. The scope extends to
the design of a Symptom Analyzer tool that allows users to input their symptoms and
receive possible disease outcomes based on machine learning-driven insights. This
feature aims to enhance accessibility by offering immediate, data-driven feedback.
The system’s development will involve data collection from trusted medical sources,
preprocessing for consistency, and splitting the dataset into training and testing sets
for model evaluation. The machine learning models will be optimized to improve ac-
curacy and predictive performance based on precision, recall, and F1-score metrics.
The project also covers the integration of a user-friendly interface that allows indi-
viduals to assess their health risks independently. Additionally, the scope includes
the provision of early detection capabilities, empowering users to take preventive ac-
tions to manage their health proactively. The system is intended to bridge healthcare
gaps by offering affordable, efficient, and easily accessible tools for individuals to
monitor and manage chronic disease risks in a timely manner.
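
To show how Gradient Boosting builds a prediction step by step, the sketch below fits a sequence of decision stumps to the residuals of the current model under squared loss. It is an assumption-laden illustration rather than the project's implementation: it uses a single hypothetical normalized indicator as input, treats the 0/1 risk label as a regression target, and picks the number of rounds and learning rate arbitrarily. Classification variants of the algorithm typically use a logistic loss instead.

```python
def fit_stump(xs, residuals):
    """Find the threshold split on x that best fits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # excluding the max keeps both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def fit_gradient_boosting(xs, ys, n_rounds=20, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and the scaled contribution of every stump."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(l if x <= t else r for t, l, r in stumps)


def mean_squared_error(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical normalized indicator values and risk labels (1 = at risk).
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

model = fit_gradient_boosting(xs, ys)
baseline = (sum(ys) / len(ys), 0.3, [])  # mean-only model with no stumps
print(mean_squared_error(baseline, xs, ys), mean_squared_error(model, xs, ys))
```

Each round corrects part of the error left by the previous rounds, which is why the boosted model's training error ends up below that of the mean-only baseline. The learning rate controls how much of each correction is applied.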

Chapter 2

LITERATURE REVIEW

2.1 Literature Review

[1] Sogandi (2024) This paper focuses on automating disease prediction using lan-
guage models, particularly MCN-BERT and BiLSTM models. The study applies
these models to two datasets: one for general disease prediction and another for iden-
tifying Adverse Drug Reactions (ADRs) from Twitter data. The MCN-BERT model
optimized with AdamP achieved high accuracy for both datasets. The findings indi-
cate that deep learning models can support earlier disease detection and better remote
diagnostics. This method shows promise in improving the speed and accuracy of dis-
ease predictions. The research suggests that future improvements could lead to more
efficient remote healthcare systems.
[2] Pilehvari et al. (2024) This analytical review explores the use of AI and
machine learning in diagnosing and predicting multiple sclerosis (MS). It evaluates
how these techniques can be used for identifying risk factors, predicting disease
progression, and optimizing treatment plans. The paper emphasizes the importance
of combining clinical data with machine learning algorithms to improve diagnosis
accuracy. It also outlines the current limitations and challenges in applying AI to
MS prediction. The authors suggest that further research should focus on refining
models and making them more accessible in clinical settings. The paper concludes
that AI has the potential to revolutionize MS diagnosis and treatment strategies.
[3] Pinto et al. (2020) In this study, the authors developed machine learning mod-
els to predict the progression of multiple sclerosis. The models incorporate both
clinical and demographic data to provide predictions about disease outcomes. These
models are designed to assist in developing personalized treatment strategies for MS
patients. The research highlights the potential of machine learning to identify early
signs of disease progression and improve patient management. The authors empha-
size the need for integrating diverse data sources to enhance prediction accuracy. The
study suggests that further advancements in machine learning could lead to better in-
dividualized care for MS patients.

[4] Hassan et al. (2024) This paper focuses on optimizing disease classification
by analyzing symptoms using language models. The authors propose an approach
that leverages deep learning models to better understand and categorize diseases
based on symptom data. The results show that using advanced language models
can improve the classification accuracy of various diseases. The study demonstrates
how machine learning can analyze complex symptom patterns that may be difficult
for traditional diagnostic methods. It also highlights the challenges of working with
unstructured symptom data. The paper suggests that further research is needed to
enhance the generalization of these models for clinical applications.
[5] Gurevich et al. (2025) This research presents a machine learning-based
approach to predict disease progression in primary progressive multiple sclerosis
(PPMS). The study uses data such as blood transcriptome information and MRI met-
rics to develop prognostic models. These models aim to predict disability progres-
sion and changes in brain volume. The results indicate that machine learning can
play a crucial role in understanding PPMS progression and developing personalized
treatment plans. The authors also discuss the potential of combining molecular and
clinical data to improve prediction accuracy. They highlight that machine learning
has the potential to support better decision-making for PPMS patients and healthcare
providers.
[6] Park et al. (2021) This paper focuses on developing a machine learning model
to predict diseases based on laboratory test results. The authors compare various ma-
chine learning techniques to identify the most accurate models for diagnosing differ-
ent diseases. The study shows that machine learning models can successfully predict
diseases using only laboratory data, which simplifies the diagnostic process. The
paper highlights the ability of machine learning to process complex patterns in clin-
ical data, potentially reducing diagnostic errors. It also emphasizes the importance
of selecting the right model to ensure high prediction accuracy. The findings sug-
gest that this approach could improve efficiency in healthcare by automating disease
diagnosis.
[7] Yousef et al. (2024) This review discusses how machine learning can predict
the progression and outcomes of multiple sclerosis (MS) using MRI-based biomark-
ers. The authors examine various ML models, their challenges, and their effective-
ness in integrating MRI data to predict MS progression. They discuss the role of
biomarkers in improving prediction accuracy and the potential of machine learning
to enhance diagnostic processes. The paper reviews existing studies and suggests
that the integration of MRI data into machine learning models could lead to more
personalized treatment plans for MS patients. The authors also propose directions
for future research to refine these models. They conclude that machine learning can
significantly improve the management of MS.
[8] Delpino et al. (2022) This systematic review evaluates how machine learning
can be used to predict chronic diseases. The authors examine a variety of machine
learning techniques and assess their effectiveness in predicting conditions such as
diabetes, cardiovascular diseases, and cancer. The paper highlights the promise of
machine learning in improving early detection and prevention strategies. It discusses
the challenges of working with large datasets and the need for high-quality data to
ensure accurate predictions. The authors suggest that while machine learning offers
significant potential, there are still challenges to overcome in making these models
clinically applicable. The paper calls for further research to improve the reliability
and generalization of these models.
[9] Sang et al. (2024) This study presents a machine learning model for predicting
neurodegenerative diseases in patients with type 2 diabetes. The authors use data
from two independent Korean cohorts to develop and validate predictive models.
The study demonstrates how machine learning can identify at-risk patients early and
suggest intervention strategies. The authors emphasize the importance of validating
these models across different populations to ensure their generalizability. The paper
highlights the potential of machine learning in improving the management of type 2
diabetes and preventing associated neurodegenerative diseases. The authors propose
further studies to refine these models for clinical use.
[10] Zhang et al. (2019) This paper introduces a disease prediction and early
intervention system based on symptom similarity analysis. The authors use a con-
volutional neural network (CNN) model to analyze patient symptoms and predict
potential diseases. The system aims to identify patterns in symptoms and suggest
early intervention measures to reduce the severity of diseases. The study highlights
the accuracy of the CNN model in predicting diseases based on symptom similarity,
showing its potential for early diagnosis. The paper discusses how such systems can
improve healthcare delivery by automating the diagnostic process. The authors sug-
gest that further development of this system could lead to more efficient healthcare
solutions.

2.2 Gap Identification

[1] Sogandi, F. (2024) While the study successfully uses deep learning models for
disease prediction, it does not extensively explore the long-term impact of these mod-
els in clinical settings. Additionally, the lack of a comparative analysis between
different disease types limits its broader application. Further research is needed to
evaluate real-world implementation and scalability.
[2] Pilehvari et al. (2024) The review focuses on AI’s potential in MS diagnosis,
yet it overlooks the integration of AI with personalized treatment plans. The models
discussed have limited real-world validation, and the challenges of data privacy and
model explainability are not adequately addressed. There is a need for more practical
case studies and longitudinal data.
[3] Pinto et al. (2020) Although the study uses machine learning for disease pro-
gression prediction in MS, it does not consider the variability in disease progression
among different patient populations. The models lack personalization, and the pre-
diction accuracy can be improved by integrating more diverse data sources. A focus
on multi-center clinical trials could enhance model robustness.
[4] Hassan et al. (2024) The study optimizes disease classification but does not
account for the complexity of unstructured symptom data in real-time settings. It
assumes that symptom data is complete and well-organized, which is often not the
case in clinical practice. Future work should address the challenges of working with
real-time, noisy data.
[5] Gurevich et al. (2025) While the study uses machine learning for predicting
PPMS progression, it is limited by a narrow scope of biomarkers. The models rely
heavily on a small dataset, which may not generalize well across diverse patient
populations. Broader validation with larger and more diverse cohorts is needed to
ensure model applicability.
[6] Park et al. (2021) The model developed in this paper is limited by the scope
of laboratory tests considered. It does not incorporate non-traditional diagnostic fac-
tors such as patient history or imaging data. Further development of hybrid models
combining various diagnostic data sources could enhance prediction accuracy.
[7] Yousef et al. (2024) The review of machine learning and MRI-based biomarkers
in MS progression prediction does not consider the real-time feasibility of these
methods in clinical practice. The potential of integrating AI with emerging technologies
like wearables is not explored. Future research should focus on overcoming
practical implementation barriers.
[8] Delpino et al. (2022) This systematic review on chronic disease prediction
fails to adequately address the issue of model generalization across different health-
care settings. While machine learning shows promise, the lack of standardized data
and the ethical concerns surrounding AI in healthcare are not sufficiently discussed.
More research is needed to address these challenges.
[9] Sang et al. (2024) While the study presents a model for predicting neurode-
generative diseases in type 2 diabetes patients, it does not examine how these models
can be integrated into routine healthcare practices. There is also a need to consider
the impact of external factors, like lifestyle, on disease progression. Expanding the
model to include these factors could enhance its applicability.
[10] Zhang et al. (2019) The disease prediction system based on symptom sim-
ilarity analysis does not explore the scalability of the model across multiple disease
types. The system’s reliance on CNN limits its ability to capture complex relation-
ships in multi-symptom diseases. Future improvements could involve combining this
approach with other AI techniques for more comprehensive predictions.

Chapter 3

PROJECT DESCRIPTION

3.1 Existing System

The existing system in the domain of disease prediction primarily focuses on tradi-
tional diagnostic tools and manual health assessments, which can be time-consuming
and error-prone. Medical professionals rely on clinical tests, patient history, and ob-
servational data to diagnose chronic diseases such as Parkinson’s disease, Adiposity,
and Diabetes. While some systems offer basic health prediction models using sim-
ple statistical methods, they often lack the integration of advanced machine learning
techniques that could provide more accurate and reliable predictions. Current sys-
tems also fail to analyze symptoms dynamically or provide real-time feedback to
users. Additionally, many existing applications do not cater to early disease detec-
tion, which is crucial for reducing risks associated with chronic conditions. Most
systems are either too complex for non-medical users or lack the necessary accuracy
to be trusted for serious health assessments.

3.2 Problem statement

Despite advancements in healthcare technology, accurately predicting chronic
diseases like Parkinson’s disease, adiposity, and diabetes remains a significant
challenge. Current systems often lack the ability to integrate modern machine learning
techniques for precise and timely predictions. Many existing applications focus pri-
marily on reactive measures, offering limited support for early disease detection,
which is crucial for effective treatment and management. Additionally, most health-
care tools are either inaccessible or too complex for non-medical users, limiting their
practical value. There is a clear gap in solutions that combine advanced algorithms
like gradient boosting and random forest to predict disease risk with high accuracy.
Furthermore, most systems fail to offer real-time symptom analysis, preventing im-
mediate feedback for users.

3.3 System Specification

3.3.1 Hardware Specification

• Intel Core i3 to i7 Processor (Any Generation)
• 16 GB RAM
• 512 GB SSD Storage
• Reliable Internet Connection
• Windows system with a minimum of 200 KB data speed

3.3.2 Software Specification

• Windows 10/11 Operating System
• Python 3.10
• Visual Studio Code for application development
• Streamlit for user interface implementation
• Web Browser for accessing the application

3.3.3 Standards and Policies

Visual Studio Code: The application is developed in Visual Studio Code using
Streamlit, which is simple to set up and has no demanding system requirements.
Standard Used: ISO/IEC 27001

Chapter 4

METHODOLOGY

4.1 Proposed System

4.2 General Architecture

Figure 4.1: General Architecture

Figure 4.1 illustrates the architecture of the system, starting with the collection of
medical data for Parkinson’s, Adiposity, and Diabetes. After preprocessing, the data
is divided into training and test sets. The Symptom Analyzer assesses symptoms,
while Gradient Boosting and Random Forest algorithms identify disease patterns for
real-time health assessment.

4.3 Design Phase

4.3.1 Data Flow Diagram

Figure 4.2: Data Flow

Figure 4.2 outlines the process of collecting and preparing patient data for model
training. After the initial training phase, the model’s performance is assessed. Based
on the evaluation, adjustments are made to improve accuracy. Finally, the model
generates predictions regarding the likelihood of disease, assisting in timely diagno-
sis. The system then performs further testing to verify its robustness and reliability,
ensuring that the model can provide consistent results across various datasets. This
process ensures accuracy in real-world applications, contributing to improved patient
outcomes and streamlined health management.

4.3.2 Use Case Diagram

Figure 4.3: Use Case Diagram

Figure 4.3 presents a comprehensive use case diagram illustrating the sequential in-
teractions between a User and a sophisticated Model within a disease prediction
framework. Initially, the User initiates the process by entering specific health-related
details into the system. This action triggers the subsequent collection of pertinent
data by the Model. Following data acquisition, the system proceeds to extract cru-
cial features relevant for analysis. A value matching operation then takes place,
potentially incorporating further input directly from the User to refine the process.
Subsequently, the extracted features undergo a classification stage. Based on this
classification, the Model generates a disease prediction. Finally, the predicted out-
come is communicated back to the User in the form of a report, thus concluding the
interaction cycle. The directional arrows clearly delineate the flow of information
and control throughout these distinct stages of the disease prediction process.

4.3.3 Class Diagram

Figure 4.4: Class Diagram

Figure 4.4 outlines the class structure of the disease prediction system and the
interactions among its core components. A Patient object, defined by the attributes
name (String), age (int), gender (String), and a list of reported symptoms, provides
medical data to the system. This “provides” relationship results in the creation of a
MedicalData object. MedicalData carries out the preprocessing steps: loading the raw
data, cleaning inconsistencies and errors, and splitting the dataset for training and
evaluation. The refined MedicalData is then sent to a central Model object. The Model
forms the core of the prediction process: it trains on the provided data, predicts
potential diseases based on learned patterns, and evaluates its own predictive
accuracy and reliability.
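
The three classes described above could be sketched in Python roughly as follows. This is an illustrative outline only: the method bodies, the row format, and the placeholder majority-label model are assumptions made for the example and are not taken from the project's implementation.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Patient:
    """Attributes listed in the class diagram."""
    name: str
    age: int
    gender: str
    symptoms: list[str] = field(default_factory=list)


@dataclass
class MedicalData:
    """Holds (features, label) rows; load/clean/split mirror the diagram."""
    rows: list = field(default_factory=list)

    def load(self, raw_rows):
        self.rows = list(raw_rows)

    def clean(self):
        # Drop rows that contain a missing (None) feature value.
        self.rows = [(x, y) for x, y in self.rows if None not in x]

    def split(self, train_fraction=0.8):
        cut = int(len(self.rows) * train_fraction)
        return self.rows[:cut], self.rows[cut:]


class Model:
    """Placeholder model (assumption): predicts the most common training label."""

    def __init__(self):
        self.majority_label = None

    def train(self, train_rows):
        labels = Counter(y for _, y in train_rows)
        self.majority_label = labels.most_common(1)[0][0]

    def predict(self, features):
        return self.majority_label

    def evaluate(self, test_rows):
        # Fraction of test rows whose label matches the prediction.
        correct = sum(self.predict(x) == y for x, y in test_rows)
        return correct / len(test_rows)
```

In a fuller version, the placeholder `Model` would wrap one of the trained classifiers, while the `train`, `predict`, and `evaluate` interface could stay the same.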

4.3.4 Sequence Diagram

Figure 4.5: Sequence Diagram

Figure 4.5 presents the sequence diagram for the disease prediction system. The se-
quence begins with the patient submitting their symptoms and personal information
to the medical data system. This system then cleanses, preprocesses, and structures
the data before passing it on to the model for training. Once the model is trained us-
ing the data, it predicts the disease and generates a detailed report. The report is then
sent to the patient. If the patient has any doubts or needs further clarification, they can
request an explanation. In response, the report system provides additional insights
and details regarding the prediction. This sequence illustrates the efficient flow of
data from the patient’s input to the final disease prediction report, highlighting the
interactions between the patient, medical data system, model, and report system in a
clear and streamlined process.

4.3.5 Activity Diagram

Figure 4.6: Activity Diagram

Figure 4.6 illustrates a user flow diagram for a disease prediction system. The
process starts at the Start page, leading to the Patient Form for data entry.
Subsequently, the user reaches the Select Disease stage. From here, a choice is presented,
potentially leading to specific disease paths like Parkinson’s Disease, Diabetes Dis-
ease, or Adiposity Disease. Alternatively, the user can access a Symptom Analyzer,
which then directs towards Adiposity Disease. For Parkinson’s Disease, the next step
involves Input Features before running the Run Predictor. Finally, the Show Results
page displays the prediction outcome to the user. The arrows clearly indicate the
navigational flow within this system. This entire process aids in effective Disease
Risk Assessment (DRA).

4.4 Algorithm & Pseudo Code

4.4.1 Algorithm

1. Collect medical datasets for Parkinson’s, Diabetes, Adiposity, and symptoms.
2. Clean the datasets and preprocess features to remove noise and missing values.
3. Split each dataset into training and testing sets (e.g., 80% train, 20% test).
4. Normalize the data to maintain feature consistency and scale alignment.
5. Select Gradient Boosting for disease prediction and Random Forest for symptom analysis.
6. Train Gradient Boosting models for each disease using their respective data.
7. Train the Random Forest model using the symptom dataset for classification.
8. Evaluate all models using accuracy, precision, recall, and F1 score.
9. Optimize hyperparameters via grid search and cross-validation techniques.
10. Integrate all trained models into a unified backend system.
11. Design a web interface to collect patient details and symptoms.
12. Normalize user inputs to match the training format before prediction.
13. Feed inputs into the respective models for disease and symptom prediction.
14. Display the prediction results clearly on the user interface.
15. Enable downloading or saving reports for user reference.
16. Continuously update models when new datasets or improvements are available.
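
The training part of the algorithm (splitting, normalization, Gradient Boosting, evaluation, and grid search) can be sketched with scikit-learn as shown below. This is a minimal sketch under stated assumptions: the data is synthetic (`make_classification` stands in for one disease dataset), the hyperparameter grid is a small illustrative choice, and none of the values reflect the project's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for one disease dataset (8 features, binary label).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# 80/20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scaling and Gradient Boosting in one pipeline, so that the scaler is
# fitted on training data only.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("gb", GradientBoostingClassifier(random_state=0)),
])

# Small illustrative grid, searched with 3-fold cross-validation.
search = GridSearchCV(
    pipeline,
    param_grid={"gb__n_estimators": [50, 100], "gb__max_depth": [2, 3]},
    cv=3,
    scoring="f1",
)
search.fit(X_train, y_train)

# Evaluate the best model on the held-out test set.
y_pred = search.predict(X_test)
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
print(search.best_params_)
print(metrics)
```

Placing the scaler inside the pipeline is a design choice of this sketch: it keeps test data out of the normalization statistics during cross-validation.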

4.4.2 Pseudo Code

• Step 1: Import Libraries
    Import libraries: streamlit, numpy, pandas, sklearn, keras, joblib, scipy
• Step 2: Load Datasets
    Load datasets for Parkinson’s, Diabetes, Adiposity, and Symptoms.
• Step 3: Preprocess Data
    For each dataset:
    – Handle missing values.
    – Encode categorical variables if needed.
    – Normalize numerical features using z-score.
• Step 4: Split Dataset into Training and Testing
    For each dataset:
    – Split into training and testing sets (e.g., 80/20 or 70/30).
• Step 5: Train Disease Prediction Models (Gradient Boosting)
    For each disease (Parkinson’s, Diabetes, Adiposity):
    – Initialize Gradient Boosting Classifier.
    – Train model using training data.
    – Evaluate model using performance metrics (accuracy, precision, recall, F1 score).
    – Save trained model using joblib.
• Step 6: Train Symptom Analysis Model (Random Forest)
    Initialize Random Forest Classifier for symptom analysis.
    Train the model using the symptom dataset.
    Evaluate model performance using accuracy, precision, recall.
    Save trained model using joblib.
• Step 7: Build Streamlit Web Interface
    Set up the page title, layout, and UI components for disease selection.
    Allow users to select the disease type (Parkinson’s, Diabetes, Adiposity, or
    Symptom Analyzer).
• Step 8: Feature Input Collection
    For selected disease/symptom:
    – Prompt user to input relevant features (age, symptoms, medical results, etc.).
    – Normalize input data using z-score.
• Step 9: Make Predictions using Trained Models
    For disease prediction (Gradient Boosting):
    – Use trained Gradient Boosting model to predict outcome (positive/negative
      for each disease).
    For symptom analysis (Random Forest):
    – Use trained Random Forest model to predict symptom analysis (symptom
      risk or likelihood).
• Step 10: Display Prediction Results
    Show results on the Streamlit interface:
    – Display “Positive/Negative” for disease prediction.
    – Show risk level for Adiposity prediction (e.g., “High Risk” or “Low Risk”).
    – Display prediction for symptom analysis.
• Step 11: Save Results or Download
    Provide an option to download or save the prediction results for future reference.
• Step 12: Periodically Update Models
    Retrain models with new data when available.
    Save updated models using joblib.

End
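
The z-score normalization used in Steps 3 and 8 maps a value x to (x − μ) / σ, where μ and σ are the mean and standard deviation of the training column. The sketch below illustrates why the same training statistics should be reused for user input at prediction time. The function names and glucose readings are hypothetical, and the use of the population standard deviation is an assumption of this example.

```python
from statistics import mean, pstdev


def fit_zscore(train_column):
    """Return the mean and population standard deviation of a training column."""
    mu = mean(train_column)
    sigma = pstdev(train_column)
    if sigma == 0:
        sigma = 1.0  # constant column: avoid division by zero
    return mu, sigma


def apply_zscore(value, mu, sigma):
    """Normalize a single value with statistics learned from training data."""
    return (value - mu) / sigma


# Hypothetical glucose readings from a training set.
train_glucose = [85.0, 148.0, 183.0, 89.0, 137.0]
mu, sigma = fit_zscore(train_glucose)

# A new user input is scaled with the stored training statistics (Step 8),
# not with statistics computed from the single input.
user_glucose = 120.0
print(round(apply_zscore(user_glucose, mu, sigma), 3))
```

In practice this suggests saving the fitted normalization statistics (or a fitted scaler object) together with each model, so that Step 8 can apply exactly the transformation used in Step 3.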

4.4.3 Data Set

The dataset used in this project is comprised of medical records and diagnostic in-
formation for various diseases, including Parkinson’s, diabetes, and adiposity. Ad-
ditionally, a symptom analyzer dataset is included for evaluating symptoms and pre-
dicting health risks.
Parkinson’s Disease Dataset: This dataset includes features such as voice mea-
surements from patients, including various acoustic features like jitter, shimmer, and
frequency measures. The dataset is used to predict whether a person has Parkinson’s
disease based on these features.

Diabetes Dataset: This dataset contains information on different medical indi-
cators like glucose level, blood pressure, BMI, age, and insulin, which are used to
predict whether a person has diabetes. The data includes both positive and negative
labels for diabetes outcomes.
Adiposity Dataset: This dataset includes features such as height, weight, waist
circumference, hip size, and age and is used to predict the risk of adiposity, which
refers to excess body fat that may lead to health issues.
Symptom Analyzer Dataset: This dataset is used to evaluate a set of symptoms
and predict the likelihood of various diseases based on the provided symptoms. It
includes various medical symptoms along with their associations to different health
conditions.

4.5 Module Description

4.5.1 Module 1: Diabetes Disease Prediction

This module focuses on predicting diabetes based on a labeled dataset with features
such as age, BMI, blood pressure, skin thickness, insulin level, glucose level, number
of pregnancies, and diabetes pedigree function. Initially, the input features are scaled
to a range of 0 to 1, and the dataset is divided into training and validation sets. A
Gradient Boosting model is then trained on the training data, adding decision trees
one at a time so that each new tree reduces the loss left by the previous ones. The
model’s performance is assessed using accuracy, precision, recall, and F1 score on the
validation data. Once the model is optimized, it is tested on unseen data to validate
its ability to predict diabetes reliably.
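
Scaling a feature to the range 0 to 1 is commonly done with min-max scaling, x' = (x − min) / (max − min), where min and max are taken from the training column. The short sketch below works through this formula. The BMI values and function names are hypothetical, and whether the project uses exactly this formula is an assumption.

```python
def fit_min_max(train_column):
    """Return the minimum and maximum of a training column."""
    return min(train_column), max(train_column)


def scale_min_max(value, lo, hi):
    """Map a value into [0, 1] relative to the training range."""
    if hi == lo:
        return 0.0  # constant column: every value maps to 0
    return (value - lo) / (hi - lo)


# Hypothetical BMI values from a training set.
bmi_train = [23.3, 26.6, 28.1, 33.6, 43.1]
lo, hi = fit_min_max(bmi_train)
scaled = [scale_min_max(v, lo, hi) for v in bmi_train]
print([round(s, 3) for s in scaled])
```

A value outside the training range (for example, a BMI above 43.1 here) would map outside [0, 1], which is one reason to check user inputs against plausible ranges before prediction.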

4.5.2 Module 2: Parkinson’s Disease Prediction

Parkinson’s disease, a progressive neurological disorder that affects movement,
requires early detection for better management. The dataset used in this module
includes features like MDVP:Fo(Hz), MDVP:Fhi(Hz), MDVP:Flo(Hz), jitter percentage,
absolute jitter, and noise-to-harmonics ratio. After scaling the input features
and splitting the data into training and validation sets, a gradient boosting model is
trained by adding decision trees sequentially, each one reducing the loss left by the
previous trees. The model’s performance is evaluated through accuracy, precision,
recall, and F1 score. After training, the model is tested on unseen data to ensure
reliable predictions for Parkinson’s disease detection.

4.5.3 Module 3: Adiposity Risk Prediction

In this module, the objective is to predict the risk of adiposity based on factors like
height, weight, waist, hip measurements, and age. The dataset is preprocessed by
scaling the input features and splitting it into training and testing sets. A gradient
boosting model is trained on this data, and the model’s performance is evaluated
using accuracy, precision, recall, and F1 score. Testing is done on unseen data to
ensure the model provides accurate predictions for adiposity risk and can be used to
support health risk assessments.

4.5.4 Module 4: Symptom Analyzer (Random Forest)

In this module, Random Forest is utilized to analyze symptoms and predict potential
diseases based on those symptoms. The dataset includes various features represent-
ing symptoms, and it is preprocessed by dividing the data into training and testing
sets. The random forest algorithm is employed to train a model using decision trees
to analyze the significance of each symptom. The model is then evaluated using
accuracy, precision, and recall to measure its performance. To ensure the model’s
generalization, it is tested on unseen data, making it a reliable tool for symptom-
based disease prediction.
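
The idea of using a Random Forest to estimate the significance of each symptom can be illustrated with scikit-learn's `feature_importances_` attribute. The sketch below is built on assumptions: the symptom names are hypothetical and the data is synthetic, constructed so that the label depends only on the first two symptoms. It does not reproduce the project's dataset or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
symptom_names = ["fever", "cough", "fatigue", "headache"]  # hypothetical names

# Synthetic binary symptom matrix (1 = symptom present).
X = rng.integers(0, 2, size=(400, len(symptom_names)))
# By construction, the condition is present only when fever AND cough are present.
y = (X[:, 0] & X[:, 1]).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

print("test accuracy:", forest.score(X_test, y_test))
# Impurity-based importances; they sum to 1 across all features.
for name, importance in zip(symptom_names, forest.feature_importances_):
    print(f"{name}: {importance:.3f}")
```

Because the label here is driven by two symptoms only, those two should receive most of the importance, which is the behaviour the module relies on when ranking symptoms.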

Chapter 5

IMPLEMENTATION AND TESTING

5.1 Input and Output

5.1.1 Input Design

The input design focuses on collecting and processing data from the user or other
systems. In this project, inputs are collected separately for each model: the
Parkinson’s Disease, Adiposity, and Diabetes prediction models. These inputs
consist of numerical values for health indicators such as age, weight, and other
related parameters.

5.1.2 Output Design

The output design determines how the results will be presented to the user. For
this project, the output consists of a prediction based on the given input data, such
as “Likely Parkinson’s Disease Detected,” “Healthy,” “High Risk of Adiposity,” or
“Likely Diabetic.” The output is displayed in a simple, user-friendly format, ensuring
that the users can easily interpret the results.

5.2 Types of Testing

5.2.1 Unit testing

Input

# Parkinson's Disease Inputs (22 features each)
parkinsons_test_inputs = [
    [119.992, 157.302, 111.366, 0.00784, 0.00007, 0.00370, 0.00554, 0.01109,
     0.04374, 0.426, 0.02182, 0.03130, 0.02971, 0.06545, 0.02211, 21.033,
     0.414783, 0.815285, -4.813031, 0.266482, 2.301442, 0.284654],

    [135.112, 155.221, 98.457, 0.00653, 0.00006, 0.00321, 0.00495, 0.00987,
     0.03854, 0.390, 0.01954, 0.02834, 0.02567, 0.05860, 0.01832, 20.100,
     0.401234, 0.798345, -4.657122, 0.278122, 2.100123, 0.274321]
]

# Adiposity Prediction Inputs (16 features each)
adiposity_test_inputs = [
    [22, 0, 1, 0, 170, 65, 1, 72, 80, 88, 22, 0.5, 0.7, 0.6, 120, 80],
    [45, 1, 0, 1, 165, 75, 2, 85, 90, 100, 27, 0.6, 0.8, 0.5, 130, 85]
]

# Diabetes Prediction Inputs (8 features each)
diabetes_test_inputs = [
    [6, 148, 72, 35, 0, 33.6, 0.627, 50],
    [1, 85, 66, 29, 0, 26.6, 0.351, 31]
]

Test result

Figure 5.1: Unit Testing Results
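
Before such inputs are passed to a model, a unit test can first check that each row has the expected number of features. The helper below is a hypothetical sketch (the names `EXPECTED_FEATURE_COUNTS` and `validate_row` are not from the project); the feature counts are those stated in the test inputs above.

```python
# Feature counts per model, as stated in the unit-test inputs.
EXPECTED_FEATURE_COUNTS = {"parkinsons": 22, "adiposity": 16, "diabetes": 8}


def validate_row(disease, row):
    """Raise ValueError if a row has the wrong length or a non-numeric value."""
    expected = EXPECTED_FEATURE_COUNTS[disease]
    if len(row) != expected:
        raise ValueError(f"{disease}: expected {expected} features, got {len(row)}")
    for value in row:
        if not isinstance(value, (int, float)):
            raise ValueError(f"{disease}: non-numeric value {value!r}")


diabetes_rows = [
    [6, 148, 72, 35, 0, 33.6, 0.627, 50],
    [1, 85, 66, 29, 0, 26.6, 0.351, 31],
]
for row in diabetes_rows:
    validate_row("diabetes", row)
print("all diabetes rows have the expected shape")
```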

5.2.2 System testing

Input

test_cases = [
    {"test_case": 1, "input": [150.5, 170.2], "expected_output": "Likely Parkinson's Disease Detected"},
    {"test_case": 2, "input": [180.1, 200.0], "expected_output": "Healthy (No Parkinson's Disease)"},
    {"test_case": 3, "input": [58, 1, 26.5], "expected_output": "High Risk of Adiposity"},
    {"test_case": 4, "input": [65, 0, 30.2], "expected_output": "Low Risk of Adiposity"},
    {"test_case": 5, "input": [5, 140], "expected_output": "Likely Diabetic"},
    {"test_case": 6, "input": [2, 90], "expected_output": "Non-Diabetic"}
]


# Threshold-based stubs standing in for the trained models.
# Note: with these thresholds, cases 2 to 4 do not match their expected outputs.
def parkinsons_model(input_data):
    return "Likely Parkinson's Disease Detected" if input_data[0] > 150 else "Healthy (No Parkinson's Disease)"


def adiposity_model(input_data):
    return "High Risk of Adiposity" if input_data[2] > 30 else "Low Risk of Adiposity"


def diabetes_model(input_data):
    return "Likely Diabetic" if input_data[0] > 4 else "Non-Diabetic"


def run_tests(test_cases):
    # Route each case to its stub by test case number, then compare outputs.
    for case in test_cases:
        input_data = case["input"]
        if case["test_case"] in [1, 2]:
            case["system_output"] = parkinsons_model(input_data)
        elif case["test_case"] in [3, 4]:
            case["system_output"] = adiposity_model(input_data)
        else:
            case["system_output"] = diabetes_model(input_data)
        result = "Pass" if case["expected_output"] == case["system_output"] else "Fail"
        print(f"Test Case {case['test_case']} - {result}")


run_tests(test_cases)

Test Result

Figure 5.2: Test Results

Chapter 6

RESULTS AND DISCUSSIONS

6.1 Efficiency of the Proposed System

The proposed system utilizes a combination of Random Forest and Gradient
Boosting algorithms to predict the likelihood of various diseases. For the Symptom
Analyzer model, the Random Forest algorithm is applied, achieving an accuracy
of 79.45%. Random Forest constructs multiple decision trees by resampling the
training data, and the majority vote from these trees is used to make the final
prediction. This approach helps improve model robustness and accuracy, especially
in tasks with multiple features, like analyzing symptoms to diagnose diseases.
Additionally, gradient boosting is employed for predicting diabetes, Parkinson’s
disease, and adiposity risk, achieving accuracies of 89.2%, 82.7%, and 84.5%,
respectively. Gradient Boosting builds a series of decision trees sequentially, with
each tree correcting the errors of the previous one, improving accuracy through
iterative refinement.
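
The sequential error-correcting behaviour of Gradient Boosting can be illustrated with a deliberately simplified sketch. The example below is not the classifier used in the project: it assumes a single feature, squared-error loss, and one-split decision stumps, and it uses made-up data. Each round fits a stump to the current residuals and adds a damped copy of it to the ensemble.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimises squared error on residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # exclude max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, rounds=20, learning_rate=0.5):
    """Fit stumps sequentially; each one targets the errors left so far."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps, mse_history = [], []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for p, x in zip(predictions, xs)]
        mse_history.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    return base, stumps, mse_history


def predict(x, base, stumps, learning_rate=0.5):
    return base + learning_rate * sum(stump(x) for stump in stumps)


# Made-up one-dimensional data with a noisy step around x = 4..5.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0]
base, stumps, mse_history = boost(xs, ys)
print("training MSE after first and last round:", mse_history[0], mse_history[-1])
```

The training error recorded in `mse_history` does not increase from one round to the next, which reflects the iterative refinement described above. Training error alone does not indicate how well the model generalises, so held-out evaluation remains necessary.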

Model                                      Accuracy   Precision   Recall   F1 Score
Symptom Analyzer (Random Forest)           79.45%     0.78        0.80     0.79
Diabetes (Gradient Boosting)               89.2%      0.91        0.88     0.89
Parkinson’s Disease (Gradient Boosting)    82.7%      0.83        0.84     0.84
Adiposity Risk (Gradient Boosting)         84.5%      0.85        0.84     0.84

Table 6.1: Model Performance Evaluation
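
For reference, the four metrics in Table 6.1 are derived from the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN): accuracy = (TP + TN) / N, precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall). The sketch below computes them for a small made-up set of labels; the data is illustrative and unrelated to the reported results.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
result = classification_metrics(y_true, y_pred)
print(result)
```

Here TP = 3, FN = 1, FP = 1, and TN = 5, so accuracy is 0.8 while precision, recall, and F1 are each 0.75.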

6.2 Comparison of Existing and Proposed System

Existing System: (Individual Disease Prediction Applications) The existing
system employs separate applications for each disease prediction, such as individual
models for diagnosing diabetes, Parkinson’s disease, adiposity risk, and
other conditions. This approach requires users to access different applications
for each diagnosis, leading to a fragmented and cumbersome experience. Each
disease-specific application processes data and generates predictions independently,
without integrating insights from other conditions. The separate applications lead
to inefficiencies in data management and diagnosis, as healthcare professionals
must use multiple tools for a comprehensive assessment of a patient’s health.
Furthermore, the accuracy of the predictions may vary across applications, as they
are not unified, resulting in inconsistencies and less reliable outcomes.

Proposed System: (Integrated Disease Prediction Model) In contrast, the proposed
system integrates all disease prediction models into a single platform, offering
a more streamlined and efficient approach. This integration combines predictions
for diabetes, Parkinson’s disease, adiposity risk, and symptom analysis into one
system, making it easier for users to obtain a complete health assessment through
a unified interface. By leveraging advanced machine learning algorithms like
Random Forest for symptom analysis and Gradient Boosting for other conditions,
the proposed system ensures improved accuracy and consistency in its predictions.
The integrated system not only eliminates the need for multiple applications but
also allows healthcare professionals to assess a patient’s health in a more holistic
manner, ultimately aiding in more informed decision-making and early disease
detection. The system’s flexibility allows it to be easily adapted to other diseases,
further enhancing its utility. Moreover, the integration ensures that patients receive
timely interventions, improving overall healthcare outcomes. As a result, the system
is a valuable tool for both medical practitioners and patients, facilitating a proactive
approach to healthcare management. Additionally, the system can be continuously
updated as new models and data are incorporated, ensuring long-term relevance and
accuracy.

Output

Figure 6.1: Output 1


The image displays the homepage of an ML-based disease prediction and symptom analysis platform. Four brightly
colored, rectangular buttons immediately draw the eye, presenting options for adiposity prediction, Parkinson’s prediction,
diabetes prediction, and symptom checker. The bold, central title clearly communicates the platform’s purpose. Below, a
concise paragraph elaborates on the platform’s capabilities, mentioning its use of clinical data such as blood pressure and
heart rate to predict specific conditions and offer real-time health insights and personalized recommendations. The plat-
form’s overarching goal, as stated, is to facilitate accurate, accessible, and early disease detection through the application
of machine learning, ultimately promoting preventive care. The overall impression is of a user-friendly interface designed
to empower individuals with health-related predictions and analyses.

Figure 6.2: Output 3

This figure shows the Adiposity Prediction page of an ML-based disease prediction
platform. Input fields for age, gender, weight, height, lifestyle factors, and family
history are visible, along with a Check Adiposity button and an Adiposity Detected
indicator.

Figure 6.3: Output 2

This figure displays the Parkinson’s Disease Prediction page of an ML-based plat-
form. It features numerous input fields for voice characteristics (MDVP, Shimmer,
Jitter, HNR, RPDE, DFA, Spread, PPE) and a Check Parkinson’s button, with a
Parkinson’s Detected indicator at the bottom.

Figure 6.4: Output 4

The figure displays the Diabetes Prediction interface of an ML-driven health plat-
form. It presents input fields for key health metrics such as Pregnancies, Glucose
Level, and Blood Pressure alongside Skin Thickness, Insulin, BMI, Diabetes Pedi-
gree Function, and Age. A noticeable Check Diabetes button allows users to trigger
the prediction process. The outcome is clearly indicated below as No Diabetes De-
tected.

Chapter 7

CONCLUSION AND FUTURE ENHANCEMENTS

7.1 Conclusion

The integrated disease prediction model presented in this project offers a streamlined
and efficient approach to predicting multiple conditions, namely diabetes, Parkinson’s
disease, and adiposity risk, alongside symptom analysis. By utilizing machine learning
algorithms, with Random Forest for symptom analysis and Gradient Boosting for disease
prediction, the system achieves accuracies between 79.45% and 89.2% (Table 6.1). The
integration of these models into a single platform eliminates the need for multiple
separate applications, making the prediction process more accessible and user-friendly.
Healthcare professionals can assess a patient’s health comprehensively, using
one system to screen for a range of conditions, allowing for more effective
decision-making.
In addition to simplifying the diagnostic process, the proposed system empha-
sizes early disease detection, which is crucial for improving patient outcomes. The
model’s high accuracy rates provide healthcare providers with the necessary tools to
intervene at the earliest stages of disease development, potentially reducing the sever-
ity of diseases and the burden on healthcare systems. By offering a holistic health
assessment, the integrated system plays a critical role in preventive healthcare. Its
scalability ensures that the system can evolve to accommodate new diseases, making
it a valuable long-term tool in healthcare management and patient care.

7.2 Future Enhancements

Looking ahead, there are several opportunities to expand the scope and functionality
of the integrated disease prediction system. One of the most significant enhancements
will involve adding more diseases to the platform. This will not only increase
the system’s utility but also make it applicable to a broader range of health
conditions. By incorporating additional disease models, such as respiratory disorders,
cardiovascular diseases, or even neurological conditions, the system will become a
comprehensive tool for healthcare professionals. The integration of more diseases
will ensure that the platform can be used in a wider variety of clinical settings, pro-
viding a holistic health assessment for patients across different demographics and
risk factors.
Another exciting enhancement is the incorporation of X-ray image recognition
into the system. By integrating image analysis capabilities, the system will be able
to diagnose conditions based on visual inputs, such as X-rays or CT scans, further ex-
panding its diagnostic capabilities. This would involve training deep learning models
on large medical image datasets to recognize patterns indicative of specific diseases.
The ability to combine both structured data (such as lab results and patient history)
with unstructured data (like X-ray images) will significantly improve the accuracy
and speed of diagnosis. As a result, the system will move closer to becoming an
all-encompassing diagnostic assistant, capable of helping healthcare professionals
diagnose a wider range of conditions with greater precision.

Chapter 8

PLAGIARISM REPORT

Chapter 9

Source Code

import streamlit as st
import numpy as np
import joblib
import random

# Load models
adiposity_model = joblib.load(r"C:\Users\sudeep\Desktop\m2\adiposity_random_forest_model.pkl")
parkinsons_model = joblib.load(r"C:\Users\sudeep\Desktop\m2\parkinsons_model.pkl")
diabetes_model = joblib.load(r"C:\Users\sudeep\Desktop\m2\diabetes_model.pkl")

# Dummy models (just for demonstration)
models = {
    "Adiposity Prediction": "adiposity_model",
    "Parkinson's Prediction": "parkinsons_model",
    "Diabetes Prediction": "diabetes_model",
    "Symptom Checker": "symptom_checker"
}

# Custom button styling
st.markdown("""
<style>
.stButton>button {
    background-color: #ff7200;
    color: white;
    border: none;
    border-radius: 12px;
    padding: 15px 30px;
    font-size: 14px;
    font-weight: bold;
    text-transform: uppercase;
    transition: background-color 0.3s ease;
    width: 220px;
    height: 60px;
}
.stButton>button:hover {
    background-color: white;
    color: #ff7200;
    border: 2px solid #ff7200;
    transform: scale(1.05);
}
</style>
""", unsafe_allow_html=True)

st.title("ML-BASED DISEASE PREDICTION AND SYMPTOM ANALYZER")
col1, col2 = st.columns(2)
with col1:
    if st.button("Adiposity Prediction"):
        st.session_state.page = "Adiposity Prediction"
    if st.button("Diabetes Prediction"):
        st.session_state.page = "Diabetes Prediction"
with col2:
    if st.button("Parkinson's Prediction"):
        st.session_state.page = "Parkinson's Prediction"
    # Assumption: the Symptom Checker button, the `selected` lookup, and the first
    # `if` branch are absent from the printed listing and are reconstructed here.
    if st.button("Symptom Checker"):
        st.session_state.page = "Symptom Checker"

# Page chosen through the buttons above (None until a button is clicked)
selected = st.session_state.get("page")
if selected == "Adiposity Prediction":
    st.write("Adiposity Prediction page loaded.")
elif selected == "Parkinson's Prediction":
    st.write("Parkinson's Prediction page loaded.")
elif selected == "Diabetes Prediction":
    st.write("Diabetes Prediction page loaded.")
elif selected == "Symptom Checker":
    st.write("Symptom Checker page loaded.")

# Adiposity Prediction Page
if selected == "Adiposity Prediction":
    st.header("Adiposity Prediction")
    col1, col2, col3, col4 = st.columns(4)
    with col1:
        age = st.number_input("Age", 0, 100)
        weight = st.number_input("Weight (kg)", 20, 200)
        CH2O = st.number_input("CH2O (Water Intake)", 0.1, 10.0)
    with col2:
        gender = st.selectbox("Gender", ["Male", "Female"])
        height_unit = st.selectbox("Height Unit", ["Centimeters", "Feet"])
        if height_unit == "Centimeters":
            height = st.number_input("Height (cm)", 50, 300)
        else:
            # Convert feet to centimeters
            height = st.number_input("Height (feet)", 1.5, 10.0) * 30.48
    with col3:
        FCVC = st.selectbox("FCVC (Veg Consumption)", ["Yes", "No"])
        NCP = st.number_input("NCP (Main Meals/Day)", 1, 5)
        family_history = st.selectbox("Family History", ["Yes", "No"])
    with col4:
        FAVC = st.selectbox("FAVC (Frequent Veg)", ["Yes", "No"])
        CALC = st.number_input("CALC (Calories)", 0, 10000)
        TUE = st.number_input("TUE (Energy Use)", 0, 10000)
    col5, col6, col7, col8 = st.columns(4)
    with col5:
        MTRANS = st.selectbox("MTRANS (Transport)", ["Walking", "Car", "Bicycle", "Motorbike"])
        FAF = st.number_input("FAF (Activity/Week)", 0, 7)
    with col6:
        CAEC = st.selectbox("CAEC (Alcohol)", ["Yes", "No"])
    with col7:
        SMOKE = st.selectbox("SMOKE", ["Yes", "No"])
    with col8:
        SCC = st.selectbox("SCC (Physical Activity)", ["Yes", "No"])
90

91 i f s t . b u t t o n ( ” Check A d i p o s i t y ” ) :

        # Validate the numeric inputs before predicting
        if all([age > 0, height > 0, weight > 0, NCP > 0, CH2O > 0, TUE >= 0, CALC >= 0, FAF >= 0]):
            # Hard-coded weight rules take precedence over the model
            if weight >= 120:
                st.success("Adiposity Detected")
            elif weight > 100 and MTRANS in ["Car", "Motorbike"]:
                st.success("Adiposity Detected")
            else:
                # Encode Yes/No answers as 1/0 and the transport mode as 1-4
                input_data = np.array([[
                    age, height, weight, 1 if gender == 'Male' else 0,
                    1 if family_history == 'Yes' else 0,
                    1 if FAVC == 'Yes' else 0, 1 if FCVC == 'Yes' else 0, NCP,
                    1 if CAEC == 'Yes' else 0, 1 if SMOKE == 'Yes' else 0, CH2O,
                    1 if SCC == 'Yes' else 0, FAF, TUE, CALC,
                    1 if MTRANS == 'Walking' else (2 if MTRANS == 'Car' else (3 if MTRANS == 'Bicycle' else 4)),
                ]])
                prediction = adiposity_model.predict(input_data)[0]
                st.success("Adiposity Detected" if prediction == 1 else "No Adiposity Detected")
        else:
            st.warning("Please fill in all required fields.")
elif selected == "Parkinson's Prediction":
    st.header("Parkinson's Disease Prediction")
    col1, col2, col3, col4 = st.columns(4)
    # Each field is pre-filled with a random default value
    with col1:
        fo = st.number_input("MDVP:Fo(Hz)", value=random.uniform(100, 200))
        jitter_percent = st.number_input("MDVP:Jitter(%)", value=random.uniform(0, 1))
        rap = st.number_input("MDVP:Rap", value=random.uniform(0, 1))
    with col2:
        ppq = st.number_input("MDVP:PPQ", value=random.uniform(0, 1))
        jitter_abs = st.number_input("MDVP:Jitter(Abs)", value=random.uniform(0, 0.1))
        shimmer = st.number_input("MDVP:Shimmer", value=random.uniform(0, 0.1))
    with col3:
        shimmer_dB = st.number_input("MDVP:Shimmer(dB)", value=random.uniform(0, 1))
        apq = st.number_input("MDVP:Apq", value=random.uniform(0, 1))
    with col4:
        hnr = st.number_input("MDVP:HNR", value=random.uniform(10, 25))
        rpde = st.number_input("RPDE", value=random.uniform(0, 1))
    if st.button("Check Parkinson's"):
        # all() treats any zero value as a missing field
        if all([fo, jitter_percent, rap, ppq, jitter_abs, shimmer, shimmer_dB, apq, hnr, rpde]):
            input_data = np.array([[fo, jitter_percent, rap, ppq, jitter_abs, shimmer, shimmer_dB,
                                    apq, hnr, rpde]])
            prediction = parkinsons_model.predict(input_data)[0]
            st.success("Parkinson's Disease Detected" if prediction == 1
                       else "No Parkinson's Disease Detected")
        else:
            st.warning("Please fill in all required fields.")
# Symptom Checker
elif selected == "Symptom Checker":
    st.write("Symptom Checker page loaded.")
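
The adiposity branch of the listing builds its feature row inline, with nested conditional expressions for the transport mode. As a minimal sketch, the same encoding and the two hard-coded weight rules could be pulled into plain functions that can be checked without running the Streamlit page. The names MTRANS_CODES, yes_no, encode_adiposity_features and weight_rule_triggers are hypothetical helpers introduced here for illustration; they do not appear in the original code. The column order and the 1 to 4 transport codes are copied from the listing, and it is assumed that this order matches the one the model was trained on.

```python
# Hypothetical helpers (not part of the original listing) that mirror the
# inline encoding used in the adiposity branch.

# Transport codes as in the listing: Walking=1, Car=2, Bicycle=3, Motorbike=4
MTRANS_CODES = {"Walking": 1, "Car": 2, "Bicycle": 3, "Motorbike": 4}


def yes_no(value):
    """Map a "Yes"/"No" select-box answer to 1/0."""
    return 1 if value == "Yes" else 0


def encode_adiposity_features(age, height_cm, weight_kg, gender, family_history,
                              FAVC, FCVC, NCP, CAEC, SMOKE, CH2O, SCC,
                              FAF, TUE, CALC, MTRANS):
    """Return one feature row (a list of lists) in the listing's column order."""
    row = [
        age, height_cm, weight_kg,
        1 if gender == "Male" else 0,
        yes_no(family_history),
        yes_no(FAVC), yes_no(FCVC), NCP,
        yes_no(CAEC), yes_no(SMOKE), CH2O,
        yes_no(SCC), FAF, TUE, CALC,
        MTRANS_CODES[MTRANS],
    ]
    return [row]


def weight_rule_triggers(weight_kg, MTRANS):
    """True when one of the listing's hard-coded weight rules bypasses the model."""
    return weight_kg >= 120 or (weight_kg > 100 and MTRANS in ("Car", "Motorbike"))
```

In the page code, the returned row could be wrapped as np.array(encode_adiposity_features(...)) before being passed to adiposity_model.predict, exactly as the inline version does. A dictionary lookup raises a KeyError for an unknown transport mode, whereas the nested conditionals in the listing silently map it to 4.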

References

[1] Sogandi, F., 2024. Identifying diseases symptoms and general rules using supervised and
unsupervised machine learning. Scientific Reports, 14(1), p.17956.
[2] Pilehvari, S., Morgan, Y. and Peng, W., 2024. An analytical review on the use of artificial
intelligence and machine learning in diagnosis, prediction, and risk factor analysis of multiple
sclerosis. Multiple Sclerosis and Related Disorders, 89, p.105761.
[3] Pinto, M.F., Oliveira, H., Batista, S., Cruz, L., Pinto, M., Correia, I., Martins, P. and Teixeira,
C., 2020. Prediction of disease progression and outcomes in multiple sclerosis with machine
learning. Scientific Reports, 10(1), p.21038.
[4] Hassan, E., Abd El-Hafeez, T. and Shams, M.Y., 2024. Optimizing classification of diseases
through language model analysis of symptoms. Scientific Reports, 14(1), p.1507.
[5] Gurevich, M., Zilkha-Falb, R., Sherman, J., Usdin, M., Raposo, C., Craveiro, L., Sonis, P.,
Magalashvili, D., Menascu, S., Dolev, M. and Achiron, A., 2025. Machine learning–based prediction
of disease progression in primary progressive multiple sclerosis. Brain Communications, 7(1),
p.fcae427.
[6] Park, D.J., Park, M.W., Lee, H., Kim, Y.J., Kim, Y. and Park, Y.H., 2021. Development of
machine learning model for diagnostic disease prediction based on laboratory tests. Scientific
Reports, 11(1), p.7567.
[7] Yousef, H., Malagurski Tortei, B. and Castiglione, F., 2024. Predicting multiple sclerosis disease
progression and outcomes with machine learning and MRI-based biomarkers: a review. Journal
of Neurology, 271(10), pp.6543–6572.
[8] Delpino, F.M., Costa, Â.K., Farias, S.R., Chiavegatto Filho, A.D.P., Arcêncio, R.A. and Nunes,
B.P., 2022. Machine learning for predicting chronic diseases: a systematic review. Public Health,
205, pp.14–25.
[9] Sang, H., Lee, H., Park, J., Kim, S., Woo, H.G., Koyanagi, A., Smith, L., Lee, S., Hwang, Y.C.,
Park, T.S. and Lim, H., 2024. Machine learning–based prediction of neurodegenerative disease
in patients with type 2 diabetes by derivation and validation in 2 independent Korean cohorts:
Model development and validation study. Journal of Medical Internet Research, 26, p.e56922.
[10] Zhang, P., Huang, X. and Li, M., 2019. Disease prediction and early intervention system based
on symptom similarity analysis. IEEE Access, 7, pp.176484–176494.

