Harish Kumar Thota
Irving, TX | (469) 287-6899 | [email protected] | LinkedIn
PROFESSIONAL SUMMARY:
Senior Data Scientist with over 7 years of progressive experience in Applied Machine Learning, Natural Language
Processing (NLP), Generative AI, and Data Analytics, delivering high-impact AI solutions across banking,
healthcare, and retail domains.
Proven expertise in Generative AI system design, including LLaMA 2, Falcon, and Retrieval-Augmented Generation
(RAG) for domain-specific conversational assistants, improving automation rates and reducing decision
turnaround times.
Proficient in managing the complete data science project lifecycle, contributing to every phase from data
acquisition and cleaning through feature engineering and feature scaling.
Skilled in building machine learning models using algorithms such as Regression, Time Series (ARIMA, Holt-
Winters), Clustering, Apriori, Decision Trees, KNN, Neural Networks, SVM, and Ensemble methods (Random
Forest, Boosting).
Hands-on experience implementing Naive Bayes and proficient in Random Forests, Decision Trees, Linear &
Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis (PCA).
Experienced in dimensionality reduction techniques (PCA, SVD), model evaluation with metrics such as AUC-ROC
alongside K-fold cross-validation, and delivering insights through data visualization.
Skilled in domain adaptation of LLMs using LoRA fine-tuning, enabling cost-effective training on secure datasets
while boosting model performance in specialized terminology.
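A minimal LoRA sketch with Hugging Face PEFT, assuming a LLaMA-family base checkpoint; the rank, dropout, and target modules below are illustrative defaults, not the values used on the secure datasets described above:

```python
# Illustrative LoRA fine-tuning setup (placeholder model and hyperparameters).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # assumed base checkpoint

lora_cfg = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                   # low-rank dimension keeps trainable params small
    lora_alpha=32,                         # scaling factor for the LoRA update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections commonly adapted
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()          # typically well under 1% of total weights
```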
Proficient with AWS services (DMS, Glue, Athena, S3, EMR, Step Functions, Lambda, SNS, SES, EC2, QuickSight) as
well as GCP (Dataproc, BigQuery, GCS, Vertex AI) and Azure Databricks for large-scale data analytics and insights.
Extensive hands-on experience in Natural Language Processing (NLP) and Generative AI, with a strong foundation
in developing Large Language Model (LLM)-based solutions using GPT-4, Gemini, LLaMA, and Cortex Analyst.
Skilled in time series forecasting using SARIMA, Prophet, and machine learning models for demand prediction,
achieving over 90% forecast accuracy.
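A minimal SARIMA sketch with statsmodels, assuming a weekly demand series; the file name and the (seasonal) orders are placeholders rather than tuned production settings:

```python
# Sketch of a seasonal demand forecast; "weekly_demand.csv" is a hypothetical input.
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

demand = pd.read_csv("weekly_demand.csv", index_col="week", parse_dates=True)["units"]

model = SARIMAX(demand, order=(1, 1, 1), seasonal_order=(1, 1, 1, 52))  # yearly cycle on weekly data
fit = model.fit(disp=False)
forecast = fit.forecast(steps=12)  # next 12 weeks of predicted demand
print(forecast)
```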
Skilled in developing and deploying ML models using cloud-native platforms including AWS SageMaker, GCP
Vertex AI, and AzureML Studio.
Proficient in Python with extensive use of packages such as Pandas, NumPy, Scikit-learn, TensorFlow, Keras, NLTK,
Matplotlib, Seaborn and more for data analysis, visualization, and machine learning.
Experience in cloud-native AI development, delivering end-to-end data science solutions with Docker, MLflow,
FastAPI, and CI/CD pipelines built on GitHub Actions and Jenkins.
Highly proficient in developing tailored optimization models and performing scenario-based sensitivity analyses
to support well-informed, data-driven decision-making.
Experienced in data parsing, manipulation, and preparation using techniques like descriptive statistics, regex,
merge, subset, reindex, melt, and reshape to enable high-quality datasets for ML pipelines.
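A short pandas sketch of the preparation steps named above (merge, melt, subset, reindex, descriptive statistics) on toy data:

```python
# Toy illustration of the pandas reshaping operations listed above.
import pandas as pd

orders = pd.DataFrame({"id": [1, 2], "q1": [10, 7], "q2": [12, 9]})
regions = pd.DataFrame({"id": [1, 2], "region": ["TX", "AR"]})

merged = orders.merge(regions, on="id")                         # merge
tidy = merged.melt(id_vars=["id", "region"],                    # melt wide quarters into rows
                   value_vars=["q1", "q2"],
                   var_name="quarter", value_name="units")
subset = tidy[tidy["units"] > 8].reset_index(drop=True)         # subset + reindex
print(subset.describe())                                         # descriptive statistics
```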
Experienced in customer segmentation, anomaly detection, and recommender systems, driving measurable
revenue and risk reduction outcomes.
Built and deployed fraud detection models with Logistic Regression, Random Forest, XGBoost, LightGBM, and
Neural Networks, achieving 96% accuracy in identifying suspicious transactions.
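A hedged sketch of one such gradient-boosted fraud classifier; the features and labels below are synthetic stand-ins for real transaction data, and the class-weighting choice is illustrative:

```python
# Synthetic-data sketch of an imbalanced fraud classifier with XGBoost.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 12))                          # stand-in transaction features
y = (X[:, 0] + rng.normal(size=5000) > 2).astype(int)    # rare positive class ~ fraud

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = XGBClassifier(
    n_estimators=300, max_depth=4,
    scale_pos_weight=(y_tr == 0).sum() / max((y_tr == 1).sum(), 1),  # offset class imbalance
)
clf.fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```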
Proficient in big data ETL pipelines with Apache Spark, enabling large-scale data ingestion and transformation for
analytics and ML workflows.
Adept at hyperparameter optimization, feature engineering, and ensemble modeling to improve predictive
accuracy and model robustness.
Expertise in Python, SQL, Machine Learning, Deep Learning, and Big Data technologies. Adept at translating
business requirements into actionable ML models, collaborating with cross-functional teams, and deploying
solutions on cloud platforms. Passionate about using data to drive meaningful business impact.
Strong background in data visualization and stakeholder communication through interactive dashboards
(Tableau, Power BI) and analytical reporting.
SKILLS:
Programming Languages: Python, C, Java, R, SQL, HTML5, Bash, Linux, Shell Scripting, PySpark, MATLAB, Data Structures and Algorithms, OOP, RDBMS
Database & Warehousing: MySQL, Oracle, MongoDB, NoSQL, Google BigQuery, Snowflake, Apache Hive, HDFS, ETL pipelines
AI Frameworks & Technologies: Machine Learning, Deep Learning, Artificial Intelligence, Computer Vision, OpenCV, Generative AI, Natural Language Processing, Scikit-learn, TensorFlow, Keras, PyTorch
Cloud Services: Azure, AWS, Docker, SageMaker, Redshift, Dataproc, Kubernetes
Tools & Platforms: Jupyter Notebook, VS Code, Google Colab, Microsoft Excel, Tableau, Power BI, Google Cloud Platform, Databricks, SAS
Machine Learning Modeling: Linear Regression, Logistic Regression, Decision Trees, Random Forest, XGBoost, LightGBM, SVM, K-Means Clustering, DBSCAN, ARIMA, SARIMA, Holt-Winters, Anomaly Detection, PCA
Deep Learning Architectures: CNN, RNN, LSTM, GAN, Transformers, Autoencoders, BERT, GPT
Generative AI: GPT, LLaMA, Falcon, LoRA, RAG, Hugging Face Transformers, LangChain
Interpersonal Skills: Time Management, Teamwork, Communication, Adaptability, Work Ethic, Empathy, Decision Making
EDUCATION:
UNIVERSITY OF NORTH TEXAS
Master of Science in Advanced Data Analytics GPA: 4.0/4.0
Relevant Coursework: Data Analytics, Business Intelligence, Data Warehousing, Big Data, Cloud Computing Tools,
Machine Learning, Artificial Intelligence, Natural Language Processing
CERTIFICATIONS:
Machine Learning: Stanford University
OpenCV Bootcamp: OpenCV University
Data Analysis with Python, Pandas and NumPy
Artificial Intelligence Foundations - Machine Learning
Programming, Data Structures, and Algorithms using Python: NPTEL – IIT Kharagpur
Introduction to Internet of Things: NPTEL – IIT Kharagpur
Oxford Achiever: Certificate of Merit
WORK EXPERIENCE:
NLP DATA SCIENTIST / ML ENGINEER Muskogee, OK
Armstrong Bank Oct 2023 – Present
Designed Generative AI architecture for conversational banking assistants using LLaMA 2 and Falcon models on
secure financial datasets, automating 80% of credit risk assessments and cutting regulatory compliance review
time by 35%.
Created GenAI-powered data analysis engine using Claude-v3.0-Sonnet LLM on Vertex AI, enabling automated
Python/SQL code generation for descriptive and aggregative analytics.
Built ML models for customer retention prediction using Logistic Regression, KNN, Decision Trees, Random
Forest, XGBoost, LightGBM, and Neural Networks on Vertex AI, achieving an F1 score of 87%.
Optimized ML pipelines on AWS SageMaker, reducing training time by 40% and cutting infrastructure costs by
30%.
Built LLM-based earnings call summarization tool using Claude-v2.1 on Vertex AI, converting video to audio,
transcribing with Speech-to-Text, and generating executive summaries via prompt engineering. Designed a user
interface with Streamlit for business accessibility.
Implemented Retrieval-Augmented Generation (RAG) workflows combining LLMs with vector databases,
enabling real-time, domain-specific Q&A systems.
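A minimal RAG sketch under stated assumptions: sentence-transformers for embeddings (one plausible choice), an in-memory index in place of a managed vector database, and a placeholder call_llm function standing in for the completion endpoint:

```python
# RAG sketch: embed passages, retrieve the nearest ones, pass them to an LLM as context.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = ["Wire transfers over $10k require review.", "Card disputes resolve in 10 days."]
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def retrieve(question: str, k: int = 1):
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vecs @ q                          # cosine similarity (vectors are normalized)
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

question = "How long do card disputes take?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQ: {question}"
# answer = call_llm(prompt)  # placeholder: any chat-completion API fits here
```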
Designed anomaly detection frameworks using Isolation Forest, Autoencoders, and statistical methods to flag
fraud or irregular activity in high-volume datasets.
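A compact Isolation Forest sketch on synthetic data illustrating the flagging step; the contamination rate is an assumed parameter:

```python
# Unsupervised anomaly flagging with scikit-learn's IsolationForest (synthetic data).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
normal = rng.normal(0, 1, size=(1000, 4))      # typical transaction behavior
outliers = rng.normal(6, 1, size=(10, 4))      # injected irregular activity
X = np.vstack([normal, outliers])

iso = IsolationForest(contamination=0.01, random_state=42).fit(X)
flags = iso.predict(X)                          # -1 = anomaly, 1 = normal
print("flagged:", int((flags == -1).sum()))
```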
Created bias detection frameworks for LLM outputs, integrating SHAP and counterfactual evaluation to ensure
fairness, reducing demographic bias in credit approvals by 20%.
Created a GenAI Terraform code generator with Mistral-7B LLM on Vertex AI, parsing Draw.io XML architecture
diagrams to extract cloud services/relationships and generate Terraform templates, improving performance via
prompt optimization and XML parsing enhancements.
Implemented a RAG-based chat engine integrating the Pinecone vector database with OpenAI GPT-3.5 Turbo to
analyze customer feedback and support logs, enhancing detection of service pain points such as loan delays and
digital banking issues.
Built real-time inference systems using Kafka Streams, PyTorch and TensorFlow Serving, reducing decision
latency from minutes to seconds.
Designed model drift detection frameworks with statistical monitoring, retriggering retraining pipelines
automatically.
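One way such statistical monitoring can look, sketched with a two-sample Kolmogorov-Smirnov test per feature; the alert threshold is an assumption, not the production value:

```python
# Drift check: compare a live feature window against the training distribution.
import numpy as np
from scipy.stats import ks_2samp

train_feature = np.random.normal(0.0, 1.0, 10_000)   # reference distribution from training
live_feature = np.random.normal(0.3, 1.0, 1_000)     # recent production window (shifted)

stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:                                    # assumed alert threshold
    print("drift detected -> trigger retraining pipeline")
```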
Implemented feature store architecture in Snowflake and BigQuery to ensure consistent feature availability
across training and inference.
Developed LLMOps pipelines for lifecycle management of large language models, including prompt versioning,
automated evaluation (BLEU, ROUGE, RAGAS), and feedback-driven retraining.
Designed BigQuery-based batch inference pipelines for automated scoring of retention and basket value models.
Applied optimization techniques including regularization, cross-validation, and hyperparameter tuning to
maximize model performance.
Sr. DATA SCIENTIST Nashville, TN
HCA Healthcare Jan 2020 – Sep 2023
Engineered clinical text classification models using BERT and RoBERTa fine-tuned on medical corpora, achieving
94% F1-score in automatic ICD-10 code extraction from unstructured EHR notes.
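A minimal Transformers sketch of the classification setup; the generic bert-base-uncased checkpoint and three-label head are placeholders for the medical-domain models and ICD-10 label space described above:

```python
# Sequence-classification scoring sketch; in practice the head is fine-tuned on labeled notes.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

note = "Patient presents with shortness of breath and elevated troponin."
inputs = tok(note, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)
print(probs)  # per-label probabilities
```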
Improved model robustness using regularization (L1/L2) and cross-validation, ensuring high accuracy and
generalization across diverse patient datasets.
Built predictive models in Python (Scikit-learn) using regression and classification algorithms to identify high-risk
patients, improving preventive care strategies and reducing hospital readmission rates by 14%.
Developed custom Named Entity Recognition (NER) pipelines with spaCy, extracting clinical terms with 97%
precision and improving entity linking accuracy by 22% over baseline.
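A minimal spaCy extraction sketch using an off-the-shelf pipeline; a clinical deployment would use a custom-trained NER component rather than en_core_web_sm:

```python
# Entity extraction with a stock spaCy pipeline (requires: python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Patient started on 20 mg lisinopril at Nashville General on March 3.")
for ent in doc.ents:
    print(ent.text, ent.label_)   # extracted entities with their labels
```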
Designed interactive dashboards in Tableau and Power BI to track key clinical KPIs such as patient outcomes, length
of stay, readmissions and operational efficiency.
Built efficient, intelligent chatbots using algorithms such as Naive Bayes, Decision Trees, Support Vector
Machines, Recurrent Neural Networks (RNN), and Long Short-Term Memory (LSTM) networks, combined with Natural
Language Processing (NLP) techniques.
Established A/B testing frameworks for NLP-driven patient support tools, quantifying improvements in automation
rates and user satisfaction.
Developed and deployed reinforcement learning models to optimize hospital resource allocation and decision-
making in dynamic environments.
Built deep learning models (CNN, LSTM) for image classification, sentiment analysis, and time-series forecasting,
delivering accuracy improvements over prior approaches.
Developed real-time clinical alerting systems using Kafka and Spark streaming to notify providers of high-risk patient
events within seconds.
Led cross-functional AI initiatives, collaborating with data engineering, DevOps, and clinical operations teams to
deliver high-value ML solutions.
Automated continuous training and deployment workflows for ML models using MLflow, Kubeflow, and SageMaker
Pipelines.
Applied LSTM-based models to predict probability of equipment or device failure and used AWS SageMaker for
model training and deployment.
Applied advanced optimization algorithms (Bayesian Optimization, Genetic Algorithms, Hyperband) to tune
hyperparameters, improving model accuracy and efficiency.
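A sketch of Bayesian-style tuning using Optuna (whose default TPE sampler is one such method); the model and search space are illustrative:

```python
# Hyperparameter search sketch with Optuna on a synthetic classification task.
import optuna
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)

def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 400),
        "max_depth": trial.suggest_int("max_depth", 2, 16),
    }
    return cross_val_score(RandomForestClassifier(**params, random_state=0), X, y, cv=3).mean()

study = optuna.create_study(direction="maximize")   # TPE sampler by default
study.optimize(objective, n_trials=20)
print(study.best_params)
```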
Developed privacy-preserving data augmentation and resampling pipelines for model training, ensuring compliance
with HIPAA/GDPR without compromising model performance.
Implemented anomaly detection models using Isolation Forests and Autoencoders to identify abnormal patient
vitals and reduce false alarms in clinical monitoring systems.
Optimized ETL workflows for large-scale healthcare data integration, improving processing speed by 35% using
Azure Databricks and distributed data lakes.
Built end-to-end ML pipelines with Apache Airflow for data ingestion, preprocessing, model training, evaluation,
and deployment, reducing model update cycles from weeks to days.
Processed petabyte-scale healthcare datasets using PySpark, Hadoop, and Azure for machine learning and reporting
use cases.
Built streaming data pipelines with Kafka, Azure Event Hubs, and Spark Structured Streaming for real-time analytics
and inference on patient data.
Automated recurring healthcare performance reports using Power BI and SQL, saving over 20 hours of manual
effort monthly.
Developed automated model monitoring dashboards to track data drift, model drift and real-time inference
accuracy, enabling proactive retraining.
DATA ANALYST/DATA SCIENTIST Lowell, AR
J.B. Hunt June 2017 – Dec 2020
Developed time series forecasting models using SARIMA and Facebook Prophet, achieving 92% forecast accuracy
for shipment demand prediction in retail supply chains and reducing stockouts by 15%.
Developed regression models for spot market transportation price prediction with tree-based boosting models
(XGBoost, LightGBM), saving $12M annually in transportation costs.
Implemented clustering techniques (K-Means, DBSCAN, Hierarchical Clustering) to segment shippers, improving
targeted pricing and contract strategies.
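A minimal K-Means segmentation sketch; the synthetic features stand in for shipper behavior attributes such as volume and lane diversity:

```python
# Segmentation sketch: scale features so none dominates distance, then cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
features = rng.normal(size=(500, 3))               # stand-in shipper behavior features

X = StandardScaler().fit_transform(features)
segments = KMeans(n_clusters=4, n_init=10, random_state=7).fit_predict(X)
print(np.bincount(segments))                        # shippers per segment
```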
Integrated end-to-end MLOps practices for model lifecycle management, versioning, and monitoring of
deployed ML-based solutions in logistics workflows, ensuring reliable demand forecasting, lane optimization and
pricing models in production.
Designed federated learning pipelines to train models on decentralized carrier and regional shipment data,
maintaining security and reducing compliance risks.
Conducted A/B testing for advanced analytics and decision support models, quantifying uplift in automation
rates and accuracy.
Collaborated on the Predictive Demand Generation (PDG) feature with the team and trained an Artificial Neural
Network (ANN) using TensorFlow to identify high-potential shipper prospects from sales leads.
Designed scalable ML pipelines for deploying and managing demand forecasting, shipper segmentation and
route recommendation models with automated evaluation and feedback loops.
Implemented data quality and governance frameworks ensuring lineage, cataloging, and compliance with
enterprise policies.
Developed domain-specific embeddings using Word2Vec and TF-IDF, trained on shipment attributes, lane histories,
and carrier performance data to enhance semantic search, route-similarity matching, and carrier
recommendations.
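A gensim Word2Vec sketch on a toy corpus; real training would run over tokenized lane histories and shipment attributes rather than this two-record stand-in:

```python
# Domain-embedding sketch with gensim Word2Vec on toy "lane history" tokens.
from gensim.models import Word2Vec

corpus = [["dallas", "memphis", "dry_van"], ["dallas", "houston", "reefer"]]
w2v = Word2Vec(sentences=corpus, vector_size=32, window=3, min_count=1, epochs=20)
print(w2v.wv.most_similar("dallas", topn=2))   # nearest tokens in embedding space
```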
Designed optimization algorithms for load matching and route planning, reducing fuel costs and improving fleet
utilization.
Implemented predictive maintenance models for transportation assets, reducing breakdowns and repair costs
by 12%.
Created interactive operational dashboards in Tableau/Power BI to visualize logistics KPIs like shipment volumes,
on-time delivery, carrier utilization, enabling faster decision-making.
Conducted data preprocessing, cleaning and exploratory data analysis using Pandas, SQL, and Python libraries,
ensuring data integrity.
Managed data storage and large-scale processing pipelines in Azure using SQL, Spark, and distributed data lakes
for production, development, and testing environments.
Integrated weather and traffic data into shipment demand forecasting models, improving accuracy in volatile
market conditions.
Developed a scalable, configurable AutoML solution spanning multiple regression and classification algorithms
that optimizes features, algorithms, and hyperparameters, reducing experimentation time by weeks.
Generated predictive analytics reports using Python and Tableau, including visualizations of model performance
and business impact.
Applied what-if simulations for transportation pricing strategies, increasing revenue predictability in spot market
bidding.
Conducted machine learning proof-of-concepts and led production deployment of intelligent logistics solutions,
delivering measurable business value and continuous innovation.
Developed explainable AI (XAI) dashboards using SHAP and LIME for model transparency, improving trust with
regulatory stakeholders.
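A short SHAP sketch for a tree model; the dataset and regressor are stand-ins, and summary_plot is one of the views such a dashboard can embed:

```python
# Explainability sketch: per-feature SHAP contributions for a gradient-boosted regressor.
import shap
from sklearn.datasets import make_regression
from xgboost import XGBRegressor

X, y = make_regression(n_samples=500, n_features=6, random_state=0)
model = XGBRegressor(n_estimators=100).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)          # contribution of each feature to each prediction
shap.summary_plot(shap_values, X, show=False)   # global importance view for the dashboard
```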
Engineered high-performance lane-level demand prediction models with Random Forest and XGBoost,
improving SKU-level sales forecast accuracy by 18% compared to previous models.