International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 03 | Mar 2022 www.irjet.net p-ISSN: 2395-0072
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1702
Resume Parser with Natural Language Processing
Abdul Wahab.S¹, Dr. M.N. Nachappa²
¹PG student, Department of Computer Application, JAIN University, Karnataka, India
²Head of Department, Department of Computer Application, JAIN University, Karnataka, India
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - With the growth of online recruiting systems, candidates can effortlessly upload their resumes to job application websites, and a significant number of resumes are also sent in by mail. This volume has placed a heavy burden on human resource departments: recruiting new employees and filtering through a large number of candidates are both difficult tasks. In addition, the resumes that candidates submit come in a variety of shapes and sizes, with different fonts, font sizes, colours, and writing styles. Human resource departments are therefore in a difficult position when analyzing all of the resumes sent by candidates and selecting the best candidate for the job. In this project, I propose a resume parser based on natural language processing to help the human resources department or recruiter gather precise information from resumes. Natural language processing is used to scan resume content, locate keywords, and group resumes into sectors according to the relevance of their keywords; ultimately, the most appropriate CV is recommended to the company based on keyword matching. The user must first create an account on the website and upload a resume. The resume parser collects all necessary information from the resume and automatically fills out a form for the user to proofread. After the user confirms, the resume is saved in our NoSQL database, where it is ready to be seen by employers. In addition, the user receives a JSON and a PDF version of their resume.
Key Words: online recruiting system, mail, best candidate, scan resume, NoSQL database
1. INTRODUCTION
Corporate businesses and recruitment agencies review a large number of resumes every day, a volume that is impractical to handle manually. There is a need for an intelligent automated system that can extract all of the key information from unstructured resumes and convert it to a common structured format that can then be ranked for a specific job position. The information parsed includes names, email addresses, social media accounts, personal websites, years of work experience, employment experience, years of education, education experiences, publications, credentials, volunteer experiences, years of service, keywords, and the CV cluster (e.g., computer science, human resources). The parsed data is subsequently saved in a database (in this case, NoSQL) for future use. Resumes differ from other types of unstructured data, such as email bodies or web page contents, and are submitted in the file types most commonly used by job applicants. An automated intelligent system based on natural language processing is therefore necessary to extract all of the information from unstructured resumes drawn from a range of data sources. Parsing a resume means converting it to a comparable structured format and selecting only the information relevant to screening, such as name, job, education, years of experience, work experience, certificates, email, phone number, and so on. The structured resume data is then saved in a database for future use. Each record comprises the person's contact information, employment experience, and educational background.
However, resumes can be difficult to decipher, because the kinds of information they contain, the order in which it is presented, and the writing style all differ from one resume to another. Resumes also come in a variety of file formats; ".txt", ".pdf", ".doc", ".docx", ".odt", and ".rtf" are among the most frequent. The model therefore cannot rely on the order or type of data to extract information effectively and efficiently from different kinds of resumes.
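As an illustration of the common structured format described above, a parsed resume can be held as one record and serialized to JSON, which also matches the JSON copy the user receives and maps naturally onto a document in a NoSQL database. This is a minimal sketch: the `ResumeRecord` class and its field names are hypothetical and are not taken from the project's code.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ResumeRecord:
    """Hypothetical structured form of one parsed resume (illustrative fields only)."""
    name: str
    email: str = ""
    phone: str = ""
    designation: str = ""
    universities: list[str] = field(default_factory=list)
    degrees: list[str] = field(default_factory=list)
    skills: list[str] = field(default_factory=list)
    cluster: str = ""  # CV cluster, e.g. "computer science"


def to_json(record: ResumeRecord) -> str:
    """Serialize a record to a JSON string, e.g. before storing it as a document."""
    return json.dumps(asdict(record), ensure_ascii=False, indent=2)


if __name__ == "__main__":
    record = ResumeRecord(name="Jane Doe", email="jane.doe@example.com",
                          skills=["Python", "SQL"], cluster="computer science")
    print(to_json(record))
```

Keeping every field present (with empty defaults) gives all stored documents the same shape, which simplifies later ranking and querying.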
2. LITERATURE REVIEW
Using Text Processing as a Resume Analyzer. This review outlines a company recommender system that employs text mining and machine learning techniques to help recruiters select the best candidate for a given position. When candidates upload their resumes, the resumes are sorted according to the company's needs, and the organization can use the resulting rating to choose the best candidates. The method is described in four steps: gathering resumes, searching the resume text for keywords from the knowledge base, and then ranking and categorizing candidates based on a rating score. In addition, the system can extract new keywords from resumes to broaden its knowledge base.
Automated extraction of information from Polish resume documents in the IT recruitment process. This review examines automated information retrieval for the recruitment process in the IT industry. The proposed solution uses a multi-module system to deal with low-resource language dictionaries and complex linguistic relationships in Polish. Entity recognition is the most useful method for assessing CVs, and it is used in this research. It is a semi-semantic text analysis that recognizes only particular terms, and it is an essential phase in preparing the text's information content for processing.
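The keyword-based rating used by the first reviewed system can be sketched roughly as follows. This is an illustration under stated assumptions, not the reviewed system's implementation: the `KNOWLEDGE_BASE` contents, the scoring rule (one point per matched keyword), and the function names are all hypothetical.

```python
import re

# Hypothetical knowledge base: category -> keywords (not taken from the reviewed system).
KNOWLEDGE_BASE = {
    "computer science": {"python", "java", "sql", "machine learning"},
    "human resources": {"recruitment", "payroll", "onboarding"},
}


def rating(text: str, keywords: set[str]) -> int:
    """Score = number of distinct knowledge-base keywords found as whole words."""
    lowered = text.lower()
    return sum(1 for kw in keywords
               if re.search(r"\b" + re.escape(kw) + r"\b", lowered))


def categorize(text: str) -> str:
    """Assign the category whose keywords match the resume best (first wins on ties)."""
    return max(KNOWLEDGE_BASE, key=lambda cat: rating(text, KNOWLEDGE_BASE[cat]))


def rank(resumes: dict[str, str], category: str) -> list[tuple[str, int]]:
    """Rank candidates for one category by descending rating score."""
    keywords = KNOWLEDGE_BASE[category]
    scored = [(name, rating(text, keywords)) for name, text in resumes.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Broadening the knowledge base, as the reviewed system does, would amount to adding newly extracted keywords to the relevant set.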
This project's datasets fall into two categories. The first is a GitHub dataset of 200 resumes that include names (first and last name) and the positions applied for. The second consists of other databases, such as lists of global universities and skills.
Table -1: Number of records for each entity.
Entity         Number of records
Name           205
Designation    473
University     829
Skills         1,249
This research uses Named Entity Recognition (NER), a subtask of Natural Language Processing, the field concerned with analysing large amounts of unstructured human language.
NER is the first stage of information extraction and topic modelling. The algorithm examines the paragraph in its entirety and identifies the text's most important entities. Stanford NER or spaCy can be used for this project, because the task is to classify unstructured resume text into predetermined categories.
Regular expressions were also used in this project's scripts. A regular expression is a sequence of characters that defines a search pattern; text is searched by matching that pattern against it.
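Contact details listed in the introduction, such as email addresses and phone numbers, are typical targets for regular expressions. The patterns below are simplified assumptions rather than the project's own patterns; real-world addresses and phone numbers vary far more than these cover.

```python
import re

# Simplified, illustrative patterns (assumptions, not the project's own).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Optional "+<country code>", then a 3-3-4 digit number with optional space/hyphen separators.
PHONE_RE = re.compile(r"(?:\+\d{1,3}[\s-]?)?\b\d{3}[\s-]?\d{3}[\s-]?\d{4}\b")


def extract_contacts(text: str) -> dict[str, list[str]]:
    """Return every email address and phone number found in the text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }
```

For example, `extract_contacts("jane.doe@example.com | +1 555-123-4567")` finds one email address and one phone number.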
A. Text conversion from PDF and DOC files
This project uses the PyMuPDF library to convert PDF files to text, and the python-docx library to convert DOC and DOCX files to text.
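A minimal sketch of this conversion step, assuming PyMuPDF (imported as `fitz`) and python-docx are installed; the `extract_text` helper is hypothetical. Note that python-docx reads only the Office Open XML `.docx` format, so legacy `.doc` files would need a separate conversion step that this sketch does not cover. The library imports sit inside their branches so the function still handles `.txt` files when those libraries are absent.

```python
from pathlib import Path


def extract_text(path: str) -> str:
    """Return the plain text of a resume file, dispatching on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        import fitz  # PyMuPDF (pip install pymupdf)
        with fitz.open(path) as doc:
            # Concatenate the text of every page.
            return "\n".join(page.get_text() for page in doc)
    if suffix == ".docx":
        from docx import Document  # python-docx (pip install python-docx)
        return "\n".join(p.text for p in Document(path).paragraphs)
    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8")
    raise ValueError(f"unsupported resume format: {suffix}")
```

Raising an error for unknown extensions makes unsupported formats (such as `.odt` or `.rtf`) visible instead of silently producing empty text.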
B. Recognizing Named Entities (NER)
This step extracts a name (both first and last name) and a designation. This project's training dataset is in PKL (Pickle) format. Pickle is a Python module that serializes objects so that they can be saved to a file and reloaded later when the program needs them. Named Entity Recognition (NER) is then used to train the model, because the purpose of this project is to use a tagged dataset to locate and classify unstructured resume text into specified categories.
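The paper does not describe the layout of its Pickle training file. The sketch below assumes the `(text, {"entities": [(start, end, label)]})` tuple layout commonly used for spaCy NER training data, with hypothetical `NAME` and `DESIGNATION` labels and hypothetical helper names, and shows how such data can be saved and reloaded with the standard `pickle` module.

```python
import pickle


def annotate(text: str, spans: list[tuple[str, str]]):
    """Turn (substring, label) pairs into character-offset entity annotations.

    Uses the first occurrence of each substring; raises ValueError if it is absent.
    """
    entities = []
    for substring, label in spans:
        start = text.index(substring)
        entities.append((start, start + len(substring), label))
    return text, {"entities": entities}


def save_training_data(data, path: str) -> None:
    with open(path, "wb") as f:
        pickle.dump(data, f)


def load_training_data(path: str):
    # Only unpickle files you trust: loading a pickle can execute arbitrary code.
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    train_data = [
        annotate("Jane Doe, Software Engineer",
                 [("Jane Doe", "NAME"), ("Software Engineer", "DESIGNATION")]),
    ]
    save_training_data(train_data, "train_data.pkl")
    print(load_training_data("train_data.pkl"))
```

Computing offsets from substrings avoids hand-counting character positions, which is a common source of misaligned NER training labels.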
C. Regular Expressions
The name of the university is extracted by using regular expressions to look for terms such as University, School, College, and Institute, which commonly appear in university names, and then capturing the characters in the vicinity of those keywords. The degree or educational background is obtained in the same way, by using regular expressions to look for keywords such as Bachelor of, Master of, Doctor of, and Degree, and then capturing the characters in the vicinity of those keywords.
To extract skills, the first step is to clean the data by removing stop words and punctuation from the text. Stop words are words that are used frequently in a language but carry little relevant information. Next, the entire resume text is tokenized, and each token is looked up in the skills database (a .csv file).
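The university, degree, and skills steps can be sketched with the standard library alone. This is a simplified illustration under several assumptions: "the vicinity of a keyword" is taken to mean the rest of the same line, the stop-word list is a tiny placeholder, the skills CSV is assumed to hold one skill per row in its first column, and the function names are hypothetical.

```python
import csv
import io
import re
import string

UNIVERSITY_KEYWORDS = r"University|School|College|Institute"
DEGREE_KEYWORDS = r"Bachelor of|Master of|Doctor of|Degree"
# Tiny placeholder list; a real system would use a full stop-word list.
STOP_WORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "the", "to", "with"}


def lines_with(keywords: str, text: str) -> list[str]:
    """Return each line that contains one of the keywords (the keyword's vicinity)."""
    pattern = re.compile(rf"^.*\b(?:{keywords})\b.*$", re.MULTILINE | re.IGNORECASE)
    return [m.group(0).strip() for m in pattern.finditer(text)]


def load_skills(fileobj) -> set[str]:
    """Read a skills CSV with one skill per row in the first column."""
    return {row[0].strip().lower() for row in csv.reader(fileobj) if row}


def extract_skills(text: str, skills: set[str]) -> list[str]:
    """Strip punctuation and stop words, tokenize, then look each token up."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in cleaned.split() if t not in STOP_WORDS]
    return sorted({t for t in tokens if t in skills})


if __name__ == "__main__":
    resume = ("Example University\n"
              "Bachelor of Computer Applications\n"
              "Skills: Python, SQL and Java.")
    skills = load_skills(io.StringIO("Python\nSQL\nMachine Learning\n"))
    print(lines_with(UNIVERSITY_KEYWORDS, resume))
    print(lines_with(DEGREE_KEYWORDS, resume))
    print(extract_skills(resume, skills))
```

Because tokens are matched one at a time, multi-word skills such as "machine learning" are not found, and removing punctuation turns names like "C++" into "c"; a fuller version would need n-gram matching and a more careful tokenizer.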
Resume processing with this approach is also constrained by limitations and ethical concerns. Because the only input is text, the method is appropriate only for screening certain positions. For example, a graphic designer job, or any design job that requires a visual sample of the work, an image as evidence of work, or an assessment of the resume's visual appeal and colour, may not be suited to this approach. Bias in such a system could also cause businesses to lose potential staff.
3. CONCLUSION
Due to the advancement of online recruiting systems, a considerable number of resumes are being submitted. As a result, human resource departments and companies face a hurdle in hiring new personnel and assessing a huge number of applicants. An automated intelligent system based on natural language processing can help employers with this task. The system converts many resume formats to text and retrieves key information from them. The proportion of similarity between an applicant's resume and the job description can also be determined by comparing the two. This approach can help the human resources department or company review resumes before conducting interviews and choosing the best candidate for the job. We were also able to harvest terms from several social networking sites, such as Stack Overflow and LinkedIn, and detect similarities between them, which allowed us to determine the resume's category (e.g., computer science, management, sales, human resources). Future work will include grading resumes and analyzing information about candidates obtained from social networking sites such as Facebook and Twitter, in order to make more accurate and authentic decisions about whether to offer a candidate a job. Our strategy is to make employers' and candidates' jobs easier and more efficient, and our main goal is to simplify the hiring process. The approach aims to provide companies with high-quality applications and to curtail unfair and discriminatory practices in the process. The resumes will be sorted based on the technical skills they contain.
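The paper does not state how the proportion of similarity between a resume and a job description is computed. One common choice, swapped in here as an assumption, is cosine similarity between word-count vectors; the sketch below uses it to score resumes against a job description and sort them, with hypothetical function names.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Lower-case word counts (letters, digits, '+' and '#', so 'c++' stays intact)."""
    return Counter(re.findall(r"[a-z0-9+#]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the word-count vectors of two texts (0.0 to 1.0)."""
    ca, cb = term_counts(a), term_counts(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def rank_by_similarity(resumes: dict[str, str], job_description: str) -> list[tuple[str, float]]:
    """Sort candidates by how similar their resume text is to the job description."""
    scores = {name: cosine_similarity(text, job_description)
              for name, text in resumes.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Raw word counts weight common words heavily; weighting terms (for example with TF-IDF) or removing stop words first would usually give a more discriminating score.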
REFERENCES
[1] What is resume parsing: Retrieved from, https://www.smartrecruiters.com/resources/glossary/resume-parsing/
[2] NLP Based Resume Parser using BERT in Python: Retrieved from, https://www.pragnakalp.com/case-study/nlp-resume-parser-bert-python/
[3] NLP based resume parser in Python (Beta): Retrieved from, https://demos.pragnakalp.com/resume-parser/
[4] Literature Reviews - Automated extraction of information from Polish resume documents in the IT recruitment process: Retrieved from, https://www.sciencedirect.com/science/article/pii/S18770 509
[5] World University Ranking 2016: Retrieved from, https://data.world/hhaveliw/world-university-ranking-2016
