International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1203
Pseudocode to Python Translation using Machine Learning
Vinay Patil1, Rakesh Pawar2, Prasad Parab3, Prof. Satish Kuchiwale4
1,2,3Student, Computer Engineering, SIGCE, Navi Mumbai, Maharashtra, India
4Asst. Professor, Computer Engineering, Smt. Indira Gandhi College of Engineering, Navi Mumbai,
Maharashtra, India
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - Pseudocode is an essential concept in the process
of learning algorithms and programming languages. It can
take two forms, programmatical and natural language.
Programmatical pseudocode can be easily parsed because the
syntax is precise and predictable, but natural language
pseudocode has unpredictable and informal syntax.
Pseudocode in general is not meant to be executable and is
used as a reference for implementation. This system makes
pseudocode executable by providing programming
language source code, which can be helpful for students in the
learning process. Existing systems used plain neural networks
with the cascade feed-forward backpropagation algorithm. The
normal implementation of backpropagation is sufficient,
and this project improves upon the architecture by
using recurrent neural networks. This project aims at
providing a system for pseudocode to source compilation or
translation. The proposed system first decomposes the
informal statements into a formal intermediate representation,
which is in XML for faster and simpler parsing. The
intermediate representation is then translated into the Python
programming language. The system is implemented using RNNs
with a deep neural network for sequence to sequence
translation with the help of the Keras library.
Key Words: Sequence to Sequence, Translation,
Pseudocode, Machine Learning.
1. INTRODUCTION
This project is about creating an application which translates
pseudocode to Python. Generally, when learning about
programming languages and coding, we first learn about
different algorithms in our courses. These algorithms are
written in simple English. This simple English form, or
pseudocode form, is meant to represent the
meaning and logic of the program without any syntax or
programming language features. This makes core
programming components like conditional statements and loops,
and concepts like recursion,
easier to understand.
Giving students an option to test their algorithms by
translating the pseudocode to a programming language will
enhance the learning experience. But pseudocode is
inherently not meant to be executable. There is no standard
way of writing pseudocode, so creating a traditional
compiler or interpreter for it is not feasible. Machine
Learning techniques such as Natural Language Processing
could be used for such tasks. NLP systems are already
becoming good at related tasks such as translating between
natural languages.
Traditional compilers first perform lexical analysis,
produce an abstract syntax tree, perform
optimizations and then produce the machine code. There are
also source-to-source compilers which translate one
programming language to another. This type of compiler
has the same workflow, but instead of producing machine
code, it emits code in another language. This project
follows the same paradigm as source-to-source compilers
but uses Machine Learning to process the pseudocode
in the initial step. The machine learning module first
parses the pseudocode and generates an abstract syntax tree,
which is represented in XML form for easier parsing.
This syntax tree is then parsed and translated
recursively, producing the final translated code.
The project is implemented in Python using libraries like
Keras and NumPy. The program has two text panels, the left one
for pseudocode and the right one for displaying the translated
Python code. The translated code appears in the right text
panel after the user presses the translate button. When the
user wishes to run the program, the console panel is brought
into focus and all the standard output is displayed, including
input prompts.
1.1 Objective
We aim to achieve the following through this project:
• To create a program which
translates pseudocode into executable Python code.
• To create generic and abstract guidelines for
writing pseudocode while keeping them extensible
for advanced users.
• To use machine learning algorithms for effective
translation of natural language statements into
expressions.
• To implement the machine learning module in Keras.
1.2 Scope
• The program will only be able to parse evaluable
and logical statements which are written in simple
plain English, not conceptual or vague
sentences.
• The intermediate language will be translated to
Python only, although it can be extended to other
languages.
2. LITERATURE SURVEY
In paper [1], the proposed system was aimed at making the
process of creating translation tools for different languages
easier with the help of conceptual metamodelling. The
system was divided into two phases: the first phase
translated the pseudocode into an intermediate form and the
second phase translated the intermediate form into source
code. The grammar for the pseudocode in different natural
languages is written in EBNF (Extended Backus-Naur Form).
The intermediate form was in XML. The modules for the
first and second phases are created such that they are
reusable, e.g. the intermediate to Java translator could be used
in any system where Java translation is required, and the same
holds for the French pseudocode to intermediate translator.
The paper [2] was referenced by the previous paper and acts
as a base for it. In this paper three algorithms were tested:
backpropagation, cascade-forward backpropagation and the
radial basis function algorithm. A cascade-forward
backpropagation neural network is similar to a backpropagation
neural network except that its input layer is connected to the rest
of the layers. In the radial basis function algorithm, the hidden
layer applies the radial basis function over the values from
the input layer. The paper also had a predefined syntax for the
pseudocode on which the neural network was trained.
The pseudocode was first split into keywords, and a matrix of
binary numbers was generated based on the index of each
keyword in the vocabulary list, also called one hot
encoding. This matrix was then passed to the three neural
networks and the results were compared. The paper
concluded that cascade-forward backpropagation gave
the best results out of the three.
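The cascade-forward connection described above can be illustrated with a small forward pass. This is only a sketch of the idea that the input layer also feeds the later layers; the layer sizes, the tanh activation and the random weights are assumptions for illustration and are not taken from paper [2].

```python
import math
import random


def dense(inputs, weights, biases):
    """One fully connected layer with a tanh activation."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for a layer of the given size."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def cascade_forward(x, layers):
    """Forward pass in which every layer also receives the raw input x."""
    out = []
    for weights, biases in layers:
        # The raw input is concatenated with the previous layer's output.
        out = dense(x + out, weights, biases)
    return out


rng = random.Random(0)
n_input, n_hidden, n_output = 4, 3, 2  # hypothetical sizes
layers = [
    make_layer(n_input, n_hidden, rng),             # hidden layer: input only
    make_layer(n_input + n_hidden, n_output, rng),  # output layer: input + hidden
]
one_hot_keyword = [0.0, 1.0, 0.0, 0.0]
prediction = cascade_forward(one_hot_keyword, layers)
```

In a plain backpropagation network the output layer would see only the three hidden values; here it sees seven values, the four inputs plus the three hidden outputs.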
In paper [3] the pseudocode was represented in the form of
XML, with the operations represented as XML tags.
The XML pseudocode was translated into C and Java programs
using regular expressions and pattern matching. Each tag is
located using a regular expression and the appropriate
operation is performed on the contents of the tag.
The paper [4] proposes a system for sequence to sequence
mapping using recurrent neural networks. Conventional deep
neural networks cannot map sequences to
sequences. The paper presents an end-to-end approach for
the mapping of sequences. The system consists of two major
components, an encoder and a decoder. The encoder consists
of multilayered LSTM cells which convert the input sequence
into a context vector. This context vector is then passed to
the decoder, which also has deep LSTM cells, to generate the
target sequence. The paper also compared phrase-based
Statistical Machine Translation with the LSTM and concluded
that the LSTM achieved a better BLEU score.
3. PROBLEM STATEMENT
Pseudocode is generally meant for learning, writing
algorithms or prototyping. Students learn to write
pseudocode alongside programming languages. The
learning process can be enhanced by having a source code
version of the pseudocode. Pseudocode written in
natural language can be difficult to quantify and parse into
logical form. This project provides a system to convert
pseudocode into Python. The system uses machine
learning to parse the natural language statements. The
natural language statements are translated into an
intermediate representation, and this intermediate
representation is then converted into Python.
4. SYSTEM OVERVIEW
4.1.1 Dataset
The dataset used in this project consists of a list of natural
language statements with their translations. Each pair in the
dataset is created such that the system will be able to predict
the action performed and also detect the positions of the
variables in the sentence.
E.g. "let c be the sum of a and b" -> "assign 1 add 6 8"
Here the numbers are the zero-based positions of the variables
in the sentence: c is word 1, a is word 6 and b is word 8.
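To make the example pair above concrete, the following sketch resolves a position-based translation back to the words of its sentence. It assumes that the numbers are zero-based word positions and that every operator takes a fixed number of arguments in prefix order; the `ARITY` table and the `resolve` function are hypothetical names, not part of the original system.

```python
ARITY = {"assign": 2, "add": 2}  # hypothetical operator table


def resolve(sentence, shorthand):
    """Turn a position-based shorthand into a nested tuple of words."""
    words = sentence.split()
    tokens = shorthand.split()

    def parse(i):
        token = tokens[i]
        if token.isdigit():            # a position in the sentence
            return words[int(token)], i + 1
        args, i = [], i + 1            # otherwise an operator name
        for _ in range(ARITY[token]):
            arg, i = parse(i)
            args.append(arg)
        return (token, *args), i

    tree, end = parse(0)
    if end != len(tokens):
        raise ValueError("unexpected trailing tokens in shorthand")
    return tree


tree = resolve("let c be the sum of a and b", "assign 1 add 6 8")
```

Under these assumptions `tree` is `("assign", "c", ("add", "a", "b"))`, which shows why the model only needs to predict an action and positions rather than the variable names themselves.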
4.1.2 Preprocessing
Every letter in the pseudocode is converted to lower case.
Parsing expressions through ML is a difficult task, so the
expressions are hidden from the ML model: the whitespace within
each expression is removed so that it appears as a single
entity. Some noised variations of the sentences are also
generated so that the prediction of the positions of the
variables generalizes better. The noised variations are
generated by adding a variable number of padding tokens to the
sentences.
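One way the padding-based noise could be generated is sketched below. The paper does not say where the padding is placed, so this example assumes leading padding tokens and shifts every position in the target by the same amount; the `<pad>` token and the function names are hypothetical.

```python
PAD = "<pad>"  # hypothetical padding token


def shift_positions(target, offset):
    """Add offset to every numeric position in a shorthand target."""
    return " ".join(
        str(int(tok) + offset) if tok.isdigit() else tok
        for tok in target.split()
    )


def noised_variants(sentence, target, max_pad=3):
    """Yield (sentence, target) pairs with 0..max_pad leading pad tokens."""
    sentence = sentence.lower()
    for k in range(max_pad + 1):
        padded = " ".join([PAD] * k + sentence.split())
        yield padded, shift_positions(target, k)


variants = list(
    noised_variants("Let C be the sum of A and B", "assign 1 add 6 8", max_pad=2)
)
```

With two padding tokens the sentence becomes `<pad> <pad> let c be the sum of a and b` and the target becomes `assign 3 add 8 10`, so the same statement is seen with its variables at several different positions.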
4.1.3 Pseudocode to Intermediate translation
Pseudocode to intermediate translation is handled using
natural language processing with machine learning.
Recurrent neural networks are used because they are better
suited to sequential input such as sentences than plain
feed-forward networks and CNNs.
LSTM cells are used for the RNN layers. The model loosely
follows the sequence to sequence paradigm, in which
the model has two parts, an encoder and a
decoder. The encoder takes the input and generates a context
vector, which is then used by the decoder to generate the
output sequence [4]. This project attempts to mimic the
attention mechanism by first flattening the outputs of the
encoder layer and feeding the output to each of the LSTM cells
in the decoder layer [5][6]. Inputs to the model are first
converted to one hot form, and the one hot encodings are
grouped together to form the input matrix. The output of the
model is a shorthand version of the XML form, which
consists of generic function names and variable positions in
the original sentence. The Keras model is implemented using
the Sequential model.
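A model of this shape could be written with the Keras Sequential API roughly as follows. This is a sketch under stated assumptions rather than the authors' network: the paper gives no layer sizes, so every constant is hypothetical, and `Flatten` followed by `RepeatVector` is one plausible way to feed the flattened encoder outputs to every decoder step.

```python
from tensorflow import keras

# Hypothetical sizes; the paper does not state them.
MAX_IN_WORDS = 16    # padded length of the input sentence
IN_VOCAB = 200       # input vocabulary size (width of a one hot row)
MAX_OUT_TOKENS = 8   # length of the shorthand output
OUT_VOCAB = 40       # function names plus position indices
UNITS = 64

model = keras.Sequential([
    keras.Input(shape=(MAX_IN_WORDS, IN_VOCAB)),
    # Encoder: keep the output of every time step, not only the last one.
    keras.layers.LSTM(UNITS, return_sequences=True),
    # Flatten all encoder outputs into a single vector ...
    keras.layers.Flatten(),
    # ... and present that same vector to every decoder time step.
    keras.layers.RepeatVector(MAX_OUT_TOKENS),
    # Decoder: one output vector per shorthand token.
    keras.layers.LSTM(UNITS, return_sequences=True),
    keras.layers.TimeDistributed(
        keras.layers.Dense(OUT_VOCAB, activation="softmax")
    ),
])
model.compile(
    optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"]
)
```

Unlike a true attention mechanism, the decoder here receives the same flattened vector at every step and learns no per-step weighting, which is why the approach only mimics attention.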
Steps:
1. Convert the sentence into an array of words.
2. Convert each word into its index number in the
vocabulary; if the word does not exist, mark it as unknown or skip it.
3. Convert this array of numbers into one hot encodings.
4. Create the input matrix by grouping the one hot
encodings.
5. Feed the input matrix into the network.
6. The network outputs the prediction.
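Steps 1 to 4 above can be sketched in plain Python as follows. The vocabulary construction and the `<unk>` token are assumptions made for illustration; steps 5 and 6 depend on the trained network and are not shown.

```python
UNKNOWN = "<unk>"  # hypothetical token for out-of-vocabulary words


def build_vocabulary(sentences):
    """Map every distinct word to an index; index 0 is reserved for UNKNOWN."""
    vocab = {UNKNOWN: 0}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(sentence, vocab):
    """Steps 1-4: words -> indices -> one hot rows -> input matrix."""
    words = sentence.lower().split()                         # step 1
    indices = [vocab.get(w, vocab[UNKNOWN]) for w in words]  # step 2
    matrix = []
    for index in indices:                                    # steps 3 and 4
        row = [0] * len(vocab)
        row[index] = 1
        matrix.append(row)
    return matrix


vocab = build_vocabulary(["let c be the sum of a and b"])
matrix = encode("let x be the sum of a and b", vocab)
```

The resulting matrix has one row per word and one column per vocabulary entry; the unseen word `x` is mapped to the unknown column.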
4.1.4 Intermediate to Python translation
The output of the previous module is a shorthand version of
the XML form. This shorthand form is first converted to the
XML form for easier traversing. This XML form is essentially
the syntax tree of the pseudocode. The translation in this
module is done via regular expressions and recursion. Each
tag is visited and replaced with its translation recursively
until the final translation is formed.
Steps:
1. Parse all the variables in the intermediate form using
regular expressions.
2. Check whether any variable name clashes with a Python keyword.
3. Visit every node in the XML syntax tree recursively.
4. Replace each pattern with the appropriate
Python code using regular expressions.
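The recursive traversal in these steps can be sketched as follows. Note the substitution: the paper replaces patterns with regular expressions, whereas this example walks the tree with the standard `xml.etree.ElementTree` parser to keep the sketch short. The tag names (`assign`, `add`, `var`, `num`, `print`) are hypothetical, since the paper does not list its XML vocabulary.

```python
import keyword
import xml.etree.ElementTree as ET

# Hypothetical tag names; the paper does not list its XML vocabulary.
BINARY_OPERATORS = {"add": "+", "sub": "-", "mul": "*", "div": "/"}


def safe_name(name):
    """Rename variables that clash with Python keywords (step 2)."""
    return name + "_" if keyword.iskeyword(name) else name


def translate(node):
    """Visit a node and its children recursively (steps 3 and 4)."""
    if node.tag == "var":
        return safe_name(node.text.strip())
    if node.tag == "num":
        return node.text.strip()
    children = [translate(child) for child in node]
    if node.tag == "assign":
        target, value = children
        return f"{target} = {value}"
    if node.tag in BINARY_OPERATORS:
        left, right = children
        return f"({left} {BINARY_OPERATORS[node.tag]} {right})"
    if node.tag == "print":
        return f"print({', '.join(children)})"
    raise ValueError(f"unsupported tag: {node.tag}")


xml_form = """
<assign>
  <var>class</var>
  <add><var>a</var><var>b</var></add>
</assign>
"""
python_code = translate(ET.fromstring(xml_form.strip()))
```

For this input the result is `class_ = (a + b)`: the variable named `class` is renamed because it is a Python keyword, and the nested `add` node is translated before the enclosing `assign`.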
4.2. Flowchart
4.3. Result
4.3.1 Screenshots
4.3.2 Graphs
Training and test accuracy:
Training and test loss:
6. CONCLUSIONS
This project will be beneficial for students learning
algorithms and programming languages like Python, letting them
study how the logic is implemented in actual code.
In this proposed system we present a program which is a
pseudocode to Python translator and will act as an
aid in the learning process. Pseudocode is not executable,
so the proposed system enables students to test and
execute their algorithms.
ACKNOWLEDGEMENT
It is a pleasant task to express gratitude to all those who
contributed in many ways to this project and made it an
unforgettable experience for us. First of all, we would like to
thank our guide Prof. Satish Kuchiwale. This work would not
have been possible without his guidance, support and
encouragement. We are highly indebted to Dr. Sunil Chavan,
Principal, Smt. Indira Gandhi College of Engineering,
Mrs. Sonali Deshpande, Head of the Department, Computer
Department, and also all the faculty members. We are
thankful to the Lord for keeping us energized so that every end
seems to be a new beginning.
REFERENCES
[1] T. Dirgahayu and S. N. Huda, “Automatic
Translation from Pseudocode to Source Code: A
Conceptual-Metamodel Approach”, IEEE, 2017.
[2] S. O. Hasson and F. M. R. Younis, “Automatic Pseudocode to
Source Code Translation Using Neural Network
Technique”, Intl. J. Engineering and Innovative
Technology (IJEIT), vol. 3, no. 11, 2014.
[3] S. Mukherjee and T. Chakrabarti, “Automatic algorithm
specification to source code translation”, Indian J.
Computer Science and Engineering (IJCSE), vol. 2, no. 2,
2011.
[4] I. Sutskever, O. Vinyals and Q. V. Le,
“Sequence to Sequence Learning with Neural Networks”,
NIPS, 2014.
[5] D. Bahdanau, K. Cho and Y. Bengio,
“Neural machine translation by jointly learning to
align and translate”, ICLR, 2015.
[6] M.-T. Luong, H. Pham and C. D.
Manning, “Effective approaches to attention-based
neural machine translation”, EMNLP, 2015.
BIOGRAPHIES
Vinay Ganesh Patil, Pursuing the
Bachelor degree (B.E.) in Computer
Engineering from Smt. Indira
Gandhi College of Engineering
(SIGCE), Navi Mumbai. His current
research interests include Graphics
& Machine Learning.
Rakesh Janu Pawar, Pursuing the
Bachelor degree (B.E.) in Computer
Engineering from Smt. Indira
Gandhi College of Engineering
(SIGCE), Navi Mumbai. His current
research interests include Web
Development & Machine Learning.
Prasad Sunil Parab, Pursuing the
Bachelor degree (B.E.) in Computer
Engineering from Smt. Indira
Gandhi College of Engineering
(SIGCE), Navi Mumbai. His current
research interests include Web
Designing & Machine Learning.
Prof. Satish Lalasaheb Kuchiwale,
Obtained the Bachelor degree (B.E.
IT) in the year 2007 from
Rajarambapu Institute of
Technology (RIT), Rajaramnagar,
Sakharale, and Master degree (M.E.
Computer) from Lokamanya Tilak
College of Engineering (LTCE), Navi
Mumbai. He is Asst. Professor in
Smt. Indira Gandhi College of
Engineering of Mumbai University
and has about 12 years of
experience.

More Related Content

What's hot (15)

PDF
[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...
IJET - International Journal of Engineering and Techniques
 
PDF
OOPS_Unit_1
Shipra Swati
 
PDF
Summarization Techniques for Code, Changes, and Testing
Sebastiano Panichella
 
PDF
Multi step automated refactoring for code smell
eSAT Journals
 
PDF
Multi step automated refactoring for code smell
eSAT Publishing House
 
PDF
DOMAIN BASED CHUNKING
kevig
 
PDF
An NLP-based architecture for the autocompletion of partial domain models
Lola Burgueño
 
PDF
DeepPavlov 2019
Mikhail Burtsev
 
PDF
PSEUDOCODE TO SOURCE PROGRAMMING LANGUAGE TRANSLATOR
ijistjournal
 
PDF
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network
kevig
 
PDF
IRJET - Text Summarizer.
IRJET Journal
 
PDF
BERT - Part 1 Learning Notes of Senthil Kumar
Senthil Kumar M
 
PDF
Survey on Indian CLIR and MT systems in Marathi Language
Editor IJCATR
 
PDF
IRJET - Automated Essay Grading System using Deep Learning
IRJET Journal
 
[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...
IJET - International Journal of Engineering and Techniques
 
OOPS_Unit_1
Shipra Swati
 
Summarization Techniques for Code, Changes, and Testing
Sebastiano Panichella
 
Multi step automated refactoring for code smell
eSAT Journals
 
Multi step automated refactoring for code smell
eSAT Publishing House
 
DOMAIN BASED CHUNKING
kevig
 
An NLP-based architecture for the autocompletion of partial domain models
Lola Burgueño
 
DeepPavlov 2019
Mikhail Burtsev
 
PSEUDOCODE TO SOURCE PROGRAMMING LANGUAGE TRANSLATOR
ijistjournal
 
Sentiment Analysis In Myanmar Language Using Convolutional Lstm Neural Network
kevig
 
IRJET - Text Summarizer.
IRJET Journal
 
BERT - Part 1 Learning Notes of Senthil Kumar
Senthil Kumar M
 
Survey on Indian CLIR and MT systems in Marathi Language
Editor IJCATR
 
IRJET - Automated Essay Grading System using Deep Learning
IRJET Journal
 

Similar to IRJET - Pseudocode to Python Translation using Machine Learning (20)

PDF
Recent Trends in Translation of Programming Languages using NLP Approaches
IRJET Journal
 
PDF
An Efficient Approach to Produce Source Code by Interpreting Algorithm
IRJET Journal
 
PDF
Overlapping optimization with parsing through metagrammars
IAEME Publication
 
PDF
Speech To Speech Translation
IRJET Journal
 
PDF
Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...
IRJET Journal
 
PDF
IRJET- Voice to Code Editor using Speech Recognition
IRJET Journal
 
PDF
Aq4301224227
IJERA Editor
 
PDF
Deepcoder to Self-Code with Machine Learning
IRJET Journal
 
PDF
IRJET- Factoid Question and Answering System
IRJET Journal
 
PDF
A study on the techniques for speech to speech translation
IRJET Journal
 
PDF
D017232729
IOSR Journals
 
PDF
IRJET - Voice based Natural Language Query Processing
IRJET Journal
 
PDF
NEURAL NETWORK BOT
IRJET Journal
 
PDF
Named Entity Recognition (NER) Using Automatic Summarization of Resumes
IRJET Journal
 
PDF
IRJET - Storytelling App for Children with Hearing Impairment using Natur...
IRJET Journal
 
PDF
IRJET- QUEZARD : Question Wizard using Machine Learning and Artificial Intell...
IRJET Journal
 
PPTX
PCCF UNIT 2 CLASS.pptx
vishnupriyapm4
 
PDF
Performance Comparison between Pytorch and Mindspore
IJDMS
 
PDF
IRJET - Mobile Chatbot for Information Search
IRJET Journal
 
PDF
IRJET- Lost: The Horror Game
IRJET Journal
 
Recent Trends in Translation of Programming Languages using NLP Approaches
IRJET Journal
 
An Efficient Approach to Produce Source Code by Interpreting Algorithm
IRJET Journal
 
Overlapping optimization with parsing through metagrammars
IAEME Publication
 
Speech To Speech Translation
IRJET Journal
 
Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...
IRJET Journal
 
IRJET- Voice to Code Editor using Speech Recognition
IRJET Journal
 
Aq4301224227
IJERA Editor
 
Deepcoder to Self-Code with Machine Learning
IRJET Journal
 
IRJET- Factoid Question and Answering System
IRJET Journal
 
A study on the techniques for speech to speech translation
IRJET Journal
 
D017232729
IOSR Journals
 
IRJET - Voice based Natural Language Query Processing
IRJET Journal
 
NEURAL NETWORK BOT
IRJET Journal
 
Named Entity Recognition (NER) Using Automatic Summarization of Resumes
IRJET Journal
 
IRJET - Storytelling App for Children with Hearing Impairment using Natur...
IRJET Journal
 
IRJET- QUEZARD : Question Wizard using Machine Learning and Artificial Intell...
IRJET Journal
 
PCCF UNIT 2 CLASS.pptx
vishnupriyapm4
 
Performance Comparison between Pytorch and Mindspore
IJDMS
 
IRJET - Mobile Chatbot for Information Search
IRJET Journal
 
IRJET- Lost: The Horror Game
IRJET Journal
 
Ad

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
IRJET Journal
 
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
IRJET Journal
 
PDF
Kiona – A Smart Society Automation Project
IRJET Journal
 
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
IRJET Journal
 
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
IRJET Journal
 
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
IRJET Journal
 
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
IRJET Journal
 
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
IRJET Journal
 
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
IRJET Journal
 
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
IRJET Journal
 
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
IRJET Journal
 
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
IRJET Journal
 
PDF
Breast Cancer Detection using Computer Vision
IRJET Journal
 
PDF
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
PDF
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
IRJET Journal
 
PDF
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
PDF
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
IRJET Journal
 
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
IRJET Journal
 
Kiona – A Smart Society Automation Project
IRJET Journal
 
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
IRJET Journal
 
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
IRJET Journal
 
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
IRJET Journal
 
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
IRJET Journal
 
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
IRJET Journal
 
BRAIN TUMOUR DETECTION AND CLASSIFICATION
IRJET Journal
 
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
IRJET Journal
 
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
IRJET Journal
 
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
IRJET Journal
 
Breast Cancer Detection using Computer Vision
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Ad

Recently uploaded (20)

PPTX
Benefits_^0_Challigi😙🏡💐8fenges[1].pptx
akghostmaker
 
PPTX
NEUROMOROPHIC nu iajwojeieheueueueu.pptx
knkoodalingam39
 
PDF
POWER PLANT ENGINEERING (R17A0326).pdf..
haneefachosa123
 
PPTX
Pharmaceuticals and fine chemicals.pptxx
jaypa242004
 
PDF
ARC--BUILDING-UTILITIES-2-PART-2 (1).pdf
IzzyBaniquedBusto
 
PPTX
UNIT DAA PPT cover all topics 2021 regulation
archu26
 
PPTX
ISO/IEC JTC 1/WG 9 (MAR) Convenor Report
Kurata Takeshi
 
PPTX
artificial intelligence applications in Geomatics
NawrasShatnawi1
 
PPTX
The Role of Information Technology in Environmental Protectio....pptx
nallamillisriram
 
PPTX
MobileComputingMANET2023 MobileComputingMANET2023.pptx
masterfake98765
 
PPTX
原版一样(Acadia毕业证书)加拿大阿卡迪亚大学毕业证办理方法
Taqyea
 
PPTX
Green Building & Energy Conservation ppt
Sagar Sarangi
 
PPTX
Innowell Capability B0425 - Commercial Buildings.pptx
regobertroza
 
PDF
A presentation on the Urban Heat Island Effect
studyfor7hrs
 
PPTX
265587293-NFPA 101 Life safety code-PPT-1.pptx
chandermwason
 
PDF
Book.pdf01_Intro.ppt algorithm for preperation stu used
archu26
 
PDF
International Journal of Information Technology Convergence and services (IJI...
ijitcsjournal4
 
PDF
UNIT-4-FEEDBACK AMPLIFIERS AND OSCILLATORS (1).pdf
Sridhar191373
 
PDF
MOBILE AND WEB BASED REMOTE BUSINESS MONITORING SYSTEM
ijait
 
PDF
Water Design_Manual_2005. KENYA FOR WASTER SUPPLY AND SEWERAGE
DancanNgutuku
 
Benefits_^0_Challigi😙🏡💐8fenges[1].pptx
akghostmaker
 
NEUROMOROPHIC nu iajwojeieheueueueu.pptx
knkoodalingam39
 
POWER PLANT ENGINEERING (R17A0326).pdf..
haneefachosa123
 
Pharmaceuticals and fine chemicals.pptxx
jaypa242004
 
ARC--BUILDING-UTILITIES-2-PART-2 (1).pdf
IzzyBaniquedBusto
 
UNIT DAA PPT cover all topics 2021 regulation
archu26
 
ISO/IEC JTC 1/WG 9 (MAR) Convenor Report
Kurata Takeshi
 
artificial intelligence applications in Geomatics
NawrasShatnawi1
 
The Role of Information Technology in Environmental Protectio....pptx
nallamillisriram
 
MobileComputingMANET2023 MobileComputingMANET2023.pptx
masterfake98765
 
原版一样(Acadia毕业证书)加拿大阿卡迪亚大学毕业证办理方法
Taqyea
 
Green Building & Energy Conservation ppt
Sagar Sarangi
 
Innowell Capability B0425 - Commercial Buildings.pptx
regobertroza
 
A presentation on the Urban Heat Island Effect
studyfor7hrs
 
265587293-NFPA 101 Life safety code-PPT-1.pptx
chandermwason
 
Book.pdf01_Intro.ppt algorithm for preperation stu used
archu26
 
International Journal of Information Technology Convergence and services (IJI...
ijitcsjournal4
 
UNIT-4-FEEDBACK AMPLIFIERS AND OSCILLATORS (1).pdf
Sridhar191373
 
MOBILE AND WEB BASED REMOTE BUSINESS MONITORING SYSTEM
ijait
 
Water Design_Manual_2005. KENYA FOR WASTER SUPPLY AND SEWERAGE
DancanNgutuku
 

IRJET - Pseudocode to Python Translation using Machine Learning

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1203 Pseudocode to Python Translation using Machine Learning Vinay Patil1, Rakesh Pawar2, Prasad Parab3, Prof. Satish Kuchiwale4 1,2,3Student, Computer Engineering, SIGCE, Navi Mumbai, Maharashtra, India 4Asst. Professor, Computer Engineering, Smt. Indira Gandhi College of Engineering, Navi Mumbai, Maharashtra, India ---------------------------------------------------------------------***---------------------------------------------------------------------- Abstract - Pseudocode is an essential conceptintheprocess of learning algorithms and programming languages. Itcanbe in both forms, programmatical and natural language. Programmatical pseudocode can be easily parsed becausethe syntax is precise and predictable but natural language pseudocode has unpredictable and informal syntax. Pseudocode in general is not meant to be executable and is used as references for implementation. This system makes pseudocode executable through providing programming language source code. It can be helpful for students in the learning process. Existing systems used plain neural networks with cascade feed-forward backpropagation algorithm. The normal implementation of backpropagation is sufficient enough and this project improves upon the architecture by using recurrent neural networks. This project aims at providing a system for pseudocode to source compilation or translation. The proposed system first decomposes the informal statementsintoaformalintermediaterepresentation which is in XML for faster and simple parsing. Then it will be parsed into Python programming languages. This system will be implemented by using RNNs with deep neural network for sequence to sequence translation with the help of Keras library. 
Key Words: Sequence to Sequence, Translation, Pseudocode, Machine Learning. 1. INTRODUCTION This project is about creating anapplicationwhichtranslates pseudocode to Python. Generally when learning about programming languages and coding, we first learn about different algorithms in our courses. These algorithms are written in simple English. This simple English form or pseudocode form is generally meant to represent the meaning and logic of the program without any syntax or programming language features. This makes the process of learning about core fundamental programmingcomponents like conditional statements, loops or concepts like recursion easier to understand. Giving students an option to test their algorithms by translating the pseudocode to a programming language will enhances the learning experience. But pseudocode is inherently not meant to be executable. There is no standard way of writing pseudocodes so creating a traditional compiler or interpreter for it is not feasible. Machine Learning techniques such as Natural Language Processing could be used for such tasks. NLP systems are already becoming good at solving types of tasks such as translating languages. The traditional compilers first perform lexical analysis, produce an abstract syntax tree format, perform optimizations and then producethemachinecode.Thereare also source-to-source compilers which translate one programming language to another. This type of compilers have the same workflow but instead of producing machine code, they translate it to another language. This project follows the same paradigm of source-to-source compilers but instead use Machine Learningtoprocessthepseudocode in the initial step. The machine learning module will first parse the pseudocode and generate an abstract syntax tree format which is represent in XML form for easier parsing. This syntax tree format is then parsed and translated recursively, producing the final translated code. 
The project is implemented in python using libraries like Keras and NumPy. The program has two text panels, leftone for pseudocode and right one for displaying the translated python code. The translated code will appearintherighttext panel after the user presses the translate button. When the user wishes to run the program, the consolepanel isbrought into focus and all the standard output is displayed including input prompts. 1.1 Objective We aim to achieve the following through this project:  The objective is to create a program which translates pseudocode into executable pythoncode.  To create generic and abstract guidelines for writing pseudocode but making itextensibleas well for advanced users.  To use machine learning algorithms for effective translation of natural language statements into expressions.  Implement the machine learning module in Keras. 1.2 Scope  The program will only be able to parse evaluable and logical statements which are written in simple plain English, instead of conceptual or vague sentences.
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1204  The intermediate language will be translated to python only, although it can be extended to other languages. 2. LITERATURE SURVEY In paper [1], the proposed system was aimed at making the process of creating translation tools for different languages easier with the help of conceptual meta modelling. The system was divided in into two phases, the first phase translated the pseudocode into intermediate form and the second phase translated the intermediate form into source code. The grammar for the pseudocode in different natural languages is written in EBNF (Extended Backus-Naur Form). The intermediate form was in XML form. The modules for first and second phases are created such that they are reusable e.g. the intermediatetoJavatranslatorcouldbeused in any system where Java translation is required, same for French Pseudocode to intermediate translator. The paper [2] was referenced by the previouspaperand acts as a base for it. In this paper 3 algorithms were tested, Back propagation, Cascade-feed forward backpropagation and Radial basis function algorithm. Cascade-forward back propagation neural network is similar to backpropagation neural network except its input layer is connectedtotherest of the layers. In Radial basis function algorithm, the hidden layer applies the radial basis function over the values from the input layer. The paper also had predefined syntaxforthe pseudocode on which the neural network was trained on. The pseudocode was first split intokeywordsanda matrixof binary numbers was generated based on the index of the keyword from the list of vocabulary, also called as one hot encoding. This matrix was then passed onto the 3 neural networks and the results were then compared. 
The paper concluded that cascade feed-forward backpropagation gave the best results of the three.

In paper [3], the pseudocode was represented in XML, with operations represented as XML tags. This XML pseudocode was translated into C and Java programs using regular expressions and pattern matching: each tag is located using a regular expression and the appropriate operation is applied to the contents of the tag.

Paper [4] proposes a system for sequence-to-sequence mapping using recurrent neural networks. Conventional deep neural networks require inputs and outputs of fixed dimensionality, so they cannot directly map sequences to sequences. The paper presents an end-to-end approach to sequence mapping. The system consists of two major components, an encoder and a decoder. The encoder consists of multilayered LSTM cells which convert the input sequence into a context vector. This context vector is then passed to the decoder, which also has deep LSTM cells, to generate the target sequence. The paper also compared a phrase-based statistical machine translation system with the LSTM and concluded that the LSTM performed better in terms of BLEU score.

3. PROBLEM STATEMENT

Pseudocode is generally meant for learning, writing algorithms or prototyping. Students learn to write pseudocode alongside programming languages. The learning process can be enhanced by having a source code version of the pseudocode. Pseudocode written in natural language can be difficult to quantify and parse into a logical form. This project provides a system to convert pseudocode into Python. The system uses machine learning to parse the natural language statements. The statements are translated into an intermediate representation, which is then converted into Python.

4. SYSTEM OVERVIEW

4.1.1 Dataset

The dataset used in this project consists of a list of natural language statements with their translations. Each pair in the dataset is created such that the system will be able to predict the action performed and also detect the positions of the variables in the sentence. E.g.

Input: let c be the sum of a and b
Output: assign 1 add 6 8

Here 1, 6 and 8 are the zero-based word positions of c, a and b in the input sentence.

4.1.2 Preprocessing

Every letter in the pseudocode is lowercased. Parsing expressions through ML is a difficult task, so the expressions are hidden from the ML model: the whitespace in each expression is removed so that it appears as a single entity. Some noised variations of the sentences are also generated to make the prediction of the variable positions more general. The noised variations are generated by adding a variable amount of padding to the sentences, which helps the system generalize over the input.

4.1.3 Pseudocode to Intermediate translation

Pseudocode-to-intermediate translation is handled using machine learning for natural language processing. Recurrent neural networks (RNNs) are used because they are generally better suited to variable-length sequential input, such as sentences, than plain feed-forward ANNs or CNNs. LSTM cells are used for the RNN layers. The model loosely follows the sequence-to-sequence paradigm, in which the model has two parts, an encoder and a decoder. The encoder takes the input and generates a context vector, which is then used by the decoder to generate the output sequence [4]. This project attempts to mimic the attention mechanism by first flattening the outputs of the encoder layer and feeding the result to each of the LSTM cells in the decoder layer [5][6]. The one-hot encodings are grouped together to form the input matrix. The output of the model is a shorthand version of the XML form, which consists of generic function names and variable positions in the original sentence. The Keras model is implemented using the Sequential model. Inputs to the model are first converted to one-hot form.

Steps:
1. Convert the sentence into an array of words.
2. Convert each word into its index in the vocabulary; if the word does not exist in the vocabulary, mark it as unknown or skip it.
3. Convert this array of numbers into one-hot encodings.
4. Create the input matrix by grouping the one-hot encodings.
5. Feed the input matrix into the network.
6. The network outputs the prediction.

4.1.4 Intermediate to Python translation

The output of the previous module is a shorthand version of the XML form. This shorthand form is first converted to the XML form for easier traversal. The XML form is essentially the syntax tree of the pseudocode. The translation in this module is done via regular expressions and recursion. Each tag is visited and replaced with its translation recursively until the final translation is formed.

Steps:
1. Parse all the variables in the intermediate form using regular expressions.
2. Check whether any variable name clashes with a Python keyword.
3. Visit every node in the XML syntax tree recursively.
4. Replace each pattern with the appropriate Python code using regular expressions.

4.2. Flowchart

[Figure: flowchart of the system]

4.3. Result

4.3.1 Screenshots

[Figure: screenshots of the application]
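To make the recursive tag replacement of Section 4.1.4 concrete, a minimal sketch is given here. It is an illustration under stated assumptions, not the project's implementation: the tag names (`assign`, `add`, `var` and so on), the operator table and the rule of appending an underscore to names that clash with Python keywords are all hypothetical, since the paper does not list its XML vocabulary. The sketch also walks the tree with Python's standard `xml.etree.ElementTree` module instead of replacing patterns purely with regular expressions.

```python
import keyword
import re
import xml.etree.ElementTree as ET

# Hypothetical tag set; the paper does not list its actual XML vocabulary.
BINARY_OPS = {"add": "+", "sub": "-", "mul": "*", "div": "/"}

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def safe_name(name):
    """Append an underscore when a variable name clashes with a Python keyword."""
    return name + "_" if keyword.iskeyword(name) else name


def translate(node):
    """Recursively turn one node of the XML syntax tree into Python source."""
    if node.tag == "var":
        text = (node.text or "").strip()
        # Only identifiers are renamed; literals such as 42 pass through unchanged.
        return safe_name(text) if IDENTIFIER.fullmatch(text) else text
    if node.tag in BINARY_OPS:
        left, right = (translate(child) for child in node)
        return f"({left} {BINARY_OPS[node.tag]} {right})"
    if node.tag == "assign":
        target, value = (translate(child) for child in node)
        return f"{target} = {value}"
    raise ValueError(f"unsupported tag: {node.tag}")


def xml_to_python(xml_text):
    """Parse the XML intermediate form and translate it from the root down."""
    return translate(ET.fromstring(xml_text))


example = "<assign><var>c</var><add><var>a</var><var>b</var></add></assign>"
print(xml_to_python(example))  # c = (a + b)
```

With this assumed tag set, a statement such as "let c be the sum of a and b" would correspond to the `example` tree, and a target variable named `class` would be emitted as `class_`.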
4.3.2 Graphs

[Figure: training and test accuracy]
[Figure: training and test loss]

6. CONCLUSIONS

This project will be beneficial for students learning algorithms and programming languages such as Python, and for studying how logic is implemented in actual code. In this proposed system we present a pseudocode-to-Python translator which will act as an aid in the learning process. Pseudocode is not executable, so the proposed system will enable students to test and execute their algorithms.

ACKNOWLEDGEMENT

It is a pleasant task to express gratitude to all those who contributed in many ways to this project and made it an unforgettable experience for us. First of all, we would like to thank our guide, Prof. Satish Kuchiwale. This work would not have been possible without his guidance, support and encouragement. We are highly indebted to Dr. Sunil Chavan, Principal, Smt. Indira Gandhi College of Engineering; Mrs. Sonali Deshpande, Head of the Department, Computer Department; and all the faculty members. We are thankful to the Lord for keeping us energized, so that every end seems to be a new beginning.

REFERENCES

[1] T. Dirgahayu and S. N. Huda, “Automatic Translation from Pseudocode to Source Code: A Conceptual-Metamodel Approach”, IEEE, 2017.
[2] S. O. Hasson and F. M. R. Younis, “Automatic Pseudocode to Source Code Translation Using Neural Network Technique”, Intl. J. Engineering and Innovative Technology (IJEIT), vol. 3, no. 11, 2014.
[3] S. Mukherjee and T. Chakrabarti, “Automatic algorithm specification to source code translation”, Indian J. Computer Science and Engineering (IJCSE), vol. 2, no. 2, 2011.
[4] I. Sutskever, O. Vinyals and Q. V. Le, “Sequence to Sequence Learning with Neural Networks”, NIPS, 2014.
[5] D. Bahdanau, K. Cho and Y. Bengio, “Neural machine translation by jointly learning to align and translate”, ICLR, 2015.
[6] M.-T. Luong, H. Pham and C. D. Manning, “Effective approaches to attention-based neural machine translation”, EMNLP, 2015.

BIOGRAPHIES

Vinay Ganesh Patil is pursuing the Bachelor's degree (B.E.) in Computer Engineering at Smt. Indira Gandhi College of Engineering (SIGCE), Navi Mumbai. His current research interests include Graphics & Machine Learning.

Rakesh Janu Pawar is pursuing the Bachelor's degree (B.E.) in Computer Engineering at Smt. Indira Gandhi College of Engineering (SIGCE), Navi Mumbai. His current research interests include Web Development & Machine Learning.
Prasad Sunil Parab is pursuing the Bachelor's degree (B.E.) in Computer Engineering at Smt. Indira Gandhi College of Engineering (SIGCE), Navi Mumbai. His current research interests include Web Designing & Machine Learning.

Prof. Satish Lalasaheb Kuchiwale obtained the Bachelor's degree (B.E. IT) in 2007 from Rajarambapu Institute of Technology (RIT), Rajaramnagar, Sakharale, and the Master's degree (M.E. Computer) from Lokamanya Tilak College of Engineering (LTCE), Navi Mumbai. He is an Asst. Professor at Smt. Indira Gandhi College of Engineering, Mumbai University, and has about 12 years of experience.