Enhancing code summarization with graph embedding and pre-trained model

L Li, J Li, Y Xu, H Zhu, X Zhang - International Journal of Software …, 2023 - World Scientific
International Journal of Software Engineering and Knowledge Engineering, 2023, World Scientific
Code summarization is the task of automatically producing descriptions of source code. Recently, many deep-learning-based approaches have been proposed to generate accurate code summaries, among which pre-trained models (PTMs) for programming languages have achieved promising results. It is well known that source code written in programming languages is highly structured and unambiguous. Though previous work pre-trained the model with well-designed tasks to learn universal representations from large-scale data, it has not considered structure information during the fine-tuning stage. To make full use of both the pre-trained programming language model and the structure information of source code, we utilize the Flow-Augmented Abstract Syntax Tree (FA-AST) of source code for structure information and propose GraphPLBART (Graph-augmented Programming Language and Bi-directional Auto-Regressive Transformer), which effectively introduces structure information into a well-trained PTM through a cross-attention layer. Compared with the best-performing baselines, GraphPLBART improves BLEU, METEOR, and ROUGE-L by 3.2%, 7.1%, and 1.2%, respectively, on the Java dataset, and by 4.0%, 6.3%, and 2.1% on the Python dataset. A further experiment shows that the structure information from FA-AST significantly benefits the performance of GraphPLBART. In addition, our manual evaluation experiment further supports the advantage of the proposed approach, indicating the quality of its generated summaries and its promise as a solution for code summarization.
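The abstract says structure information reaches the pre-trained model through a cross-attention layer, but it gives no implementation details. The following is only a minimal sketch of generic scaled dot-product cross-attention, assuming token hidden states act as queries and graph node embeddings (for example, from an FA-AST encoder) act as keys and values. The function names `softmax` and `cross_attention`, the residual addition, and the omission of learned projection matrices are all illustrative assumptions, not the authors' GraphPLBART code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(token_states, node_states):
    """Fuse graph node states into token states (hypothetical helper).

    token_states: list of token vectors, used as queries.
    node_states:  list of graph node vectors, used as keys and values.
    All vectors are assumed to share the same dimension.
    Learned query/key/value projections are omitted for brevity.
    """
    dim = len(token_states[0])
    scale = math.sqrt(dim)
    fused = []
    for q in token_states:
        # Scaled dot-product score between this token and every graph node.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in node_states]
        weights = softmax(scores)
        # Attention-weighted average of the node vectors.
        context = [sum(w * v[j] for w, v in zip(weights, node_states)) for j in range(dim)]
        # Residual connection (an assumption): keep the token state, add the context.
        fused.append([qi + ci for qi, ci in zip(q, context)])
    return fused


tokens = [[1.0, 0.0], [0.0, 1.0]]             # two toy token states
nodes = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]  # three toy graph node states
fused = cross_attention(tokens, nodes)
print(len(fused), len(fused[0]))
```

In this sketch the output keeps one vector per token, so a fused representation of this kind could in principle be passed on to a decoder in place of the original token states.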