The 13th International Conference on Computer Science & Education (ICCSE 2018)
August 8-11, 2018. Colombo, Sri Lanka. FriB1.5

A Search of Verilog Code Plagiarism Detection Method

Lisheng Wang(1,2), Lingchao Jiang(1) and Guofeng Qin(1,2,*)
(1) College of Electronics and Information Engineering, Tongji University, Shanghai, China
(2) National Computer and Information Technology Experiment Teaching Demonstration Center, Tongji University, Shanghai, China
*[email protected]

Abstract: To detect plagiarism in the Verilog code submitted for CPU design experiments, existing code plagiarism detection technologies are studied, and a Verilog code plagiarism detection method combining the MOSS system with an abstract syntax tree (AST) model is proposed. The method first verifies that the Verilog code is executable and correct, then filters suspected plagiarism through the MOSS system, and then filters the remaining code with an AST-based detection method; the union of the two filtered sets is the final result. The method detects both textual similarity and structural similarity, and the latter improves the detection of plagiarism that changes the code structure.

Index Terms: Verilog code, plagiarism detection, MOSS, abstract syntax tree.

I. INTRODUCTION

Because hardware courses are difficult to learn and hardware experiments are costly, hardware experiments have long been a weak point in teaching. In recent years, many universities have recognized this weakness and have carried out reforms of experiment teaching. The reform of the hardware curriculum involves three courses: Digital Logic, Computer Composition Principles and Computer System Structure. Students complete the corresponding experiments while learning the textbook material. For Computer Composition Principles, students design a CPU based on the MIPS32 instruction set in Verilog and run it on Xilinx's Nexys4 development board.

Some studies have found that nearly 33% of students admit to having copied code to some degree [1], and the problem is becoming more serious [2]. We also found plagiarism when we checked the submitted Verilog code. To keep the experiments effective, measures against code plagiarism are needed. Initially, we asked students to run their own CPU design on the spot, questioned them about the details of its implementation, and then assigned the final grade. However, with about 140 students in each year, it is not easy to check every design one by one.

In recent years, many scholars have studied code plagiarism detection. They have proposed many different methods of analyzing code and have developed some effective detection systems. Code plagiarism detection techniques fall roughly into two categories: techniques based on code-text comparison and techniques based on attribute-oriented detection [3]. To obtain better detection results, a method based on program structure metrics was also developed. In this method, the structure of the program is converted into a symbol sequence, and the sequences are then compared to calculate similarity [4]. This method is now widely used by plagiarism detection systems. For example, Stanford University designed the MOSS system using document fingerprinting and the winnowing algorithm [5]; Wichita State University designed the SIM system [6] using an algorithm that detects the similarity of DNA sequences; Karlsruhe University designed the JPlag system [7] using the Greedy String Tiling (GST) algorithm to calculate code similarity. Of these systems, however, only MOSS can detect Verilog code plagiarism.

II. DETECT VERILOG CODE PLAGIARISM WITH MOSS

In general, code plagiarism takes five forms: complete copy; modified comments; replaced variable or function names; adjusted code position; plagiarism based on program semantics.

We ran a large number of tests with the MOSS system and found that it detects the first four kinds of plagiarism accurately but fails on the last kind. This is because MOSS is built on the winnowing algorithm, which determines similarity from metric information and does not analyze data flow or control flow [8]. In addition, the similarity percentage is affected by the size of the Verilog code files, so the result may not be reliable. For these reasons, this paper attempts to improve on the MOSS system to obtain a more effective Verilog code plagiarism detection method.

III. PLAGIARISM DETECTION METHOD DESIGN

The Verilog code plagiarism detection method designed in this paper first verifies the correctness of the code by automatically simulating the project. It then uses the MOSS system to filter out part of the plagiarized code. After that, it converts the remaining code into abstract syntax trees, calculates their similarity, and filters out the rest of the plagiarized code. The process is shown in Fig. 1.

978-1-5386-5495-8/18/$31.00 ©2018 IEEE


Fig. 1. The structure of Verilog code plagiarism detection method. (Flowchart: Verilog code files to be detected → automatic test of program correctness → detection with the MOSS system, which yields the plagiarism code files detected by MOSS and the remaining Verilog code files to be detected → detection of the remaining files with the AST-based code detection method, which yields the plagiarism code files detected by that method → all plagiarism code files.)

A. Verify the Verilog code

The plagiarism detection method proposed in this paper assumes that the submitted code executes correctly. If the Verilog code files submitted by a student cannot execute correctly, plagiarism detection is meaningless. This paper therefore designs a module that automatically verifies whether Verilog code executes correctly.

The verification module is based on the simulation capability of ModelSim. The experiment instructions given to students standardize the interface of each module. Students write their code according to the instructions and submit the code files on the experiment management website. After submission, the website runs a script that extracts the key files, calls ModelSim to create a project from the code files, and simulates the project. Finally, the result generated by the simulation is compared with the standard result, as shown in Fig. 2.

Fig. 2. Automatic simulation module. (Flowchart: packages containing Verilog code → browse all the Verilog code files to generate a list file → automatically simulate the Verilog program with the testbench files → simulation result files → compare the two files, using the files with the correct result, and return the result.)

B. Detect with MOSS

The scope of detection must be clarified before the MOSS system is used on Verilog code. Student plagiarism has two sources: classmates, including students from previous years, and code posted on the Internet. We found that the code on blogs and forums is copied from a small number of sources. If students plagiarize online code, their submissions will therefore show a high repetition rate among themselves, so there is no need to search for online source code specifically for comparison. The final scope of detection thus has two parts: comparing the students' code in pairs, and comparing the current code with the code submitted by previous students.

Students usually write different modules in different files. If the code files are packaged and pushed to the MOSS system as they are, the packages are compared in pairs, and MOSS spends a lot of time switching from one file to another. It is therefore better to merge each student's code into a single file named with the student number before comparison. To submit the files, the user invokes the script moss.pl provided by MOSS. The format of the instruction is as follows:

moss [-l language] [-d] [-b basefile1] ... [-b basefilen] [-m #] [-c "string"] file1 file2 file3 ...

When the number of files is large, the instruction becomes very long, so it is better to write a script that reads the file names automatically and generates the instruction.

Finally, MOSS returns its results as web pages. The most important data are the similarity percentage and the similarity. This paper modifies moss.pl so that the result URL is written to a specific file, and then uses a Python crawler to obtain the desired data from the web page. Two thresholds are set: a similarity percentage threshold and a similarity threshold. If both the similarity and the similarity percentage of a result are higher than the thresholds, the two code files are flagged as plagiarism and their names are written to a specific file. Many tests are needed before the thresholds can be set.

Fig. 3. Code files classified with MOSS. (Diagram of nested sets labeled: all code files, suspected plagiarism code files, plagiarism code files.)

Fig. 4. The flow of generating AST sequence. (Flowchart: Begin → input grammar files of Verilog → generate lexical parser and grammar parser → lexical and grammatical analysis of the Verilog code files → AST sequence of Verilog code files → End.)

After detection with the MOSS system, some code is flagged as plagiarism. With a looser threshold, more code becomes suspected plagiarism. If this threshold is set correctly, all plagiarized code lies among the suspected files, as shown in Fig. 3.
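
The preparation and thresholding steps of this subsection might be scripted along the following lines. This is a minimal sketch under several assumptions that go beyond the text: each student's files sit in a directory named with the student number, `moss.pl` is run through `perl` with `-l verilog`, and the pair results have already been scraped into `PairResult` tuples. All function names, the `PairResult` type, and the threshold arguments are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, NamedTuple, Tuple


def merge_student_files(student_dir: Path, out_dir: Path) -> Path:
    """Concatenate all .v files of one student into <student number>.v."""
    out_dir.mkdir(parents=True, exist_ok=True)
    merged = out_dir / f"{student_dir.name}.v"
    parts = [p.read_text(encoding="utf-8", errors="replace")
             for p in sorted(student_dir.glob("*.v"))]
    merged.write_text("\n".join(parts), encoding="utf-8")
    return merged


def build_moss_command(files: List[Path], script: str = "moss.pl") -> List[str]:
    """Generate the instruction: moss -l language file1 file2 file3 ..."""
    # "-l verilog" is an assumption about the language name accepted by the script.
    return ["perl", script, "-l", "verilog"] + [str(f) for f in sorted(files)]


class PairResult(NamedTuple):
    file_a: str
    file_b: str
    percent: float     # similarity percentage reported for the pair
    similarity: float  # the second similarity figure reported for the pair


def classify(
    results: Iterable[PairResult],
    strict: Tuple[float, float],
    loose: Tuple[float, float],
) -> Tuple[List[PairResult], List[PairResult]]:
    """Split pairs into flagged (above both strict thresholds) and
    merely suspected (above both loose thresholds, but not flagged)."""
    flagged, suspected = [], []
    for r in results:
        if r.percent > strict[0] and r.similarity > strict[1]:
            flagged.append(r)
        elif r.percent > loose[0] and r.similarity > loose[1]:
            suspected.append(r)
    return flagged, suspected
```

The command is returned as an argument list so that it can be handed to `subprocess.run` without shell quoting; the suspected pairs are the ones that would be passed on to the AST-based method.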

C. Verilog code plagiarism detection method based on AST

This method uses the abstract syntax tree as the model for Verilog code plagiarism detection. The main process has three steps: generate the AST sequences of the Verilog code; use the Winnowing algorithm and fingerprinting to extract the feature values of the AST sequences; calculate the similarity and draw a conclusion.

This paper uses ANTLR as the parser tool. First, ANTLR generates the lexical parser and the grammar parser; then these two parsers parse the Verilog code to generate its AST sequence. The process is shown in Fig. 4.

The AST sequences are analyzed with the Winnowing algorithm and fingerprinting as follows:
• Preprocess AST sequences. This step mainly deletes characters that are meaningless to the similarity calculation.
• N-gram string segmentation. This step splits the AST sequence into consecutive overlapping subsequences of length N. For example, if the sequence is "abcdefg" and N is 4, N-gram segmentation yields the subsequence set abcd, bcde, cdef, defg. The choice of N is critical: if N is too large, much information is missed; if it is too small, many common elements are analyzed.
• Hash calculation. Each subsequence is converted into a fixed-length hash value, which is convenient for later storage and calculation.
• Winnowing feature extraction. The basic idea is to set a sliding window of size W and keep the smallest hash in each window. If a window contains two or more minimal hashes, the rightmost one is kept. This ensures that the interval between preserved AST sequence elements does not exceed W+K-1. Note that hash values may repeat, and a feature that has already been selected must not be selected again; for this reason the subscript of each hash is recorded. An example is shown in Fig. 5.

Fig. 5. Winnowing feature extraction example.

The similarity between the AST sequences of two Verilog code files is calculated by formula (1) and then compared with the similarity threshold to find plagiarized code. Similarity indicates the degree of similarity between the two files, flag indicates the final feature value set extracted from the Verilog code, and flagSize indicates the size of that set.

$$\mathrm{Similarity}(flag1, flag2) = \frac{\dfrac{|flag1 \cap flag2|}{flag1Size} + \dfrac{|flag1 \cap flag2|}{flag2Size}}{2} \tag{1}$$

D. Experimental results

Two unrelated Verilog programs, averaging about 80 lines, were written and named 1.v and 2.v; 3.v-11.v were then generated by modifying 2.v. The modifications followed five principles: complete copy; modified comments; replaced variable or function names; adjusted code position; plagiarism based on program semantics.

Fig. 6. Result detected by MOSS.

First, the MOSS system is used to detect plagiarism; part of the filtered result is shown in Fig. 6. The result shows no excessive similarity between 1.v and the other files, and the same holds for 11.v, whereas the similarity among 2.v-10.v is extremely high. 3.v-10.v were modified from 2.v according to the first four principles, and 11.v is based on semantic plagiarism. MOSS therefore cannot detect plagiarism based on semantics, but it is highly accurate for the other kinds of plagiarism.

Next, 1.v, 2.v, and 11.v are checked with the AST-based detection method. The result indicates that 2.v and 11.v have a high degree of similarity. Compared with using MOSS alone, the detection method proposed in this paper performs significantly better.

IV. CONCLUSION

This paper proposes a Verilog code plagiarism detection method for colleges. After the correctness of the code is verified, one part of the plagiarized code is filtered out by the MOSS system and the rest by the AST-based detection method. Several aspects can still be improved. When generating AST sequences, the grammar of the Verilog language needs to be covered comprehensively. The similarity threshold must be tested repeatedly before a suitable value is obtained. Finally, the method uses MOSS for the first round of detection, which cannot always run normally because of network problems or maintenance of the MOSS server, so it would be better to implement the Winnowing algorithm directly instead of relying on the MOSS system.

ACKNOWLEDGMENT

This research is supported in part by the Chinese Ministry of Education-XILINX Comprehensive Teaching Reform Project, the DIGILENT University Cooperation Program, and the Twelfth Experiment Teaching Reform Program of Tongji University (No. 0800104217).

REFERENCES

[1] D. Chuda, P. Navrat, B. Kovacova, and P. Humay, "The issue of plagiarism: A student view," IEEE Transactions on Education, vol. 55, no. 1, pp. 22–28, 2012.
[2] F. Rosales, A. Garcia, S. Rodriguez, J. L. Pedraza, R. Mendez, and M. M. Nieto, "Detection of plagiarism in programming assignments," IEEE Transactions on Education, vol. 51, no. 2, pp. 174–183, 2008.
[3] Z. Đurić and D. Gašević, "A source code similarity system for plagiarism detection," Computer Journal, vol. 56, no. 1, pp. 70–86, 2013.
[4] C. Yang, "Hybrid plagiarism detection method in program code based on multiple techniques," Computer Engineering and Applications, 2016.
[5] S. Schleimer, D. S. Wilkerson, and A. Aiken, "Winnowing: local algorithms for document fingerprinting," in Proc. ACM SIGMOD Conference, June 2003, pp. 76–85.
[6] D. Gitchell and N. Tran, "Sim: a utility for detecting similarity in computer programs," in Proceedings of the Thirtieth SIGCSE Technical Symposium on Computer Science Education, 1999, pp. 266–270.
[7] L. Prechelt, "Finding plagiarisms among a set of programs with JPlag," Journal of Universal Computer Science, vol. 8, no. 11, pp. 1016–1038, 2002.
[8] C. Zhao, H. Yan, and M. Jin, "Approach based on compiling optimization and disassembling to detect program similarity," Journal of Beijing University of Aeronautics and Astronautics, vol. 34, no. 6, pp. 711–715, 2008.
