This project must be built as standalone software or as a plugin for an existing IDE. It should detect source code vulnerabilities at the coding stage. It should be developed using machine learning techniques.
I have a proposed methodology in the attached files. Feel free to propose a better solution; we will discuss it and agree on the final methodology.
The project has two main phases: 1. detecting vulnerable code segments, and 2. correcting the detected vulnerable code segments or suggesting corrected code.
1. Vulnerable code segment detection
The vulnerable code segments should be related to information security, for example code that enables cross-site scripting (XSS), buffer overflows, or SQL injection. I found an existing dataset in the SySeVR project on GitHub (SySeVR/SySeVR). Preferably use it, but feel free to propose a better dataset if you have one.
Detection should take both the syntax and the semantics of the code into account; only then can we expect good results. Once the code has been converted to vectors, we feed them to a machine learning model such as BERT or an RNN (any other ML model you prefer can be discussed and agreed on). The SySeVR project linked above should give you an idea of the approach.
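To make the "code to vectors" step more concrete, here is a minimal sketch of one possible preprocessing pass. It is only an illustration under my own assumptions, not SySeVR's actual pipeline: the names `KEEP`, `normalize`, `build_vocab` and `vectorize` are hypothetical, the keyword list is deliberately tiny, and it covers only the syntactic side (tokenizing and renaming user identifiers). The semantic side, such as following data and control dependencies, is not shown.

```python
import re
from collections import Counter

# Hypothetical, deliberately small set of C keywords and library calls
# that are kept as-is; everything else that looks like a name is renamed.
KEEP = {"char", "int", "if", "else", "for", "while", "return", "sizeof",
        "strcpy", "strncpy", "memcpy", "gets", "fgets", "malloc", "free"}

# Identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(code):
    """Tokenize a code segment and rename user identifiers to VAR1, VAR2, ..."""
    names = {}
    tokens = []
    for tok in TOKEN_RE.findall(code):
        if re.match(r"[A-Za-z_]", tok) and tok not in KEEP:
            # The same identifier always maps to the same placeholder.
            tok = names.setdefault(tok, f"VAR{len(names) + 1}")
        tokens.append(tok)
    return tokens


def build_vocab(token_lists):
    """Assign each distinct token an index in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(tokens, vocab):
    """Turn a token list into a count vector aligned with the vocabulary."""
    counts = Counter(tokens)
    return [counts.get(tok, 0) for tok in vocab]
```

For example, `normalize("char buf[10]; strcpy(buf, input);")` renames `buf` and `input` to `VAR1` and `VAR2` while keeping `strcpy`, so the model sees the call pattern rather than project-specific names. The count vectors would then be replaced by learned embeddings if a model such as BERT or an RNN is used.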
2. Correct the code / suggest the corrected code
Once vulnerable code is flagged, for example with an error marker, the developer will open the notification to see what it is. We should then suggest corrected code, and once the developer selects one of our suggestions, the program code should be updated with the suggested code segment.
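The flag-then-replace flow described above could look roughly like the sketch below. Everything here is an assumption for illustration: `Finding` and `apply_suggestion` are hypothetical names, line numbers are taken to be 1-based and inclusive, and a real IDE plugin would apply the change through the editor's own API rather than on a plain string.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One flagged code segment plus the fix offered to the developer."""
    start_line: int   # 1-based, inclusive
    end_line: int     # 1-based, inclusive
    message: str      # text shown in the notification
    suggestion: str   # replacement code, may span several lines


def apply_suggestion(source, finding):
    """Return the source with the flagged lines replaced by the suggestion."""
    lines = source.split("\n")
    replacement = finding.suggestion.split("\n")
    # Slice assignment swaps the flagged range for the suggested lines,
    # even when the two ranges have different lengths.
    lines[finding.start_line - 1:finding.end_line] = replacement
    return "\n".join(lines)
```

The function is only called after the developer accepts the suggestion; until then the original code stays untouched, and the `message` field is what the notification would display.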