I will provide JSON files containing the extracted features (I will extract these from the pcap files). The features will come from both malware and non-malware traffic, so the aim is to detect and classify the malware and non-malware samples. Feature selection should be done in two ways, manual and automated, and we should be able to clearly identify which features are being used, which work well and which do not. As there are multiple datasets, we should be able to identify the features that work well across all of the datasets. There are also different versions of the malware, so we need to clearly identify which features work well across all of these malware versions. The manual feature selection should allow me to choose which features to use for the classification. The automated feature selection can be done using algorithms such as CFS (correlation-based feature selection), ANOVA and another method that we can agree on. We must be able to feed a brand-new dataset into the trained model for testing.
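To make the manual/automated split concrete, here is a minimal, standard-library-only sketch under stated assumptions. It assumes each JSON file is an array of flat records with numeric features plus a "label" field; the real schema of the extracted pcap features is not defined yet, and the feature names used in the demo ("pkt_rate", "mean_len") are hypothetical. It shows manual selection (keep the columns the user names), automated ranking by the one-way ANOVA F-statistic, and a simple way to find features that rank highly in every dataset or malware version. CFS is not shown here.

```python
import json
from statistics import mean

# ASSUMPTION: each JSON file is an array of flat objects, one per sample/flow,
# holding numeric features plus a "label" field ("normal" or a bot variant).
# The real schema of the extracted pcap features may differ.


def load_samples(json_text, label_key="label"):
    """Split parsed records into a list of feature dicts and a list of labels."""
    records = json.loads(json_text)
    labels = [rec[label_key] for rec in records]
    features = [{k: v for k, v in rec.items() if k != label_key} for rec in records]
    return features, labels


def select_manual(features, chosen):
    """Manual selection: keep only the user-chosen feature names, in that order."""
    return [[row[name] for name in chosen] for row in features]


def anova_f(values, labels):
    """One-way ANOVA F-statistic of one numeric feature across the label groups."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    k, n = len(groups), len(values)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = mean(values)
    group_means = {label: mean(vals) for label, vals in groups.items()}
    # Between-group and within-group sums of squares.
    ss_between = sum(len(vals) * (group_means[label] - grand_mean) ** 2
                     for label, vals in groups.items())
    ss_within = sum((v - group_means[label]) ** 2
                    for label, vals in groups.items() for v in vals)
    if ss_within == 0:  # no spread inside any class
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features_anova(features, labels):
    """Automated selection: rank every feature by F-score, highest first.
    Assumes every record has the same feature keys."""
    names = sorted(features[0])
    scores = {name: anova_f([row[name] for row in features], labels) for name in names}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def consistent_features(rankings, top_k):
    """Features in the top_k of every ranking (one ranking per dataset or variant)."""
    top_sets = [{name for name, _ in ranking[:top_k]} for ranking in rankings]
    return sorted(set.intersection(*top_sets)) if top_sets else []


if __name__ == "__main__":
    demo = json.dumps([
        {"pkt_rate": 10.0, "mean_len": 500.0, "label": "normal"},
        {"pkt_rate": 12.0, "mean_len": 520.0, "label": "normal"},
        {"pkt_rate": 90.0, "mean_len": 510.0, "label": "bot"},
        {"pkt_rate": 95.0, "mean_len": 505.0, "label": "bot"},
    ])
    feats, labels = load_samples(demo)
    print(rank_features_anova(feats, labels))
    print(select_manual(feats, ["mean_len"]))
```

A high F-score only says that the class means of a feature differ; it does not detect redundancy between features, which is what CFS is designed to penalise. In practice, scikit-learn's `f_classif` (used with `SelectKBest`) computes the same ANOVA F-statistic. Running `rank_features_anova` separately on each dataset and each malware version, then passing the rankings to `consistent_features`, gives one simple way to list the features that hold up everywhere.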
Train and test five ML algorithms (a mix of supervised algorithms and one unsupervised algorithm), covering both binary (botnet vs. normal) and multiclass classification. The models should be able to distinguish botnet (malware) traffic from normal traffic across the various datasets and across the different variants of the bot (malware). Measure accuracy and the other performance metrics statistically, presenting the results as both tables and graphs. The data will need to be analysed and appropriate graphs produced to help visualise it. We can agree on which statistical measures to include to assess accuracy.
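As a starting point for agreeing on the statistics, the sketch below computes the measures most often tabulated for this kind of evaluation: a confusion matrix, overall accuracy, per-class precision, recall and F1, and macro-averaged F1. It is standard-library-only and independent of whichever five algorithms are chosen; it only needs true and predicted labels. The class names ("normal", "bot_v1", "bot_v2") and the helper `to_binary` are hypothetical, used to show that the same code serves both the multiclass (per-variant) task and the binary (bot vs. normal) task.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes (same order as classes)."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for true, pred in zip(y_true, y_pred):
        matrix[index[true]][index[pred]] += 1
    return matrix


def evaluate(y_true, y_pred, classes):
    """Accuracy, per-class precision/recall/F1 and macro F1 from the confusion matrix."""
    matrix = confusion_matrix(y_true, y_pred, classes)
    total = sum(map(sum, matrix))
    correct = sum(matrix[i][i] for i in range(len(classes)))
    per_class = {}
    for i, label in enumerate(classes):
        tp = matrix[i][i]
        fp = sum(row[i] for row in matrix) - tp  # predicted as label, but wrong
        fn = sum(matrix[i]) - tp                  # truly label, but missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = {"precision": precision, "recall": recall,
                            "f1": f1, "support": tp + fn}
    macro_f1 = sum(m["f1"] for m in per_class.values()) / len(classes)
    return {"accuracy": correct / total, "macro_f1": macro_f1,
            "per_class": per_class, "confusion": matrix}


def to_binary(labels, normal_label="normal"):
    """Hypothetical helper: collapse every bot variant into one 'bot' class."""
    return [normal_label if y == normal_label else "bot" for y in labels]


if __name__ == "__main__":
    classes = ["normal", "bot_v1", "bot_v2"]
    y_true = ["normal", "normal", "normal", "bot_v1", "bot_v1", "bot_v2"]
    y_pred = ["normal", "normal", "bot_v1", "bot_v1", "bot_v2", "bot_v2"]
    report = evaluate(y_true, y_pred, classes)
    print(report["confusion"], report["accuracy"], report["macro_f1"])
    print(evaluate(to_binary(y_true), to_binary(y_pred), ["normal", "bot"])["accuracy"])
```

Computing one such report per algorithm, per dataset and per bot variant (for example, training on one dataset and testing on another, or on a brand-new dataset) gives the rows of the comparison tables and the data for the graphs. Macro-averaged F1 is worth including alongside accuracy because traffic captures are often imbalanced, and accuracy alone can look high when a minority class is mostly missed. scikit-learn's `confusion_matrix` and `classification_report` in `sklearn.metrics` produce the same figures if that library is used.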