The three payload features in this example were converted to bigram features and trigram features, as shown in Table 1 and Table 2 respectively.
The table header represents the standard feature vector for all the payload features in this example. The payload features appear in the table in the order of their presentation to the feature extraction algorithm: the first row corresponds to the first payload feature discovered during dictionary generation, the second row to the second payload feature discovered, and so on. After this preprocessing, the resulting data set was prepared for feature selection by pre-ranking the features with a fast filter method, the well-known Correlation-based Feature Selection (CFS). This pre-ranking ensures that the reduced feature set remains informative while still containing distracting features, which makes it possible to run controlled experiments by manipulating relevant and irrelevant features. Because the total number of features is very large and feature selection is a time-consuming activity, a small subset of 250 of the original features was taken for the experiments.
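The bigram/trigram conversion described above can be sketched as follows. This is a minimal illustration with hypothetical payload strings (the paper's actual payloads differ); the dictionary records each n-gram in order of first discovery, mirroring the row ordering described for Tables 1 and 2.

```python
from collections import Counter

def ngram_counts(s, n):
    """Count overlapping character n-grams in a payload string."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def payloads_to_vectors(payloads, n):
    """Build an n-gram dictionary in order of first discovery, then
    emit one count vector per payload over that shared dictionary."""
    dictionary, seen = [], set()
    for p in payloads:
        for i in range(len(p) - n + 1):
            g = p[i:i + n]
            if g not in seen:
                seen.add(g)
                dictionary.append(g)
    vectors = [[ngram_counts(p, n)[g] for g in dictionary] for p in payloads]
    return dictionary, vectors

# Hypothetical payload strings for illustration only.
payloads = ["abcab", "bcd", "cabd"]
bigram_dict, bigram_vecs = payloads_to_vectors(payloads, 2)    # n = 2
trigram_dict, trigram_vecs = payloads_to_vectors(payloads, 3)  # n = 3
```

Each row of `bigram_vecs` corresponds to one payload feature, and each column to one dictionary entry, matching the table layout described in the text.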
From the ranked feature list obtained with the CFS method, a smaller subset is extracted so that it consists of one portion of the top-ranked features and nine portions of the lowest-ranked features. With a portion size of 25 features, the subset contains 250 features in total. This way of selection is intentional: it tests the efficacy of the proposed system when 90% of the features are bad features relative to the good ones. Taking a small sample of the huge total feature set also saves time and computational effort. The resulting 250 features were used to generate data sets of 20, 40, 100, and 400 examples respectively. To simulate "zero-day" attacks, the data sets were deliberately kept small in terms of number of examples. We generated them as balanced data sets (i.e. with equal numbers of normal and attack examples). Different numbers of examples were used to monitor the behavior of feature selection at each data set size.
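The subset-selection and balanced-sampling steps can be sketched as below. The feature names and example records are hypothetical placeholders; only the proportions (one portion of 25 top-ranked features plus nine portions of lowest-ranked features, and equal normal/attack counts) come from the text.

```python
import random

def select_subset(ranked_features, portion=25, portions_low=9):
    """Given features ranked best-to-worst, keep one portion of the
    top-ranked features and `portions_low` portions from the bottom
    (25 + 225 = 250 features in this work's setting)."""
    top = ranked_features[:portion]
    bottom = ranked_features[-portion * portions_low:]
    return top + bottom

def balanced_sample(normal, attack, size, seed=0):
    """Draw an equal number of normal and attack examples (size must be even)."""
    rng = random.Random(seed)
    half = size // 2
    return rng.sample(normal, half) + rng.sample(attack, half)

# Hypothetical ranked feature names, best first.
ranked = [f"f{i}" for i in range(1000)]
subset = select_subset(ranked)  # 25 top-ranked + 225 lowest-ranked = 250

# Hypothetical labelled records; the paper draws from ISCX 2012.
normal = [("normal", i) for i in range(50)]
attack = [("attack", i) for i in range(50)]
ds20 = balanced_sample(normal, attack, 20)  # one of the 20/40/100/400 sets
```

The same call with sizes 40, 100, and 400 would yield the other three balanced data sets.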
To observe the effect of including the payload features on detection accuracy, an important experiment was conducted. The well-known SVM classifier was used to measure accuracy and F-measure on the ISCX 2012 data sets in two cases: first without the payload (bigram/trigram) features, and second with them. The performance metrics were measured before and after converting the payload features to bigram and trigram features and applying feature selection. The main objective of this experiment is to show that the payload features carry important and useful information for improving detection accuracy.
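The with/without-payload comparison can be sketched with scikit-learn's SVC, here run on synthetic data in place of ISCX 2012 (which is not bundled with this example); the labels are deliberately made to depend on the payload columns so the contrast is visible.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200
X_base = rng.normal(size=(n, 10))      # non-payload features (synthetic)
X_payload = rng.normal(size=(n, 5))    # bigram/trigram payload features (synthetic)
y = (X_payload[:, 0] > 0).astype(int)  # label carried by payload information

def evaluate(X, y):
    """Train a linear SVM and return (accuracy, F-measure) on a held-out split."""
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
    pred = SVC(kernel="linear").fit(Xtr, ytr).predict(Xte)
    return accuracy_score(yte, pred), f1_score(yte, pred)

acc_without, f1_without = evaluate(X_base, y)                       # payload excluded
acc_with, f1_with = evaluate(np.hstack([X_base, X_payload]), y)     # payload included
```

On this synthetic data the payload-inclusive run scores far higher, which is the qualitative pattern the experiment above is designed to demonstrate; the paper's actual numbers appear in its result tables.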
Because of the difficulty of handling the long payload features, many previous researchers excluded payload features from the original feature set before looking for intrusions. The feature selection method was applied to the four generated ISCX 2012 data sets, and the maximum obtained accuracy and F-measure are presented in Table 3 and Table 4 respectively.