Adaptive Intrusion Detection System: Hybrid K-Means and Random Forest Approach with Concept Drift Detection

Eric García-Huitzitl, Lázaro Bustio-Martínez, René Cumplido

Abstract


The ever-evolving data landscape presents significant challenges, such as concept drift, where shifts in statistical distributions within data streams pose critical cybersecurity threats. Traditional machine learning, which relies on static models, struggles with concept drift, underscoring the need for adaptive approaches designed specifically for streaming data. This paper investigates methodologies for enhancing security in dynamic data environments. A hybrid concept drift detection method that combines error rate analysis with data distribution monitoring is proposed. To update the training dataset, the approach combines sliding window-based data capture and drift analysis with K-Means clustering and a Random Forest classifier. Two types of sliding windows are used: fixed and adaptive. An Adaptive Random Forest classifier is used for anomaly detection and for retraining the model. Experiments were conducted on the NSL-KDD dataset to detect and quantify the severity of concept drift, using techniques such as Principal Component Analysis and Spearman's Correlation Coefficient. The ability of the Intrusion Detection System to adapt to these changes was also evaluated. The proposed adaptive model demonstrates significant improvements, with Adaptive Random Forest achieving a classification accuracy of 98.66%. Furthermore, precision, detection rate, and F1-score of 99.52%, 97.74%, and 99.78%, respectively, are achieved, while maintaining a low false alarm rate of 1.14%.
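The idea of combining error rate analysis with data distribution monitoring over a sliding window can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `HybridDriftDetector`, its parameters, and the thresholds are hypothetical, it tracks a single numeric feature over a fixed window only, and a simple z-test on the window mean stands in for the distribution analysis (the paper itself refers to Principal Component Analysis and Spearman's Correlation Coefficient).

```python
from collections import deque
from statistics import mean, pstdev


class HybridDriftDetector:
    """Hypothetical sketch: flags drift when either the windowed error rate
    or the windowed mean of one feature departs from a reference window."""

    def __init__(self, window_size=50, error_margin=0.15, z_threshold=3.0):
        # All three parameters are illustrative assumptions, not values from the paper.
        self.window_size = window_size
        self.error_margin = error_margin
        self.z_threshold = z_threshold
        self.reference = None        # feature values of the first full window
        self.baseline_error = None   # error rate of the first full window
        self.values = deque(maxlen=window_size)  # fixed sliding window of feature values
        self.errors = deque(maxlen=window_size)  # fixed sliding window of 0/1 errors

    def update(self, value, misclassified):
        """Add one sample; return True if drift is suspected."""
        self.values.append(value)
        self.errors.append(1 if misclassified else 0)

        # Wait until the window is full before judging anything.
        if len(self.values) < self.window_size:
            return False

        # The first full window becomes the reference distribution.
        if self.reference is None:
            self.reference = list(self.values)
            self.baseline_error = mean(self.errors)
            return False

        # Error rate check: has the error rate risen beyond the margin?
        error_drift = mean(self.errors) - self.baseline_error > self.error_margin

        # Distribution check: z-score of the current window mean against
        # the reference mean, using the standard error of the mean.
        ref_std = pstdev(self.reference) or 1e-9
        std_error = ref_std / self.window_size ** 0.5
        z = abs(mean(self.values) - mean(self.reference)) / std_error
        dist_drift = z > self.z_threshold

        return error_drift or dist_drift
```

In a pipeline of the kind the abstract describes, a `True` result would be the signal to rebuild the training set from recent windows and retrain the classifier; that step is omitted here.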

Keywords


Adaptive IDS, concept drift, hybrid approach, clustering, classification
