MULTI-LAYER MODEL AND TRAINING METHOD FOR MALWARE TRAFFIC DETECTION BASED ON DECISION TREE ENSEMBLE

В’ячеслав Васильович Москаленко, Микола Олександрович Зарецький, Альона Сергіївна Москаленко, Антон Михайлович Кудрявцев, Віктор Анатолійович Семашко

Abstract


A model and training method for a multilayer feature extractor and decision rules of a malware traffic detector are proposed. The feature extractor is based on a convolutional sparse coding network whose sparse encoder is approximated by a regression random forest according to the principles of knowledge distillation. A growing sparse coding neural gas algorithm has been developed for unsupervised training of the feature extractor, with automatic determination of the required number of features on each layer. During training of the feature extractor, sparse coding is implemented with the greedy Orthogonal Matching Pursuit method; during the knowledge distillation phase, the L1-regularized Least Angle Regression algorithm is additionally used. Due to the explaining-away effect, the extracted features are uncorrelated and robust to noise and adversarial attacks. Because the feature extractor is trained without supervision to separate the explanatory factors, it makes efficient use of unlabeled training data, which are usually available in large volumes. As the model of decision rules, it is proposed to use a binary encoder of input observations based on an ensemble of decision trees, together with information-extreme closed hypersurfaces (containers) for class separation, which are constructed in a radial basis of the binary Hamming space. Coding trees are added according to the boosting principle, and the radius of each class container is optimized by direct search. The information-extreme classifier is characterized by low computational complexity and high generalization capacity on small sets of labeled training data. Verification of the trained model on the open CTU test data sets confirms the suitability of the proposed algorithms for practical application: the accuracy of malware traffic detection is 96.1 %.
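The container-based decision rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`hamming`, `center`, `fit_container`, `is_inside`), the majority-vote choice of the container center, the toy binary codes, and the use of the mean of the true positive and true negative rates in place of the paper's information criterion are all assumptions made for illustration. The sketch only shows the idea of searching directly over the container radius in binary Hamming space.

```python
from typing import List, Sequence, Tuple

Code = Sequence[int]  # a binary code, e.g. the outputs of the coding trees


def hamming(a: Code, b: Code) -> int:
    """Number of positions in which two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def center(codes: List[Code]) -> List[int]:
    """Assumed container center: bitwise majority vote over the class codes."""
    n = len(codes)
    return [1 if 2 * sum(col) >= n else 0 for col in zip(*codes)]


def fit_container(own: List[Code], other: List[Code]) -> Tuple[List[int], int, float]:
    """Direct search over every integer radius from 0 to the code length.

    The score is a stand-in criterion (mean of true positive and true
    negative rates), not the information criterion used in the paper.
    """
    c = center(own)
    best_r, best_score = 0, -1.0
    for r in range(len(c) + 1):
        tpr = sum(hamming(x, c) <= r for x in own) / len(own)
        tnr = sum(hamming(x, c) > r for x in other) / len(other)
        score = (tpr + tnr) / 2
        if score > best_score:  # keep the smallest radius on ties
            best_r, best_score = r, score
    return c, best_r, best_score


def is_inside(x: Code, c: Code, r: int) -> bool:
    """An observation belongs to the class if it falls inside the container."""
    return hamming(x, c) <= r


# Hypothetical 4-bit codes for two classes (toy data, not from CTU).
malware = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 1, 1]]
benign = [[0, 0, 0, 1], [0, 0, 1, 1], [0, 1, 0, 1]]

c, r, score = fit_container(malware, benign)
print(c, r, score)
```

Because the radius is an integer bounded by the code length, an exhaustive scan is cheap, which is consistent with the low computational complexity claimed for the classifier.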

Keywords


intrusion detection system; convolutional sparse coding model; growing sparse coding neural gas; decision tree ensemble; regression random forest; information criterion; knowledge distillation; information-extreme machine learning.

DOI: https://doi.org/10.32620/reks.2020.2.08