Crowd counting in intelligent video surveillance systems

Ruslan Dobryshev, Sergiy Purish, Mykhaylo Lobachev, Mykola Hodovychenko

Abstract


This study focuses on enhancing the accuracy and robustness of crowd counting in intelligent video surveillance systems by incorporating perspective awareness into deep learning models. Traditional convolutional neural networks often struggle with scale variations caused by perspective distortions and object occlusions in dense crowd scenes. The goal of this study is to develop a method that leverages geometric depth information to improve the spatial consistency of density estimates and provide more reliable predictions across varying scene configurations, including highly congested or irregular environments. The tasks to be accomplished include designing a depth inclusion module, integrating it into an encoder-decoder architecture, generating depth maps from monocular RGB images, and recalibrating feature representations using attention-weighted scale-aware mechanisms. The methods used involve the extraction of depth features from images via a pre-trained depth estimation model, followed by spatial attention-based recalibration of feature maps to highlight foreground objects and suppress irrelevant background signals. A fully differentiable pipeline is implemented to ensure seamless integration into standard CNN frameworks. The network training procedure also incorporates Euclidean loss functions on pixel-level density maps to optimize scale-sensitive prediction. The proposed method is evaluated on benchmark datasets including ShanghaiTech-B, UCF_CC_50, and Mall, where it consistently outperforms state-of-the-art models in terms of mean absolute error (MAE) and mean squared error (MSE). The experimental results confirm that the explicit incorporation of depth-aware representations significantly enhances counting performance, especially in scenarios with severe perspective-induced scale disparities. The integration of geometric priors into data-driven models offers a promising direction for real-time surveillance and large-scale crowd monitoring applications, providing not only quantitative improvements but also greater spatial fidelity in density map generation and better adaptability to complex visual conditions.
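
For readers who want a concrete picture of the mechanisms named above, the sketch below is a minimal, hypothetical PyTorch illustration, not the authors' implementation: a depth-inclusion module that turns a monocular depth map into a spatial attention mask for recalibrating encoder features, a pixel-level Euclidean loss on density maps, and the MAE/MSE counting metrics. All module names, channel sizes, and layer choices are assumptions made for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DepthGuidedRecalibration(nn.Module):
    """Hypothetical depth-inclusion module: a monocular depth map is turned
    into a spatial attention mask that re-weights encoder features, so that
    distant (small-scale) people are not drowned out by nearby ones."""

    def __init__(self, feat_channels: int = 512):
        super().__init__()
        # Lightweight branch mapping a 1-channel depth map to a spatial mask in (0, 1).
        self.depth_branch = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=1),
            nn.Sigmoid(),
        )
        # 1x1 convolution that fuses the recalibrated features before the decoder.
        self.fuse = nn.Conv2d(feat_channels, feat_channels, kernel_size=1)

    def forward(self, feats: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # Resize the depth map to the spatial size of the encoder feature map.
        depth = F.interpolate(depth, size=feats.shape[-2:], mode="bilinear",
                              align_corners=False)
        attn = self.depth_branch(depth)          # B x 1 x H x W attention weights
        # Residual recalibration: keep original features, emphasise foreground regions.
        return self.fuse(feats * (1.0 + attn))


def euclidean_density_loss(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """Pixel-wise Euclidean (L2) loss between predicted and ground-truth density maps."""
    return 0.5 * torch.sum((pred - gt) ** 2) / pred.size(0)


@torch.no_grad()
def counting_errors(pred_maps: torch.Tensor, gt_counts: torch.Tensor):
    """MAE and the root squared error commonly reported as MSE in crowd counting;
    the per-image count is the integral (sum) of the predicted density map."""
    pred_counts = pred_maps.sum(dim=(1, 2, 3))
    err = pred_counts - gt_counts
    return err.abs().mean().item(), err.pow(2).mean().sqrt().item()
```

Because the module consumes only a single-channel depth tensor, a depth map produced by any off-the-shelf monocular depth estimator can be plugged in, and the pipeline stays fully differentiable, so it can in principle be trained end-to-end inside a standard encoder-decoder counting network with the loss above.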

Keywords


crowd counting; video surveillance; deep learning; convolutional neural networks; depth estimation; density map; perspective distortion; attention mechanism


References


Deng, L., Zhou, Q., Wang, S., Górriz, J.M. & Zhang, Y. Deep learning in crowd counting: A survey. CAAI Transactions on Intelligence Technology, 2024, vol. 9, no. 5, pp. 1043–1077. DOI: 10.1049/cit2.12241.

Patwal, A., Diwakar, M., Tripathi, V. & Singh, P. Crowd counting analysis using deep learning: A critical review. Procedia Computer Science, 2023, vol. 218, pp. 2448–2458. DOI: 10.1016/j.procs.2023.01.220.

Alhawsawi, A.N., Khan, S.D. & Ur Rehman, F. Crowd counting in diverse environments using a deep routing mechanism informed by crowd density levels. Information, 2024, vol. 15, no. 5, article no. 275. DOI: 10.3390/info15050275.

Khan, M.A., Menouar, H., Hamila, R. & Abu-Dayya, A. Crowd counting at the edge using weighted knowledge distillation. Scientific Reports, 2025, vol. 15, article no. 11932. DOI: 10.1038/s41598-025-90750-5.

Mansouri, W., Alohali, M.A., Alqahtani, H., Alruwais, N., Alshammeri, M. & Mahmud, A. Deep CNN-based enhanced crowd density monitoring for intelligent urban planning on smart cities. Scientific Reports, 2025, vol. 15, article no. 5759. DOI: 10.1038/s41598-025-90430-4.

Zeng, X., Wang, H., Guo, Q. & Wu, Y. Correlation-attention guided regression network for efficient crowd counting. Journal of Visual Communication and Image Representation, 2024, vol. 99, article no. 104078. DOI: 10.1016/j.jvcir.2024.104078.

Cai, Y. & Zhang, D. A weakly supervised crowd counting method via combining CNN and Transformer. Electronics, 2024, vol. 13, no. 24, article no. 5053. DOI: 10.3390/electronics13245053.

Lien, C.-C. & Wu, P.-C. A crowded object counting system with self-attention mechanism. Sensors, 2024, vol. 24, no. 20, article no. 6612. DOI: 10.3390/s24206612.

Alhawsawi, A.N., Khan, S.D. & Rehman, F.U. Enhanced YOLOv8-based model with context enrichment module for crowd counting in complex drone imagery. Remote Sensing, 2024, vol. 16, no. 22, article no. 4175. DOI: 10.3390/rs16224175.

Yaseen, M. What is YOLOv8: An in-depth exploration of the internal features of the next-generation object detector. arXiv, 2024, Aug. Available at: https://doi.org/10.48550/arXiv.2408.15857 (Accessed: 1 March 2025).

Zhao, Z., Ma, P., Jia, M., Wang, X. & Hei, X. A dilated CNN for cross-layers of contextual information in congested crowd counting. Sensors, 2024, vol. 24, no. 6, article no. 1816. DOI: 10.3390/s24061816.

Tomar, A., Nijhawan, R. & Koundal, D. EDCCN: A benchmark encoder–decoder framework for accurate crowd counting. Neurocomputing, 2025, vol. 640, article no. 130304. DOI: 10.1016/j.neucom.2025.130304.

Li, Y.-C., Jia, R.-S., Hu, Y.-X. & Sun, H.-M. A weakly-supervised crowd density estimation method based on two-stage linear feature calibration. IEEE/CAA Journal of Automatica Sinica, 2024, vol. 11, no. 4, pp. 965–981. DOI: 10.1109/JAS.2023.123960.

Zhou, J., Zhang, J. & Gui, Y. Crowd counting in domain generalization based on multi-scale attention and hierarchy level enhancement. Scientific Reports, 2025, vol. 15, article no. 155. DOI: 10.1038/s41598-024-83725-5.

Cao, R., Yu, J., Liu, Z. & Liang, Q. Towards real-world monitoring: An improved point prediction method for crowd counting based on contrastive learning. PLoS ONE, 2025, vol. 20, no. 7, article no. e0327397. DOI: 10.1371/journal.pone.0327397.

Xu, M., Ge, Z., Jiang, X., Cui, G., Lv, P., Zhou, B. & Xu, C. Depth information guided crowd counting for complex crowd scenes. arXiv, 2018, Mar. Available at: https://arxiv.org/abs/1803.02256 (Accessed: 1 March 2025).

Willmott, C.J. & Matsuura, K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Climate Research, 2005, vol. 30, no. 1, pp. 79–82. DOI: 10.3354/cr030079.

Sindagi, V.A. & Patel, V.M. Generating high-quality crowd density maps using contextual pyramid CNNs. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, IEEE, 2017, pp. 1861–1870. DOI: 10.1109/ICCV.2017.206.

Gouiaa, R., Akhloufi, M.A. & Shahbazi, M. Advances in convolution neural networks based crowd counting and density estimation. Big Data and Cognitive Computing, 2021, vol. 5, no. 4, article no. 50. DOI: 10.3390/bdcc5040050.

Shen, Z., Xu, Y., Ni, B., Wang, M., Hu, J. & Yang, X. Crowd counting via adversarial cross-scale consistency pursuit. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018. IEEE, pp. 5245–5254. DOI: 10.1109/CVPR.2018.00550.

Liu, F., Shen, C., Lin, G. & Reid, I. Learning depth from single monocular images using deep convolutional neural fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, vol. 38, no. 10, pp. 2024–2039. DOI: 10.1109/TPAMI.2015.2505283.

Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. In: 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA, 7–9 May 2015. pp. 1–14. DOI: 10.48550/arXiv.1409.1556.

Li, Y., Zhang, X. & Chen, D. CSRNet: Dilated convolutional neural networks for understanding the highly congested scenes. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018. IEEE, pp. 1091–1100. DOI: 10.1109/CVPR.2018.00120.

Chen, L., Gao, X., Chao, F., Chang, X., Lin, C.M., Gao, X., Lin, S., Zhang, H. & Lin, J. The effectiveness of a simplified model structure for crowd counting. arXiv, 2024. Available at: https://arxiv.org/abs/2404.07847. (Accessed: 1 March 2025).

Zhang, Y., Zhou, D., Chen, S., Gao, S. & Ma, Y. Single-image crowd counting via multi-column convolutional neural network. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. IEEE, pp. 589–597. DOI: 10.1109/CVPR.2016.70.

Liu, J., Gao, C., Meng, D. & Hauptmann, A.G. DecideNet: Counting varying density crowds through attention guided detection and density estimation. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018. IEEE, pp. 5197–5206. DOI: 10.1109/CVPR.2018.00545.

Zhang, C., Li, H., Wang, X. & Yang, X. Cross-scene crowd counting via deep convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 07–12 June 2015, IEEE, pp. 833–841. DOI: 10.1109/CVPR.2015.7298684.

Sam, D.B., Sajjan, N.N. & Babu, R.V. Divide and grow: Capturing huge diversity in crowd images with incrementally growing CNN. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018. IEEE, pp. 3618–3626. DOI: 10.1109/CVPR.2018.00381.

Idrees, H., Saleemi, I., Seibert, C. & Shah, M. Multi-source multi-scale counting in extremely dense crowd images. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA, 23–28 June 2013. IEEE, pp. 2547–2554. DOI: 10.1109/CVPR.2013.329.

Loy, C.C., Gong, S. & Xiang, T. Mall dataset: A sparse indoor crowd counting and profiling dataset collected from webcam images. Available at: https://personal.ie.cuhk.edu.hk/~ccloy/downloads_mall_dataset.html (Accessed: 1 March 2025).




DOI: https://doi.org/10.32620/reks.2025.2.08
