Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Enhancing cybersecurity log analysis through Retrieval-Augmented Generation

Briliyant, Obrina (ORCID: https://orcid.org/0000-0002-1054-8112), Javed, Amir (ORCID: https://orcid.org/0000-0001-9761-0945) and Cherdantseva, Yulia (ORCID: https://orcid.org/0000-0002-3527-1121) 2026. Enhancing cybersecurity log analysis through Retrieval-Augmented Generation. Presented at: 3rd International Conference on Foundation and Large Language Model (FLLM), Vienna, Austria, 25-28 November 2025. Proceedings of the 3rd International Conference on Foundation and Large Language Model (FLLM) 25-28 November 2025, Vienna, Austria. IEEE, pp. 990-995. DOI: 10.1109/FLLM67465.2025.11390888

PDF - Accepted Post-Print Version

Abstract

The exponential growth of cyber threats and the corresponding volume of security log data have created unprecedented challenges for security analysts. Traditional log analysis approaches struggle with the scale, complexity, and domain expertise required for effective vulnerability detection and incident response. This study addresses these challenges by implementing and evaluating Retrieval-Augmented Generation (RAG) architectures specifically optimized for cybersecurity log analysis. We conducted a comprehensive comparative analysis of three distinct retrieval techniques: base vector similarity search, parent document retrieval, and ensemble retrieval. Our experimental framework utilized Apache server logs and Healthapp logs containing security events, processed through different embedding and chunking strategies. The evaluation employed the RAG Assessment Score (RAGAS) framework to assess precision across multiple local large language models (LLMs). Our methodology revealed critical insights into the selection of local LLMs for cybersecurity log analysis and the performance of the three retrieval techniques. The results demonstrate that base vector similarity retrieval achieved the best overall performance with a score of 0.7482, significantly outperforming parent document retrieval (0.6753) and ensemble retrieval (0.6965). Comparative analysis with PDF-based RAG systems revealed that cybersecurity-specialized implementations provide measurable advantages in faithfulness (5.3% improvement) while maintaining competitive performance across other metrics. These findings provide actionable insights for organizations seeking to implement localized AI-augmented cybersecurity log analysis systems in production environments.
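The three retrieval techniques named in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the bag-of-words `embed` function stands in for a real embedding model, reciprocal rank fusion is assumed as one common way to build an ensemble, and the log lines in `SAMPLE_LOGS` are made up for illustration rather than taken from the paper's datasets. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Made-up, Apache-style log lines for illustration only (not from the paper).
SAMPLE_LOGS = [
    "[error] [client 10.0.0.5] File does not exist: /var/www/html/admin.php",
    "[notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties",
    "[error] [client 10.0.0.9] Directory index forbidden by rule: /var/www/html/",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_rank(query, chunks):
    """Base vector similarity: rank chunk indices by similarity to the query."""
    q = embed(query)
    return sorted(range(len(chunks)),
                  key=lambda i: cosine(q, embed(chunks[i])), reverse=True)


def keyword_rank(query, chunks):
    """Simple keyword-overlap ranking, used as the ensemble's second retriever."""
    q = set(tokenize(query))
    return sorted(range(len(chunks)),
                  key=lambda i: len(q & set(tokenize(chunks[i]))), reverse=True)


def parent_retrieve(query, parents, k=1):
    """Parent document retrieval: match on small child chunks (single lines),
    then return the larger parent blocks that contain the best matches."""
    children = [(p_idx, line)
                for p_idx, parent in enumerate(parents)
                for line in parent.splitlines() if line.strip()]
    ranked = vector_rank(query, [line for _, line in children])
    selected = []
    for child_idx in ranked:
        p_idx = children[child_idx][0]
        if p_idx not in selected:
            selected.append(p_idx)
        if len(selected) == k:
            break
    return [parents[p] for p in selected]


def ensemble_rank(query, chunks, c=60):
    """Ensemble retrieval: fuse both rankings with reciprocal rank fusion."""
    scores = Counter()
    for ranking in (vector_rank(query, chunks), keyword_rank(query, chunks)):
        for rank, idx in enumerate(ranking):
            scores[idx] += 1.0 / (c + rank + 1)
    return sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    query = "directory index forbidden"
    print(SAMPLE_LOGS[vector_rank(query, SAMPLE_LOGS)[0]])
    print(SAMPLE_LOGS[ensemble_rank(query, SAMPLE_LOGS)[0]])
```

In a real pipeline the retrieved chunks would be passed to a local LLM as context, and the answers would then be scored with an evaluation framework such as RAGAS; neither step is shown here.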

Item Type: Conference or Workshop Item - published (Paper)
Date Type: Published Online
Status: Published
Schools: Schools > Computer Science & Informatics
Publisher: IEEE
ISBN: 9798331594107
Date of First Compliant Deposit: 8 October 2025
Date of Acceptance: 15 September 2025
Last Modified: 02 Mar 2026 14:30
URI: https://orca.cardiff.ac.uk/id/eprint/181392
