Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Enhancing cybersecurity log analysis through Retrieval-Augmented Generation

Briliyant, Obrina, Javed, Amir ORCID: https://orcid.org/0000-0001-9761-0945 and Cherdantseva, Yulia ORCID: https://orcid.org/0000-0002-3527-1121 2025. Enhancing cybersecurity log analysis through Retrieval-Augmented Generation. Presented at: 3rd International Conference on Foundation and Large Language Model (FLLM), Vienna, Austria, 25-28 November 2025. Proceeding of the 3rd International Conference on Foundation and Large Language Model (FLLM) 25-28 November 2025, Vienna, Austria.
Item availability restricted.

PDF (accepted, not yet published) - Accepted Post-Print Version (545kB)
Restricted to Repository staff only

PDF (provisional file; this article is currently in press) - Accepted Post-Print Version (17kB)

Abstract

The exponential growth of cyber threats and the corresponding volume of security log data have created unprecedented challenges for security analysts. Traditional log analysis approaches struggle with the scale, complexity, and domain expertise required for effective vulnerability detection and incident response. This study addresses these challenges by implementing and evaluating Retrieval-Augmented Generation (RAG) architectures specifically optimized for cybersecurity log analysis. We conducted a comprehensive comparative analysis of three distinct retrieval techniques: base vector similarity search, parent document retrieval, and ensemble retrieval. Our experimental framework utilized Apache server logs and HealthApp logs containing security events, processed through different embedding and chunking strategies. The evaluation employed the RAG Assessment Score (RAGAS) framework to assess precision across multiple local large language models (LLMs). Our methodology revealed critical insights into the selection of local LLMs for cybersecurity log analysis and the performance of the three retrieval techniques. The results demonstrate that base vector similarity retrieval achieved optimal overall performance with a score of 0.7482, significantly outperforming parent document retrieval (0.6753) and ensemble techniques (0.6965). Comparative analysis with PDF-based RAG systems revealed that cybersecurity-specialized implementations provide measurable advantages in faithfulness (5.3% improvement) while maintaining competitive performance across other metrics. These findings provide actionable insights for organizations seeking to implement localized AI-augmented cybersecurity log analysis systems in production environments.
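
As a rough illustration of how the three retrieval techniques named in the abstract differ, the following Python sketch implements toy versions of each. It is not the authors' implementation. Bag-of-words cosine similarity stands in for a real embedding model, reciprocal rank fusion is assumed as the way the ensemble combines a vector ranking with a keyword ranking, and the log lines, `DOCS`, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical parent documents, each split into line-level child chunks.
DOCS = {
    "apache": [
        "[error] [client 10.0.0.5] File does not exist: /var/www/html/admin.php",
        "[notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties",
    ],
    "healthapp": [
        "Step_LSC onStandStepChanged 3579",
        "Step_SPUtils setTodayTotalDetailSteps = 7214",
    ],
}
CHUNKS = [(doc_id, line) for doc_id, lines in DOCS.items() for line in lines]


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-count vectors (assumed embedding stand-in).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def vector_rank(query):
    # Chunk indices ordered by descending cosine similarity to the query.
    q = Counter(tokenize(query))
    scores = [cosine(q, Counter(tokenize(text))) for _, text in CHUNKS]
    return sorted(range(len(CHUNKS)), key=lambda i: scores[i], reverse=True)


def keyword_rank(query):
    # Chunk indices ordered by descending count of shared query terms.
    q = set(tokenize(query))
    overlap = [len(q & set(tokenize(text))) for _, text in CHUNKS]
    return sorted(range(len(CHUNKS)), key=lambda i: overlap[i], reverse=True)


def base_retrieve(query, k=2):
    # Base vector similarity search: return the top-k chunks themselves.
    return [CHUNKS[i][1] for i in vector_rank(query)[:k]]


def parent_retrieve(query, k=2):
    # Parent document retrieval: match on small chunks, return whole parents.
    parents = []
    for i in vector_rank(query)[:k]:
        doc_id = CHUNKS[i][0]
        if doc_id not in parents:
            parents.append(doc_id)
    return ["\n".join(DOCS[d]) for d in parents]


def ensemble_retrieve(query, k=2, c=60):
    # Ensemble retrieval: fuse both rankings with reciprocal rank fusion.
    fused = Counter()
    for ranking in (vector_rank(query), keyword_rank(query)):
        for rank, i in enumerate(ranking, start=1):
            fused[i] += 1.0 / (c + rank)
    best = sorted(fused, key=fused.get, reverse=True)[:k]
    return [CHUNKS[i][1] for i in best]


if __name__ == "__main__":
    query = "client error file does not exist"
    print(base_retrieve(query, k=1))
    print(parent_retrieve(query, k=1))
    print(ensemble_retrieve(query, k=1))
```

In this sketch the three functions differ only in what they return for the same match: the matching chunk, the full parent document around it, or the chunk ranked best after fusing two rankings. That difference in how much context reaches the LLM is what the comparison described in the abstract evaluates.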

Item Type: Conference or Workshop Item (Paper)
Status: Unpublished
Schools: Schools > Computer Science & Informatics
Date of First Compliant Deposit: 8 October 2025
Date of Acceptance: 15 September 2025
Last Modified: 08 Oct 2025 13:45
URI: https://orca.cardiff.ac.uk/id/eprint/181392
