Briliyant, Obrina and Javed, Amir. Item availability restricted.
PDF (accepted not yet published)
- Accepted Post-Print Version
Restricted to Repository staff only Download (545kB) |
PDF
- Accepted Post-Print Version
Download (17kB) |
Abstract
The exponential growth of cyber threats and the corresponding volume of security log data have created unprecedented challenges for security analysts. Traditional log analysis approaches struggle with the scale, complexity, and domain expertise required for effective vulnerability detection and incident response. This study addresses these challenges by implementing and evaluating Retrieval-Augmented Generation (RAG) architectures specifically optimized for cybersecurity log analysis. We conducted a comparative analysis of three retrieval techniques: base vector similarity search, parent document retrieval, and ensemble retrieval. Our experimental framework used Apache server logs and Healthapp logs containing security events, processed through different embedding and chunking strategies. The evaluation employed the RAG Assessment Score (RAGAS) framework to assess precision across multiple local large language models (LLMs). Our methodology revealed critical insights into the selection of local LLMs for cybersecurity log analysis and into the performance of the three retrieval techniques. The results demonstrate that base vector similarity retrieval achieved the best overall performance with a score of 0.7482, significantly outperforming parent document retrieval (0.6753) and ensemble retrieval (0.6965). Comparative analysis with PDF-based RAG systems revealed that cybersecurity-specialized implementations provide a measurable advantage in faithfulness (5.3% improvement) while maintaining competitive performance across other metrics. These findings provide actionable insights for organizations seeking to implement localized, AI-augmented cybersecurity log analysis systems in production environments.
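To make the three retrieval techniques named in the abstract concrete, the following is a minimal, standard-library-only sketch and not the authors' implementation. It assumes a bag-of-words cosine similarity as a stand-in for a real embedding model, uses reciprocal rank fusion as one common way to build an ensemble, and runs on hypothetical log lines; all function names, the sample data, and the constant `c=60` are illustrative assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase alphanumeric tokens; a crude stand-in for a real embedding model.
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical parent documents, each a small group of log lines (child chunks).
PARENTS = {
    "apache-1": [
        "[error] [client 10.0.0.5] File does not exist: /var/www/html/admin.php",
        "[error] [client 10.0.0.5] File does not exist: /var/www/html/phpmyadmin",
    ],
    "apache-2": [
        "[notice] Apache configured -- resuming normal operations",
        "[notice] caught SIGTERM, shutting down",
    ],
    "health-1": [
        "Step_LSC onStandStepChanged 3579",
        "Step_SPUtils setTodayTotalDetailSteps = 1514038440000",
    ],
}
CHUNKS = [(pid, line) for pid, lines in PARENTS.items() for line in lines]

def vector_rank(query):
    # Rank chunk indices by similarity to the query (ties broken by index).
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(t))), i) for i, (_, t) in enumerate(CHUNKS)]
    return [i for _, i in sorted(scored, key=lambda s: (-s[0], s[1]))]

def keyword_rank(query):
    # Rank chunk indices by number of distinct query tokens they contain.
    q = set(tokenize(query))
    scored = [(len(q & set(tokenize(t))), i) for i, (_, t) in enumerate(CHUNKS)]
    return [i for _, i in sorted(scored, key=lambda s: (-s[0], s[1]))]

def base_retrieve(query, k=2):
    # Base retrieval: return the top-k most similar chunks themselves.
    return [CHUNKS[i][1] for i in vector_rank(query)[:k]]

def parent_retrieve(query, k=2):
    # Parent document retrieval: match on small chunks, return whole parents.
    parent_ids = []
    for i in vector_rank(query)[:k]:
        pid = CHUNKS[i][0]
        if pid not in parent_ids:
            parent_ids.append(pid)
    return ["\n".join(PARENTS[pid]) for pid in parent_ids]

def ensemble_retrieve(query, k=2, c=60):
    # Ensemble retrieval: fuse two rankings with reciprocal rank fusion.
    fused = Counter()
    for ranking in (vector_rank(query), keyword_rank(query)):
        for rank, i in enumerate(ranking, start=1):
            fused[i] += 1.0 / (c + rank)
    best = sorted(fused, key=lambda i: (-fused[i], i))[:k]
    return [CHUNKS[i][1] for i in best]

if __name__ == "__main__":
    query = "client error file does not exist admin"
    print(base_retrieve(query))
    print(parent_retrieve(query))
    print(ensemble_retrieve(query))
```

In this toy setup the parent retriever returns a single, larger context block when both matching chunks share a parent, whereas the base and ensemble retrievers return the individual log lines. That difference in context size and focus is the kind of trade-off a comparison of these techniques measures.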
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Status: | Unpublished |
Schools: | Schools > Computer Science & Informatics |
Date of First Compliant Deposit: | 8 October 2025 |
Date of Acceptance: | 15 September 2025 |
Last Modified: | 08 Oct 2025 13:45 |
URI: | https://orca.cardiff.ac.uk/id/eprint/181392 |