Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Network packet security analysis chatbot using SQL-RAG approach and cybersecurity-tuned LLM

Briliyant, Obrina, Javed, Amir ORCID: https://orcid.org/0000-0001-9761-0945 and Cherdantseva, Yulia ORCID: https://orcid.org/0000-0002-3527-1121 2025. Network packet security analysis chatbot using SQL-RAG approach and cybersecurity-tuned LLM. Presented at: 6th International Conference on Electrical, Communication and Computer Engineering (ICECCE), Istanbul, Turkey, 27-28 August 2025. Proceeding of the 6th International Conference on Electrical, Communication and Computer Engineering (ICECCE). ieeexplore,
Item availability restricted.

PDF (accepted and presented but not yet published) - Accepted Post-Print Version (492kB)
Restricted to Repository staff only

PDF (Provisional File: This article is currently in press) - Accepted Post-Print Version (17kB)

Abstract

Cybersecurity analysis faces significant challenges in examining the massive volumes of network data generated by increasingly connected systems. Data science and machine learning models help to detect anomalies and classify attacks, but for the in-depth analysis needed in a security audit or forensic investigation, most analysts still rely on manual search. This creates an analytical bottleneck that can pose substantial risks for organizations: delayed security analyses can result in failed compliance audits, leading to financial penalties and loss of business opportunities. This study presents a novel framework for automating network traffic analysis that combines a large language model (LLM), Structured Query Language (SQL) and retrieval-augmented generation (RAG) to translate human queries into actionable insights, enabling faster in-depth analysis of large network datasets, even by users with less expertise. An evaluation of the framework using RAGAS (RAG Assessment Suite) shows significant performance differences between specialized and general-purpose LLMs. The Lily-Cybersecurity-7B model demonstrated strong context precision (0.7667) due to its cybersecurity-focused training, effectively identifying relevant security data. However, it struggled with faithfulness (0.0528) and answer correctness (0.2300), primarily due to difficulties translating natural language queries into accurate SQL commands. In contrast, the Mistral-7B model, despite lacking cybersecurity-specific training, achieved higher answer correctness (0.5336) and faithfulness (0.1389), demonstrating better general-domain capabilities. These findings highlight the synergy between SQL generation and AI-driven retrieval for automated network analysis.
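
The SQL-RAG flow summarized in the abstract (natural-language question, generated SQL, retrieved rows, grounded answer) can be illustrated with a minimal sketch. This is not the authors' implementation: the `packets` table schema, the sample rows, and every function name below are hypothetical, and `stub_text_to_sql` is a hard-coded stand-in for the LLM call that would generate SQL in a real system.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory table of packet records (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE packets (src_ip TEXT, dst_ip TEXT, "
        "dst_port INTEGER, protocol TEXT, length INTEGER)"
    )
    rows = [
        ("10.0.0.5", "10.0.0.1", 22, "TCP", 60),
        ("10.0.0.6", "10.0.0.1", 22, "TCP", 60),
        ("10.0.0.7", "10.0.0.1", 22, "TCP", 60),
        ("10.0.0.5", "10.0.0.2", 443, "TCP", 1500),
        ("10.0.0.8", "10.0.0.2", 443, "TCP", 900),
        ("10.0.0.9", "10.0.0.3", 80, "TCP", 400),
    ]
    conn.executemany("INSERT INTO packets VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn

def stub_text_to_sql(question):
    """Stand-in for the LLM text-to-SQL step (assumption: a real system
    would prompt a model with the schema and the question)."""
    if "port" in question.lower():
        return (
            "SELECT dst_port, COUNT(*) AS n FROM packets "
            "GROUP BY dst_port ORDER BY n DESC, dst_port ASC LIMIT 3"
        )
    return "SELECT COUNT(*) AS n FROM packets"

def is_read_only(sql):
    """Accept only a single SELECT statement, since generated SQL is untrusted."""
    stripped = sql.strip().rstrip(";")
    return stripped.lower().startswith("select") and ";" not in stripped

def retrieve(conn, sql):
    """Run the generated query and return the rows as dictionaries."""
    if not is_read_only(sql):
        raise ValueError("refusing to run non-SELECT SQL")
    cur = conn.execute(sql)
    cols = [d[0] for d in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]

def build_prompt(question, rows):
    """Assemble the grounded prompt that would be sent to the answering LLM."""
    context = "\n".join(str(r) for r in rows)
    return (
        f"Context rows:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer using only the context rows."
    )

def answer(conn, question, text_to_sql=stub_text_to_sql):
    """Question -> SQL -> retrieved rows -> grounded prompt."""
    sql = text_to_sql(question)
    rows = retrieve(conn, sql)
    return sql, rows, build_prompt(question, rows)
```

In this sketch the retrieved rows are the only context given to the answering model, so an incorrect generated query yields an irrelevant or empty context. That is consistent with the abstract's observation that SQL translation errors lowered faithfulness and answer correctness.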

Item Type: Conference or Workshop Item (Paper)
Status: Unpublished
Schools: Schools > Computer Science & Informatics
Publisher: ieeexplore
Related URLs:
Date of First Compliant Deposit: 29 September 2025
Last Modified: 08 Oct 2025 14:00
URI: https://orca.cardiff.ac.uk/id/eprint/181390
