arXiv

NLLog: Lightweight, Explainable SOC Anomaly Detection via Log-to-Language Rewriting

June 4, 2026 · Samuel Ndichu, Tao Ban, Seiichi Ozawa, Takeshi Takahashi, Daisuke Inoue

Title: NLLog: A Lightweight, Explainable Approach to SOC Anomaly Detection Through Log-to-Language Transformation

Abstract: While system-generated logs are the foundation of security monitoring, their rigid, template-driven structure often impedes both automated processing and human understanding. To address this, we introduce NLLog (Natural-Language Log), a streamlined pipeline that transforms parsed templates into structured WHO-WHAT-SEVERITY sentences through deterministic rewriting. This process is followed by pooling with term-frequency-inverse-document-frequency (TF-IDF) weighting, session classification via tree ensembles, and the back-projection of evidence using TreeSHAP to facilitate analyst review.
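The first two stages of the pipeline (deterministic rewriting, then TF-IDF-weighted pooling over a session) can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration, not taken from the paper: the rule patterns, the sentence wording, the smoothed IDF formula, and the `toy_embed` hashed bag-of-words encoder, which only stands in for whatever lightweight encoder the authors use.

```python
import math
import re
import zlib
from collections import Counter

# Hypothetical rewrite rules: (template pattern, WHO, WHAT, SEVERITY).
# The patterns and wording are illustrative, not the paper's rule set.
REWRITE_RULES = [
    (re.compile(r"^Receiving block <\*> src: <\*> dest: <\*>$"),
     "datanode", "receives a block", "info"),
    (re.compile(r"^Exception in receiveBlock for block <\*>"),
     "datanode", "fails to receive a block", "error"),
]


def rewrite(template):
    """Deterministically map a parsed template to a WHO-WHAT-SEVERITY sentence."""
    for pattern, who, what, severity in REWRITE_RULES:
        if pattern.search(template):
            return f"The {who} {what} (severity: {severity})."
    # Fallback: keep the raw template text so no event is silently dropped.
    return f"An unknown component reports: {template} (severity: unknown)."


def toy_embed(sentence, dim=8):
    """Stand-in encoder (assumption): hashed bag-of-words counts in a small vector."""
    vec = [0.0] * dim
    for token in sentence.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def tfidf_weights(sessions):
    """Per-session {sentence: tf * idf}, treating each rewritten sentence as a term."""
    n = len(sessions)
    df = Counter(s for session in sessions for s in set(session))
    weights = []
    for session in sessions:
        tf = Counter(session)
        weights.append({
            # Smoothed IDF (an assumed variant): log((1 + n) / (1 + df)) + 1.
            s: (count / len(session)) * (math.log((1 + n) / (1 + df[s])) + 1.0)
            for s, count in tf.items()
        })
    return weights


def pool(session_weights, embed=toy_embed):
    """TF-IDF-weighted average of sentence vectors for one non-empty session."""
    total = sum(session_weights.values())
    pooled = None
    for sentence, weight in session_weights.items():
        vec = embed(sentence)
        if pooled is None:
            pooled = [0.0] * len(vec)
        for i, x in enumerate(vec):
            pooled[i] += (weight / total) * x
    return pooled


# Two toy sessions of parsed templates (HDFS-style, illustrative only).
raw_sessions = [
    ["Receiving block <*> src: <*> dest: <*>"] * 2,
    ["Receiving block <*> src: <*> dest: <*>",
     "Exception in receiveBlock for block <*>"],
]
sessions = [[rewrite(t) for t in s] for s in raw_sessions]
weights = tfidf_weights(sessions)
vectors = [pool(w) for w in weights]
```

In a full pipeline, the pooled `vectors` would be the features passed to a tree-ensemble classifier, with TreeSHAP attributions mapped back to the sentences that contributed to each feature. Those stages are omitted here because they depend on external libraries.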

Evaluations on the Hadoop Distributed File System (HDFS) and Blue Gene/L (BGL) datasets show that NLLog outperforms two baselines reproduced under the same protocol. Across HDFS, BGL, and the AIT Alert Data Set, the system maintains low false-positive rates and operates with low latency on commodity hardware, making it well suited to security operations center (SOC) triage. Ablation studies on coverage, sparse versus dense representations, faithfulness, and adversarial scenarios reveal that the adequacy of the fallback varies by corpus, and that an enrollment-time coverage check can identify refinement needs before deployment. Overall, combining an auditable, deterministic rewrite mechanism with lightweight dense encoding offers a quantifiable representation layer for log-based anomaly detection and triage.
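One way to picture an enrollment-time coverage check is to measure what share of events in an enrollment corpus match a rewrite rule rather than falling through to the fallback. The Python sketch below is a hypothetical illustration: the rule patterns, the `coverage_report` helper, and the 0.9 threshold are all assumptions, not values or procedures reported in the paper.

```python
import re
from collections import Counter

# Hypothetical rule patterns; a real deployment would load its own rule set.
RULE_PATTERNS = [
    re.compile(r"^Receiving block <\*>"),
    re.compile(r"^PacketResponder <\*> for block <\*> terminating$"),
]


def coverage_report(templates, patterns=RULE_PATTERNS, min_coverage=0.9):
    """Summarize event-level rule coverage for an enrollment corpus.

    `templates` holds one parsed template string per log event.
    `min_coverage` is an assumed threshold, not a value from the paper.
    """
    if not templates:
        raise ValueError("enrollment corpus is empty")
    counts = Counter(templates)
    # Templates that no rule matches would be handled by the fallback.
    uncovered = {
        t: c for t, c in counts.items()
        if not any(p.search(t) for p in patterns)
    }
    total = sum(counts.values())
    coverage = (total - sum(uncovered.values())) / total
    return {
        "coverage": coverage,
        "needs_refinement": coverage < min_coverage,
        # Most frequent uncovered templates first: candidates for new rules.
        "uncovered": sorted(uncovered.items(), key=lambda kv: (-kv[1], kv[0])),
    }


# Toy enrollment corpus: 8 covered events and 2 uncovered ones.
corpus = (["Receiving block <*> src: <*> dest: <*>"] * 8
          + ["Verification succeeded for <*>"] * 2)
report = coverage_report(corpus)
```

Sorting uncovered templates by frequency lets an analyst see first which missing rules would raise coverage the most.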

Source: arXiv · Generated at: 2026-06-04 00:00:00 UTC