Making Sense Of Failure Logs
Pre-print (PDF) at arXiv: [1]
The full title of this paper is Making Sense of Failure Logs in an Industrial DevOps Environment.
It was written by Muhammad Abbas, Ali Hamayouni, Mahshid Helali Moghadam, Mehrdad Saadatmand, and Per Erik Strandberg, who are affiliated with RISE Research Institutes of Sweden, Mälardalen University, and/or Westermo Network Technologies AB. Parts of it are based on the second author's master's thesis: NLP-based Failure log Clustering to Enable Batch Log Processing in Industrial DevOps Setting. [2]
It has been accepted to the 20th International Conference on Information Technology: New Generations (ITNG 2023) [3]. Once it is published, a link to the final version will be made available.
Tentative Abstract
Processing and reviewing nightly test execution failure logs for large industrial systems is a tedious activity. Furthermore, multiple failures might share one root/common cause during test execution sessions, and the review might therefore require redundant efforts. This paper presents the LogGrouper approach for automated grouping of failure logs to aid root/common cause analysis and for enabling the processing of each log group as a batch. LogGrouper uses state-of-art natural language processing and clustering approaches to achieve meaningful log grouping. The approach is evaluated in an industrial setting in both a qualitative and quantitative manner. Results show that LogGrouper produces good quality groupings in terms of our two evaluation metrics (Silhouette Coefficient and Calinski-Harabasz Index) for clustering quality. The qualitative evaluation shows that experts perceive the groups as useful, and the groups are seen as an initial pointer for root cause analysis and failure assignment.
KEYWORDS: failure clustering, nightly testing, failure embedding, root cause analysis, DevOps, test logs, log analysis, software testing, word cloud, natural language processing
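To make one of the evaluation metrics named in the abstract concrete, here is a minimal, standard-library-only sketch of the Silhouette Coefficient applied to a grouping of log messages. It is not LogGrouper's implementation: the tokenizer, the Jaccard distance, the function names, and the four log messages are all assumptions made up for illustration, and the paper's own embedding and clustering steps are not reproduced here.

```python
import re


def tokens(log):
    """Set of lowercase alphabetic tokens in a log message.

    Digits are dropped (an assumption for this sketch), so port or VLAN
    numbers do not separate otherwise identical messages.
    """
    return set(re.findall(r"[a-z]+", log.lower()))


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def silhouette_coefficient(items, labels, distance):
    """Mean silhouette value over all items; needs at least two groups."""
    n = len(items)
    scores = []
    for i in range(n):
        # Sum and count the distances from item i to every other item,
        # keyed by the group label of the other item.
        sums, counts = {}, {}
        for j in range(n):
            if i == j:
                continue
            d = distance(items[i], items[j])
            sums[labels[j]] = sums.get(labels[j], 0.0) + d
            counts[labels[j]] = counts.get(labels[j], 0) + 1
        own = labels[i]
        if own not in counts:
            # Item is alone in its group: its silhouette is 0 by convention.
            scores.append(0.0)
            continue
        a = sums[own] / counts[own]  # mean distance within own group
        b = min(sums[l] / counts[l] for l in sums if l != own)  # nearest other group
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / n


# Hypothetical failure messages, invented for this example.
logs = [
    "Timeout waiting for port eth0 link up",
    "Timeout waiting for port eth1 link up",
    "Assertion failed: expected VLAN 10 got VLAN 20",
    "Assertion failed: expected VLAN 30 got VLAN 40",
]
features = [tokens(log) for log in logs]

good = silhouette_coefficient(features, [0, 0, 1, 1], jaccard_distance)
bad = silhouette_coefficient(features, [0, 1, 0, 1], jaccard_distance)
print(good, bad)
```

The score ranges from -1 to 1. In this toy data, the grouping that keeps similar messages together scores higher than the one that mixes them, which is the sense in which the metric rewards compact, well-separated groups.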