Logs are valuable records describing runtime events. However, because of the sheer volume of logs generated by large online services, it is practically impossible to get insights from them in real time using traditional observability tools. This is why at PacketAI, we built a real-time log analysis engine based on text mining and machine learning algorithms.
Since logs are mainly unstructured textual data, the first step towards automated log analysis is the “representation phase”, where raw log lines are transformed into structured data ready for downstream tasks such as anomaly detection. In this article, we present our design for Log2template, a universal log parsing engine that is agnostic to log formats. At its core, Log2template is based on word embeddings produced by neural networks.
A log message is mostly an unstructured (sometimes semi-structured) line of text printed by logging statements (e.g., printf()) defined by developers. Large-scale services usually generate millions of logs, which describe a vast range of events and record service runtime information. They are therefore crucial for service management tasks such as resource allocation, scheduling, troubleshooting, etc.
In the previous article of this series, we presented the high-level design of our AI-based log (auto)analysis engine. We touched upon compressing raw logs into log templates (template extraction) for better representation and ease of analysis, then briefly ran through the different types of log anomalies.
In the following sections, we dig a bit deeper into the first step, i.e., extracting log templates from raw logs (the yellow elements in the diagram above). We first motivate this step, discuss state-of-the-art tools and techniques, and then present “the PacketAI way”.
As mentioned before, logs are super important because they record valuable system runtime information. This information describes the inner workings of the system, as opposed to the outer behavior measured by metrics. Logging, however, is mostly used offline, for postmortem analysis and auditing. The main reason for this offline rather than online usage is the infeasibility of processing huge log volumes with existing descriptive tools, such as search engines like Elasticsearch, Datadog, or Loki.
Typically, a log analysis tool provides parsing modules, a search engine, and dashboards. This is great for postmortem investigations and offline auditing. These tools usually employ regex-based parsing to extract key-value fields, then index them for quick search when needed. A few products go a step further by clustering logs into templates (or patterns), like the paid version of ELK (x-pack modules) and Datadog.
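To make this concrete, a regex-based parser of this kind might be configured roughly as follows. This is a minimal sketch: the log format and field names are invented for illustration, not taken from any particular product.

```python
import re

# Hypothetical log format: "<timestamp> <level> <component>: <message>"
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<component>[\w.]+):\s+"
    r"(?P<message>.*)"
)

def parse_line(line):
    """Extract key-value fields from a line, or None if the format does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

fields = parse_line("2023-04-01T12:30:05 ERROR auth.service: connection refused")
# fields["level"] == "ERROR"; the extracted fields would then be indexed for search.
```

Every distinct log format needs its own hand-written pattern like this one, which is exactly the maintenance burden discussed next.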
The major limitation of these clustering methods is that they also require regexes: first, to extract log formats and isolate the log content, and second, to cluster the log content itself. As a consequence, these methods work acceptably on standard logs from standard sources like common databases and operating systems, where log formats are limited in number, open source, and well understood. However, they either don’t work at all or perform poorly on custom logs generated by custom applications. We know this because we have built these methods into our product and have long experience maintaining them for our clients. Register here to try it out.
Unless the log parser (we use the term to mean log transformation from unstructured to structured data) is agnostic to log formats, there will always be a need to manually create and maintain log formatting rules (regexes), which is tedious and prone to human error. Even worse, there will always be a need to go back and forth with users to determine log formats and update them whenever they change.
Our vision was to build a universal log parser that takes in raw logs of any arbitrary format and outputs a set of log clusters (templates) based on a defined similarity metric. The system must not require any additional regex rules.
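To make the notion of “log clusters (templates)” concrete, here is a toy example of the desired output. The naive regex masking below (numbers and IP addresses replaced by placeholders) is exactly the kind of hand-crafted rule Log2template is designed to avoid; it is shown only so the target output format is clear. The placeholder names are illustrative.

```python
import re
from collections import Counter

# Illustrative hand-written masking rules -- the very thing a
# format-agnostic parser replaces with learned representations.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),   # IPv4 addresses
    (re.compile(r"\b\d+\b"), "<NUM>"),                      # remaining integers
]

def to_template(line):
    """Collapse variable tokens so lines from one logging statement share a template."""
    for pattern, placeholder in MASKS:
        line = pattern.sub(placeholder, line)
    return line

logs = [
    "Connection from 10.0.0.1 closed after 30 seconds",
    "Connection from 10.0.0.2 closed after 12 seconds",
    "Disk usage at 91 percent",
]
templates = Counter(to_template(line) for line in logs)
# The first two lines collapse into one template:
# "Connection from <IP> closed after <NUM> seconds" (count 2)
```

The goal is the output on the last line; the point of Log2template is to reach it without writing the `MASKS` table by hand.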
To build such a system, we experimented with multiple techniques and finally settled on a simple, proprietary neural network architecture, in turn based on word2vec embeddings. The idea is to map each log line to a vector in a multi-dimensional vector space, such that similar log lines share (nearly) the same vector representation.
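The architecture itself is proprietary, but the core intuition can be sketched. Assume each token already has an embedding (here faked with deterministic random vectors standing in for learned word2vec vectors); a log line’s vector is the average of its token vectors, and cosine similarity then groups lines from the same logging statement. All names and values below are illustrative, not our actual implementation.

```python
import math
import random

DIM = 64

def token_vector(token, dim=DIM):
    """Stand-in for a learned word2vec embedding: a deterministic
    random unit vector, seeded by the token so identical tokens
    always map to the same vector."""
    rng = random.Random(token)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def line_vector(line):
    """A log line's vector: the average of its token vectors."""
    vectors = [token_vector(t) for t in line.split()]
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

a = line_vector("Connection from 10.0.0.1 closed")
b = line_vector("Connection from 10.0.0.2 closed")
c = line_vector("Disk usage at 95 percent")

sim_ab = cosine(a, b)  # same logging statement: vectors land close together
sim_ac = cosine(a, c)  # unrelated lines: vectors are far apart
```

Because the two connection lines share most of their tokens, their averaged vectors are nearly parallel, while the disk-usage line points in an unrelated direction; clustering on this similarity yields the templates.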
The input of Log2template is a log message, with or without its header. If the header is kept, we call it a raw log line. Note that removing the header before this step yields better results, since the embedding engine then works on clean, human-readable log content.
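As an example of what header removal looks like, here is a sketch that strips a syslog-style header. The header layout is just one common case, chosen for illustration; a real collector has to handle many variants.

```python
import re

# Hypothetical syslog-like header: "<month> <day> <time> <hostname> <process>[<pid>]: "
HEADER = re.compile(
    r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+\S+?(\[\d+\])?:\s+"
)

def strip_header(raw_line):
    """Return the human-readable log content, dropping the header if one is found."""
    return HEADER.sub("", raw_line, count=1)

content = strip_header("Jun 14 15:16:01 host-1 sshd[19939]: Failed password for root")
# content == "Failed password for root" -- clean text for the embedding step
```

Lines without a recognizable header pass through unchanged, so the step is safe to apply uniformly.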
The Log2template system works in two stages:
During this phase, the system is trained on logs collected over a pre-defined period of time. At PacketAI, we have a process that determines this period based on log volumes, quality, and other business requirements. The workflow is as follows:
In this phase, the trained model is put online for real-time inference. Since the model is designed to work on huge volumes of logs in real time, a significant amount of work has gone into scaling it and making sure the deployment is fast, auto-scalable, and fault-tolerant. The low-level deployment architecture is beyond the scope of this article, though. The workflow of this phase is as follows:
The output of Log2template is then presented to the user in the PacketAI logs UI. You can give it a try in less than 5 minutes by registering here:
In this article, we presented our design for Log2template, a universal log parsing engine that is agnostic to log formats. At its core, Log2template is based on word embeddings using neural networks, and it solves multiple log representation problems found in traditional observability tools, including:
In the next part of this series, we will dig deeper into one of the anomaly detection methods we apply on logs, which is based on the Log2template representations.