Finding method in the madness: the challenges of the automatic classification of log messages
by Péter Gyöngyösi (BalaBit IT Security)
Friday, 09.05.2014, Stage 11, 17:00-17:30
Track: dc5
System logging is a particularly important function in current computing infrastructures. It is used for purposes ranging from tracking down errors and alerting on security-related events to providing an audit trail of events for later use. One of the most critical problems is processing the huge amount of generated data, both in real time and after the fact, which is especially difficult when logs are generated as plain, unstructured text messages. Although various logging techniques that produce and handle structured data are available to make log processing easier, using them proves too complicated in many cases, and the plain text message remains the most widespread form of log.
The first step towards effectively handling this vast amount of data is to be able to recognize events in the logs and classify them. In this presentation we will give an overview of the various approaches to this problem. We will cover solutions based on manually maintained patterns, heuristics of varying complexity, and advanced algorithms based on statistical clustering. For each approach we will discuss the algorithmic background, analyze its effectiveness and possible caveats, and name a few open source and commercial solutions on the market that follow it.
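To make the heuristic approach mentioned above more concrete, here is a minimal Python sketch of one simple heuristic: mask the parts of a message that look variable (IP addresses, numbers) and treat the remaining text as the class of the message. It is an illustration under stated assumptions, not the method of any particular tool covered in the talk. The masking rules, the function names `template_of` and `classify`, and the sample messages are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical masking rules, applied in order. Real log data would
# need many more rules (timestamps, usernames, paths, ...).
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),   # IPv4-like tokens
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),            # hex literals
    (re.compile(r"\b\d+\b"), "<NUM>"),                       # standalone integers
]


def template_of(message):
    """Replace variable-looking tokens with placeholders."""
    for pattern, placeholder in MASKS:
        message = pattern.sub(placeholder, message)
    return message


def classify(messages):
    """Group messages that share the same masked template."""
    groups = defaultdict(list)
    for message in messages:
        groups[template_of(message)].append(message)
    return dict(groups)


# Made-up sample messages, for illustration only.
SAMPLE = [
    "Connection from 10.0.0.5 port 51234",
    "Connection from 10.0.0.7 port 40022",
    "Disk usage at 91 percent",
]

if __name__ == "__main__":
    for template, members in classify(SAMPLE).items():
        print(len(members), template)
```

With these rules the two connection messages fall into one class and the disk message into another. The sketch also shows a typical caveat of such heuristics: any variable field without a masking rule, such as a username, would split one event type into many classes. That gap is what manually maintained patterns or statistical clustering try to close.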
We will share real-life experience with these solutions and describe use cases we have seen at organizations. We will discuss their advantages and drawbacks and cover topics such as the required initial setup and integration time, maintenance needs, and false alert ratio. We will investigate performance and scalability issues, and analyze which solutions can be used in real time and which can only be used for post-hoc analysis.
The presentation will give you a good overview of the various solutions available for creating structured information from unstructured log data, and help you make a better decision by learning about the possibilities and the experiences of others who have walked in your shoes.
About the author Péter Gyöngyösi:
Péter Gyöngyösi is an engineer by trade who first started focusing on IT security when he specialized in the topic at university and wrote his thesis on analyzing system logs. He has a track record as a developer and software architect, and he is currently the product manager of the log management product line at BalaBit IT Security, the company behind syslog-ng. He spends a large portion of his time thinking about the future of logs but still sends patches to syslog-ng occasionally.