
Using Stats in Splunk Part 2: Seasonality

Seasonality, the idea that data varies predictably over specific time periods, is one of the most important concepts in statistical analysis of time series data in Splunk. For example, you’d expect to see more data logged during business hours and less during off-hours. If not taken into account, these variations can throw a wrench into typical anomaly detection techniques, such as those outlined in Part 1.

This article will offer an explanation of seasonality as well as techniques for taking it into account in your searches; we will also provide you with a practical example of how to account for this type of behavior in your anomaly detection searches.

Real-world example

To help explain seasonality, we’ll work through a real-world example: detecting unexpected dips in indexed data. Many sources of machine data generate more logs during normal business hours (when they’re being actively used), so this is a situation where taking seasonality into account is appropriate.

Before we start, here is the full example that we’ll break down:

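
As a minimal sketch of a baseline-building search of this kind (not necessarily the author's original), the following counts events per index per hour, fills empty hours with zero, tags each hour with a seasonal category, and writes per-category statistics to a lookup. The 30-day window, the `day_num`, `hour`, and `period` field names, the 08:00 to 18:00 weekday definition of business hours, and the lookup file name `index_volume_baseline.csv` are all assumptions chosen for illustration.

```spl
| tstats count where index=* earliest=-30d@h latest=@h by index _time span=1h
| xyseries _time index count
| makecontinuous _time span=1h
| fillnull value=0
| untable _time index count
| eval day_num=tonumber(strftime(_time, "%w")), hour=tonumber(strftime(_time, "%H"))
| eval period=case(day_num==0 OR day_num==6, "weekend", hour>=8 AND hour<18, "business_hours", true(), "off_hours")
| stats avg(count) as avg_count stdev(count) as stdev_count by index period
| outputlookup index_volume_baseline.csv
```

Run on a schedule, a search like this keeps the baseline current. An index with no events at all in the 30-day window never appears in the `tstats` output, so it will not get a baseline row.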

Once the stats are generated in the lookup, and you have a scheduled search that repopulates it regularly, you can implement the following:

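
A detection search along these lines might look like the sketch below. It assumes a hypothetical lookup file `index_volume_baseline.csv` holding `avg_count` and `stdev_count` per `index` and `period`, the same illustrative category definitions (weekend, 08:00 to 18:00 business hours, off-hours), and an arbitrary threshold of two standard deviations below the average; none of these values come from the original post.

```spl
| tstats count where index=* earliest=-1h@h latest=@h by index _time span=1h
| eval day_num=tonumber(strftime(_time, "%w")), hour=tonumber(strftime(_time, "%H"))
| eval period=case(day_num==0 OR day_num==6, "weekend", hour>=8 AND hour<18, "business_hours", true(), "off_hours")
| lookup index_volume_baseline.csv index period OUTPUT avg_count stdev_count
| eval lower_bound=avg_count - (2 * stdev_count)
| where count < lower_bound
```

Because each hour is compared only with the baseline for its own category, a quiet Sunday is not flagged just because it is quieter than a Tuesday afternoon. Note that an index that logs nothing during the hour produces no row here at all, which is why filling in empty time spans matters.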

Two pieces of this search ensure the data is accurate. The first accounts for the fact that data may not come in during certain time periods. The following fills in data for those spans where no logs are generated:

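
One way to write this gap-filling step, sketched here with an assumed hourly span, an assumed fill value of 0, and a split by `index`, is:

```spl
| tstats count where index=* by index _time span=1h
| xyseries _time index count
| makecontinuous _time span=1h
| fillnull value=0
| untable _time index count
```

Here `xyseries` turns each index into its own column, `makecontinuous` adds a row for every missing hour, `fillnull` replaces the resulting empty cells, and `untable` converts the table back into one row per hour and index.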

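This packs the data into a table (one column per series), makes the time axis continuous, fills in null values with a default value, and then unpacks the data back into rows. Note that the xyseries command expects an x-axis field, a single field whose values become the column names, and a data field. If you are splitting by more than one field, you’ll need to do something like the following: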

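
As a rough sketch of that workaround, assuming the data is split by both `index` and `sourcetype`, the two fields can be joined into one temporary field before `xyseries` and split apart again after `untable`. The field names `series` and `parts` are hypothetical helpers introduced only for this example.

```spl
| tstats count where index=* by index sourcetype _time span=1h
| eval series=index."!!!!!".sourcetype
| xyseries _time series count
| makecontinuous _time span=1h
| fillnull value=0
| untable _time series count
| eval parts=split(series, "!!!!!")
| eval index=mvindex(parts, 0), sourcetype=mvindex(parts, 1)
| fields - series parts
```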

Using !!!!! is arbitrary; you only need to have a separator string that won’t appear in your data normally.

The second seasonality piece is the following:

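
A sketch of this categorization step follows. The category names and the 08:00 to 18:00 weekday definition of business hours are assumptions to adjust for your environment; `%w` returns the weekday as a number where 0 is Sunday and 6 is Saturday.

```spl
| eval day_num=tonumber(strftime(_time, "%w")), hour=tonumber(strftime(_time, "%H"))
| eval period=case(day_num==0 OR day_num==6, "weekend", hour>=8 AND hour<18, "business_hours", true(), "off_hours")
| stats avg(count) as avg_count stdev(count) as stdev_count by index period
```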

This should be more self-explanatory. We get the day of the week and the hour from the timestamp, and use them to assign each event to a category (for example, business hours versus off-hours). From this, we can calculate statistics based on which category the event fits into.

Conclusion

With these techniques, you can now incorporate seasonality into your searches. It is a powerful technique that can really help you cut down on the noise in your alerts. Be sure to keep an eye out for Part 3 of this series, where I’ll be taking a look at some less commonly used commands and how they may (or may not) be useful in your investigations.

The post Using Stats in Splunk Part 2: Seasonality appeared first on Hurricane Labs.


*** This is a Security Bloggers Network syndicated blog from Hurricane Labs authored by Tim Strawbridge. Read the original post at: http://feedproxy.google.com/~r/HurricaneLabsEngineeringNotes/~3/ULgzpdDEmZw/