If you’ve used Prometheus in the past for metrics collection/monitoring, then you’re probably familiar with Prometheus recording rules. As the Prometheus documentation says, recording rules allow you to precompute frequently needed or computationally expensive expressions and save their result as a new set of time series. Recording rules are the go-to approach for speeding up the performance of queries that take too long to return, and they are also commonly used to alias complex or repetitive PromQL subexpressions. While recording rules can be quite useful and are almost certainly in use in any production environment, they do have some downsides as well. In this post, we’ll take a look at how recording rules work, and discuss some of the pain points that you are likely to encounter when your workload scales up.
How Recording Rules Work
So, how do recording rules work? Recording rules are defined in YAML files that are referenced by the rule_files option in the Prometheus configuration file. Rules are structured into named groups, with each group having the ability to set the evaluation interval (default is 60s, as specified by the evaluation_interval option in the Prometheus configuration file), and an optional limit on the number of series that can be generated by the rules in the group. As part of the Prometheus server’s regular operation, it evaluates the provided rules on a schedule according to the configured evaluation interval for each group.