# Fluentd Performance Tuning

We previously deployed the log collection system in the k8s cluster, but its performance was poor. With a resource limit of 800M memory and 1 CPU, Fluentd could only process and push about 1,500 logs per second. We needed performance tuning.

# 1、HOW TO TEST

First of all, I'll describe how to test Fluentd performance.

Create a simple Spring Boot project, or modify an existing one, and change its SpringBootApplication class.

## 1、Change SpringApplication

Replace the standard main method with a version that prints logs continuously in a loop.

Thread.sleep(10) is used to roughly control the number of logs printed per second. Since 1 s = 1000 ms, if each loop iteration sleeps 10 ms, about 100 logs are printed per second; if it sleeps 1 ms, about 1,000 logs are printed per second. We can duplicate the log.info call to multiply the output: with 20 copies and a 1 ms sleep, about 20k logs are printed per second.
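A minimal sketch of such a log generator (plain Java with the Spring Boot annotations omitted; the class and method names are my own, and the loop is bounded here so the sketch terminates):

```java
public class LogLoadGenerator {

    // Approximate log lines printed per second for a given sleep time,
    // assuming the printing itself takes negligible time.
    static long logsPerSecond(long sleepMillis, int linesPerIteration) {
        return (1000 / sleepMillis) * linesPerIteration;
    }

    public static void main(String[] args) throws InterruptedException {
        // In the real test application this loop runs forever inside the
        // Spring Boot main class; it is bounded here for illustration.
        for (int i = 0; i < 5; i++) {
            // Duplicate this print line 20x and sleep 1 ms to reach ~20k logs/s.
            System.out.println(java.time.Instant.now() + " INFO test log line " + i);
            Thread.sleep(10); // 10 ms per iteration -> ~100 iterations/s
        }
    }
}
```

With one print per iteration and a 10 ms sleep this yields roughly 100 logs/s; 20 prints and a 1 ms sleep give roughly 20,000 logs/s, matching the figures above.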

Then build the Java application into a Docker image and deploy it on the k8s platform. Once started, the application prints logs continuously. By looking at the Fluentd logs, we can see how many logs are processed and pushed per second, which tells us the performance of Fluentd.

## 2、change log_level to debug

Change @log_level to debug in the <match **> @id kafka section of the Fluentd config file. Then Fluentd will log the number of messages pushed to Kafka.

The flush_interval value in the <buffer> section of <match **> @id kafka determines the push frequency.
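A sketch of the relevant match section (assuming the kafka2 output plugin; the broker address and topic name are placeholders, not our exact values):

```
<match **>
  @type kafka2
  @id kafka
  @log_level debug        # makes the output log how many messages it pushes
  brokers kafka-broker:9092
  default_topic logs
  <format>
    @type json
  </format>
  <buffer>
    flush_interval 10s    # push frequency: buffered events are flushed every 10 s
  </buffer>
</match>
```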

## 3、Calculate the performance

If we see [kafka] 2000 messages send in the Fluentd log and the value of flush_interval is 10s, that means Fluentd processed and pushed 2,000 logs in 10 s, so the throughput is 200 logs/s.

# 2、TUNING IDEAS

After seeing the poor performance, I first searched for Fluentd's expected peak performance, to determine whether the bottleneck was Fluentd itself or our configuration. According to the official documentation, a single process should manage at least 5,000 entries/s before tuning becomes necessary. This meant the problem was in our configuration; the next step was to find the slow part and tune it.

# 3、PROBLEM POSITIONING

## 1、performance test without plugins

We added a lot of plugins to the Fluentd configuration: detect_exceptions to merge exception stack traces, concat to merge multi-line logs, kubernetes_metadata to add k8s cluster info, and record_transformer to add or delete fields. Could these plugins be the cause of the poor performance? In other words, how much peak performance can Fluentd achieve if it only reads and pushes logs, with no other logic? The value of flush_interval stays at 10s.

Modify the config file as below for testing.
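A plugin-free test configuration of this kind might look like the following (the paths, tag, broker address, and topic are assumptions, not our exact file):

```
# input: tail the container log files, no filter plugins at all
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  <parse>
    @type json
    time_format %Y-%m-%dT%H:%M:%S.%NZ
  </parse>
</source>

# output: push straight to Kafka, nothing in between
<match **>
  @type kafka2
  @id kafka
  @log_level debug
  brokers kafka-broker:9092
  default_topic logs
  <buffer>
    flush_interval 10s
  </buffer>
</match>
```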

Then view the Fluentd log file and calculate the throughput as described above.

At about 4,000 entries/s, this is not great: the official documentation says tuning only becomes necessary once 5,000 entries/s is reached, which implies that 5,000 entries/s should be easy to achieve.

## 2、tail problem positioning

After troubleshooting, the problem was finally found to be in JSON time parsing.

```
time_format %Y-%m-%dT%H:%M:%S.%NZ
```

Removing this line brings a very significant performance improvement: after calculation, the speed reaches about 21,000 logs/s. We are not sure whether this is Fluentd's true peak, but it is a 5x improvement.
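The line sits in the <parse> section of the tail input; after removal it looks roughly like this (a sketch):

```
<parse>
  @type json
  # time_format %Y-%m-%dT%H:%M:%S.%NZ   <- removed: parsing it cost ~5x throughput
</parse>
```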

What this line does is parse the time field into %Y-%m-%dT%H:%M:%S.%NZ format. In fact, this field is not used in our scenario. The only place the time field matters is on the Kibana page, where the Kafka push time is used to calculate the push delay in ms. Kafka's push time is [settable](https://docs.fluentd.org/output/kafka#use_event_time): it can use either the Fluentd event time or the current time.

And the Java application's logs are printed by the logging framework, so each line already begins with a timestamp. We don't need Fluentd to parse this field; we can delete the line to improve performance.

## 3、performance test without time format

How much performance can be reached after adding all the plugins back? If it can reach about 7,000 logs/s, then we don't need to tune anymore.

After testing, we found only about 3,000 entries/s, which means the plugins reduce performance a lot.

The next goal is very clear: find out which plugin has the greatest impact.

# 4、PLUGIN PERFORMANCE TEST

The logic is simple: remove all the plugins, keeping only the input and output modules, then add our existing plugins back one by one and see which has the biggest impact on performance.

## 1、only keep detect_exceptions

If we keep only the detect_exceptions plugin, testing shows about 9,000 logs/s. Compared to the peak of 21k/s, that is a reduction of about 50%, indicating that this plugin performs poorly.
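For reference, a detect_exceptions match block of the kind we tested might look like this (the tag prefix and message key are assumptions):

```
<match raw.kubernetes.**>
  @type detect_exceptions
  remove_tag_prefix raw        # re-emit merged events without the "raw" prefix
  message log                  # record field that contains the log line
  languages java               # detect Java-style stack traces
  multiline_flush_interval 0.5
</match>
```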

## 2、only keep concat

If we keep only the concat plugin, testing shows about 19.6k logs/s. That is very close to the peak of 21k/s, indicating that this plugin performs well.

## 3、only keep kubernetes_metadata

If we keep only the kubernetes_metadata plugin, testing shows about 9k-10k logs/s. The performance of this plugin is mediocre.

## 4、keep kubernetes_metadata and record_transformer

Because record_transformer adds or deletes fields that are produced by kubernetes_metadata, we must test it on top of kubernetes_metadata.

We see about 8k-9k logs/s. Compared to kubernetes_metadata alone, that is only about a 10% reduction, indicating that record_transformer performs well.

# 5、REMOVE PLUGINS

## 1、remove plugin detect_exceptions

The purpose of detect_exceptions is to detect exception stack traces (such as Java exceptions) and merge their multiple lines into one log.

The purpose of concat is to match logs with a regular expression and merge multiple lines into one log.

The principle of concat is very simple: match the beginning of each log line with a regular expression. The relevant plugin configuration is:

```
/^[\[ ]*\d{1,4}[-\/ ]+\d{1,2}[-\/ ]+\d{1,4}[ T]+\d{1,2}:\d{1,2}:\d{1,2}/
```
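A concat filter built around this expression might look like the following sketch (the key name and flush interval are assumptions):

```
<filter kubernetes.**>
  @type concat
  key log                    # record field that holds the raw log line
  # a new log starts when the line begins with a date/time
  multiline_start_regexp /^[\[ ]*\d{1,4}[-\/ ]+\d{1,2}[-\/ ]+\d{1,4}[ T]+\d{1,2}:\d{1,2}:\d{1,2}/
  flush_interval 5           # stop waiting for continuation lines after 5 s
</filter>
```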

Because a normal log begins with a timestamp, if we see a line beginning with a timestamp + several lines without a timestamp + another line beginning with a timestamp, there are really only two logs: the first timestamped line together with the untimestamped lines forms a single log.

For example, three physical lines of which only the first and third begin with a timestamp really make up just two logs.
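A hypothetical illustration (the log content is invented for this example):

```
2023-01-01 12:00:00.000 INFO  handling request     <- starts log 1
    continuation line without a timestamp          <- still log 1
2023-01-01 12:00:01.000 INFO  request finished     <- starts log 2
```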

But when you think about it, isn't an exception stack trace exactly the same situation?

So the concat plugin already covers the function of detect_exceptions. Only one thing needs special treatment:

The log format of applications in the k8s cluster must match the regular expression; otherwise multi-line logs will not be merged.

But this is not a big problem: we can introduce a specification requiring that projects deployed on our k8s platform format their logs in a certain way.

Then we can remove detect_exceptions plugin to improve performance.

## 2、remove plugin kubernetes_metadata

Can we also find a way to remove the kubernetes_metadata plugin?

What this plugin does is add pod-related information to each log.

But it also adds a lot of k8s-cluster information that we don't need and have to remove. All we need is a tag to locate a pod's logs in Kibana, such as pod_id, container_name, namespace_name, and so on.

And this information already exists in the tag field. The tag contains `[podName]_[nameSpace]_[deploymentName]-[containerId]`, which is already enough.

We add `tag ${tag_parts[4]}` in the configuration file; then we can search logs in Kibana through `tag: podName` or `tag: containerId`.
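With record_transformer, the tag field can be added like this (a sketch; the filter pattern is an assumption, and the fifth tag part corresponds to the container log file name produced by the tail input):

```
<filter kubernetes.**>
  @type record_transformer
  <record>
    # 5th tag part = "<podName>_<namespace>_<deployment>-<containerId>.log"
    tag ${tag_parts[4]}
  </record>
</filter>
```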

Then we can remove the kubernetes_metadata plugin.

# 6、NEW CONFIG FILE TEST

We have removed both the detect_exceptions and kubernetes_metadata plugins, each of which reduced performance by almost 50%, and adjusted the configuration of the other plugins. The newest config file is below:
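The resulting pipeline looks roughly like this (a sketch with placeholder paths, broker address, and topic; not our exact production file):

```
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  <parse>
    @type json
    # no time_format line: parsing it cost ~5x throughput
  </parse>
</source>

# merges both ordinary multi-line logs and exception stack traces,
# replacing detect_exceptions
<filter kubernetes.**>
  @type concat
  key log
  multiline_start_regexp /^[\[ ]*\d{1,4}[-\/ ]+\d{1,2}[-\/ ]+\d{1,4}[ T]+\d{1,2}:\d{1,2}:\d{1,2}/
</filter>

# adds a searchable tag field, replacing kubernetes_metadata
<filter kubernetes.**>
  @type record_transformer
  <record>
    tag ${tag_parts[4]}
  </record>
</filter>

<match kubernetes.**>
  @type kafka2
  @id kafka
  brokers kafka-broker:9092
  default_topic logs
  <format>
    @type json
  </format>
  <buffer>
    flush_interval 10s
  </buffer>
</match>
```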

After testing, the performance is about 9k+ logs/s, which exceeds the expected target.

# 7、REVIEW & SUMMARY

After performance tuning, we achieved a 3x performance improvement. Our approach was clear: first, research Fluentd's expected performance to determine whether the problem lay in Fluentd itself or in our configuration. Having confirmed it was a configuration problem, the goal became finding the poorly performing plugins and removing them, or replacing them with higher-performance alternatives, step by step. The result exceeded the expected target.

Also, since Fluentd is developed in Ruby, which has a GIL (Global Interpreter Lock) that allows only one thread to execute at a time, a single process has an inherent performance bottleneck. If you have higher performance requirements, refer to the official documentation and use multi-worker mode to further improve performance.
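Multi-worker mode is enabled in the <system> section of the config file; a sketch with four workers:

```
<system>
  workers 4   # spawn 4 worker processes, each with its own Ruby VM (and GIL)
</system>
```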