How much data does each forwarder send to each indexer (i.e., how do we measure indexer affinity)?
Based on our conversations with people in Splunk’s support group, the best way to measure indexer affinity (that is, whether the forwarders are doing a good job of randomly selecting an indexer, or whether they are “getting stuck” on one or only selecting from a subset) is to look at the volume of data each forwarder sends to each indexer. Over a long enough period, the distribution of data over the indexers should be fairly even. How long is “long enough” depends on how even the data generation is, how many indexers you have, and how frequently the data is generated.
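As a rough first check for forwarders that are “stuck” on a subset, one can count how many distinct indexers each forwarder has connected to. The following is a minimal sketch under assumptions: it assumes the default `_internal` index, the `tcpin_connections` group in metrics.log (where `host` is the indexer that logged the connection and `hostname` is the forwarder), and an arbitrary seven-day window. `indexers_used` and `total_kb` are names chosen here for illustration.

```spl
index=_internal source=*metrics.log* group=tcpin_connections earliest=-7d
| stats dc(host) AS indexers_used, sum(kb) AS total_kb BY hostname
| sort indexers_used
```

Forwarders at the top of the result, with an `indexers_used` value well below your indexer count, are the ones worth a closer look.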
Why do we care about indexer affinity?
Before discussing how to measure it, the obvious first question is: why is this important?
- It affects capacity planning
- It affects data retention
- It affects how quickly searches return
The easiest reason to explain is capacity planning. Indexers are where Splunk stores the data sent to it, so if you have 20 indexers, each with 100 GB for Splunk data, you assume you can store approximately 2,000 GB of data. If the forwarders send the data unevenly, one indexer will run out of room sooner than the others, causing issues well before you reach that total.
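To see whether disk usage is already drifting apart, a sketch like the following sums bucket size per indexer. It assumes the `dbinspect` command is run from a search head that can reach every indexer. `index=*` is an assumption and can be narrowed to the indexes you care about, and `used_mb` and `used_gb` are illustrative names.

```spl
| dbinspect index=*
| stats sum(sizeOnDiskMB) AS used_mb BY splunk_server
| eval used_gb = round(used_mb / 1024, 1)
| sort -used_gb
```

With an even spread, every row should sit near the same value. A row far above the rest is the indexer that will fill up first.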
Splunk stores data in what it calls buckets; a bucket can only be discarded once its most recent event is older than the retention period. If the data distribution is uneven, the indexer receiving more events rolls buckets more often, so each of its buckets covers a shorter time span and expires sooner. The indexer that receives less data has buckets covering longer time spans, so it takes longer for the most recent event in each bucket to pass the retention period. In our case, we have an extremely diverse environment, there are some time issues, and so on, so we know we will have some unevenness in our time spans, but we want to minimize it.
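One way to observe this effect is to compare the average time span covered by each indexer's buckets. The sketch below is illustrative only: `index=main` is an assumed index name, hot buckets are excluded because they are still being written to, and `span_days`, `buckets`, and `avg_span_days` are names chosen here.

```spl
| dbinspect index=main
| where state != "hot"
| eval span_days = (endEpoch - startEpoch) / 86400
| stats count AS buckets, avg(span_days) AS avg_span_days BY splunk_server
| sort avg_span_days
```

Indexers with many buckets and a short average span are the ones receiving a disproportionate share of the data.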
Splunk searches work in a map-reduce fashion: when a search is run, the search head parses it and creates a job that is distributed to each indexer. Each indexer searches its own data and sends the results to the search head, which compiles them and performs any post-processing. If a source type is generated by only a few hosts, and the forwarders on those hosts send to only a few indexers, then only those few indexers hold the data when the job goes out, and each of them has more data to search. More succinctly: Splunk scales horizontally, and bad indexer affinity makes your horizontal scale smaller.
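To find source types that live on only a handful of indexers, something along these lines may help. It is a sketch, assuming `tstats` can see the indexes of interest (`index=*` is an assumption and can be expensive over long time ranges); `indexers_holding` and `events` are illustrative names.

```spl
| tstats count WHERE index=* BY sourcetype, splunk_server
| stats dc(splunk_server) AS indexers_holding, sum(count) AS events BY sourcetype
| sort indexers_holding
```

A source type with a large event count and a small `indexers_holding` value is one where searches cannot take advantage of the full set of indexers.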
Query: How much data does each forwarder send to each indexer?
- Search Splunk’s metrics log and filter to only incoming connections.
- Rename host to indexer and hostname to fwder, just to make the rest of the search easier to read.
- Use the stats command to sum up the amount of transmitted data per indexer and forwarder.
- Sort it by total in descending order.
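The four steps above can be sketched as the search below, one line per step. It is a sketch under assumptions and not necessarily the exact search used here: it assumes the default `_internal` index, the `tcpin_connections` group in metrics.log, and the `kb` field as the measure of transmitted data. `total` is the name given to the summed field so that the final sort matches the last step.

```spl
index=_internal source=*metrics.log* group=tcpin_connections
| rename host AS indexer, hostname AS fwder
| stats sum(kb) AS total BY indexer, fwder
| sort -total
```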
If you look at the results, they should look something like: