W3tTr3y's blog

A personal technology focused blog

Splunk Indexer Affinity

How much data does each forwarder send to each indexer (i.e. How do we measure indexer affinity)?

Based on our conversations with people in Splunk's support group, the best way to measure indexer affinity (that is, whether the forwarders are doing a good job of randomly selecting an indexer, or are "getting stuck" on one or selecting from only a subset) is to look at the volume of data each forwarder sends to each indexer. Over a long period of time (where "long" depends on how even the data generation is, how many indexers you have, and how frequently the data is generated) the distribution of data over the indexers should be fairly even.

Update: Please see my "indexer selection not so random" post, which explains that forwarders' initial indexer selection is NOT random, and Indexer Affinity: SPL-69922

Why do we care about indexer affinity?

Obviously, the first question when discussing how to measure indexer affinity is: why is it important?

  1. It affects capacity planning
  2. It affects data retention
  3. It affects how quickly searches return

Capacity Planning

The easiest reason to explain is capacity planning. Indexers are where Splunk stores the data sent to it, so if you have 20 indexers, each with 100 GB for Splunk data, then you assume you can store approximately 2,000 GB of data. If the forwarders send data unevenly, one indexer will run out of room sooner, causing issues.
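The arithmetic above can be sketched as a quick back-of-the-envelope check (the numbers are the hypothetical ones from the example, and the 2x skew factor is an illustrative assumption):

```python
# Capacity planning: total storage is the sum across indexers,
# but you can only use all of it if data is distributed evenly.
indexers = 20
storage_per_indexer_gb = 100

total_capacity_gb = indexers * storage_per_indexer_gb
print(total_capacity_gb)  # 2000

# If the busiest indexer receives 2x its fair share of data, it fills up
# when the cluster as a whole is only about half full.
skew = 2.0
effective_capacity_gb = total_capacity_gb / skew
print(effective_capacity_gb)  # 1000.0
```

In other words, the usable capacity is bounded by the most-loaded indexer, not by the sum of all disks.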

Data Retention

Splunk stores data in what it calls buckets; a bucket can only be discarded when its most recent event exceeds the retention period. If you have uneven data distribution, the indexer receiving more events will roll buckets more often, so each of its buckets covers a smaller time span and expires sooner. The indexer that receives less data will have buckets spanning longer time ranges, so it takes longer for each bucket's most recent event to pass the retention period. In our case we have an extremely diverse environment, some time issues, etc., so we know we will have some unevenness in our time spans, but we want to minimize it.
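The expiry rule above can be illustrated with a minimal sketch (this is not Splunk's implementation; the dates and 90-day retention period are made-up values for illustration):

```python
from datetime import datetime, timedelta

def bucket_expired(newest_event: datetime, retention: timedelta, now: datetime) -> bool:
    # A bucket can only be discarded once its MOST RECENT event
    # is older than the retention period.
    return now - newest_event > retention

now = datetime(2014, 1, 1)
retention = timedelta(days=90)

# A busy indexer rolls buckets often, so each bucket spans a short window
# and its newest event ages past the retention period quickly.
busy_bucket_newest = now - timedelta(days=91)
# A quiet indexer's bucket spans months, so its newest event may still be
# well inside the retention window, keeping old events around longer.
quiet_bucket_newest = now - timedelta(days=10)

print(bucket_expired(busy_bucket_newest, retention, now))   # True
print(bucket_expired(quiet_bucket_newest, retention, now))  # False
```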

Search Speed

Splunk uses map-reduce: when a search is run, the search head parses the search and creates a job, which is distributed to each indexer. Each indexer searches its own data and sends the results to the search head, which compiles them and performs post-processing. If one source type is generated by only a few hosts, and the forwarders on those hosts send to only a few indexers, then when the job goes out only a few indexers can search that data, and each of them has more data to search. More succinctly: Splunk scales horizontally, and bad indexer affinity shrinks your horizontal scale.
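The scatter-gather pattern can be sketched as follows (a toy model, not Splunk's actual engine; the indexer names and events are invented). Both layouts return the same results, but in the skewed one a single indexer does all the scanning:

```python
# Scatter-gather sketch: the job fans out to every indexer,
# each scans only its own events ("map"), and the search head
# merges the partial results ("reduce").
def search(indexer_data, term):
    partials = {idx: [e for e in events if term in e]
                for idx, events in indexer_data.items()}
    merged = [e for part in partials.values() for e in part]
    return merged, partials

# Good affinity: the source type is spread across all indexers,
# so the scan work is split three ways.
even = {"idx1": ["web err", "web ok"], "idx2": ["web ok"], "idx3": ["web err"]}
# Bad affinity: the same events pile onto one indexer,
# so one machine does all the work while the others sit idle.
skewed = {"idx1": ["web err", "web ok", "web ok", "web err"], "idx2": [], "idx3": []}

even_results, _ = search(even, "err")
skewed_results, skewed_parts = search(skewed, "err")
print(len(even_results), len(skewed_results))  # 2 2
print([len(p) for p in skewed_parts.values()])  # [2, 0, 0]
```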

Query: How much data does each forwarder send to each indexer?

index=_internal source=*metrics.log group=tcpin_connections | 
rename host as indexer hostname as fwder |  
stats sum(kb) as total by indexer fwder |
sort -total
  1. Search Splunk’s metrics log and filter to only incoming connections.
  2. Rename host to indexer and hostname to fwder to make the rest of the search easier to read.
  3. Use the stats command to sum the kilobytes transmitted per indexer-forwarder pair.
  4. Sort by total in descending order.

The results should look something like this:

indexer  fwder   total
index1   fwder1  1000
index2   fwder3  998
index2   fwder2  997
index1   fwder4  967
index2   fwder5  109
index2   fwder4  13
index1   fwder2  9
index1   fwder3  6
index2   fwder1  4
index1   fwder5  3
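To quantify affinity from results like these, one option is to sum each forwarder's totals and check what fraction went to its busiest indexer; with two indexers and truly random selection that fraction should hover near 0.5. A quick Python sketch, using rows that mirror the hypothetical table above:

```python
from collections import defaultdict

# Sample rows mirroring the results table: (indexer, forwarder, kb).
rows = [
    ("index1", "fwder1", 1000), ("index2", "fwder3", 998),
    ("index2", "fwder2", 997), ("index1", "fwder4", 967),
    ("index2", "fwder5", 109), ("index2", "fwder4", 13),
    ("index1", "fwder2", 9), ("index1", "fwder3", 6),
    ("index2", "fwder1", 4), ("index1", "fwder5", 3),
]

# Group the totals by forwarder, then by indexer.
by_fwder = defaultdict(dict)
for indexer, fwder, kb in rows:
    by_fwder[fwder][indexer] = by_fwder[fwder].get(indexer, 0) + kb

# Fraction of each forwarder's data that went to its busiest indexer.
for fwder, dist in sorted(by_fwder.items()):
    share = max(dist.values()) / sum(dist.values())
    print(f"{fwder}: {share:.0%} to one indexer")
```

Every forwarder in this sample sends over 95% of its data to a single indexer, which is exactly the "getting stuck" behavior this post is about.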