W3tTr3y's blog

A personal technology focused blog

Splunk 6: Displaying Data in Dashboard Without Granting Access to Underlying Data

In my previous post, Saved Search Permission Model Changes in Splunk 6, I discussed a change in Splunk 6's permission model. While there are security considerations, if you have upgraded you have already assumed that risk, so you might as well utilize the benefits.

Overview

One example: our networking team has some use cases where an end-user calls the help desk with a networking issue; the networking team would like the help desk to be able to search the logs, and if there is an obvious issue with a known correction, the help desk could go ahead and resolve it for the user. Luckily, we also have some fairly advanced users in that group, which should help ensure this goes smoothly.

How to Leverage It

First, have the networking team identify the saved searches they want to share and work with them to adjust the permissions so that only their team and the help desk can see those queries — we have about 50 different internal groups utilizing Splunk and we don’t want all users to have access to these saved searches.

Second, while we could then have the help desk either directly invoke these queries with |savedsearch or create dashboards structured how they want, our help desk has limited Splunk experience while our networking team has really taken the plunge, so we'll create a new app where the help desk has read access and the networking team has both read and write access.
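For reference, app-level permissions like this end up in the app's metadata/local.meta file; below is a minimal sketch of what that could look like, assuming a hypothetical app named network_helpdesk and roles named helpdesk and networking (your app and role names will differ, and setting the permissions through Manager in the UI accomplishes the same thing and is the safer route).

# hypothetical app and role names; adjust to your environment
cat > $SPLUNKHOME/etc/apps/network_helpdesk/metadata/local.meta <<'EOF'
[]
access = read : [ helpdesk, networking ], write : [ networking ]
export = none
EOF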

Third, we'll have the networking team develop the dashboards they feel would be helpful, ensuring they utilize saved searches, and then communicate those dashboards to the help desk. We give all of our users a writable app so they have an area that's theirs, so some of this will be moving dashboards over; we can easily cheat and do a cp on the command line to speed up the copying process.
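As a rough sketch of that copy step (the app names here are hypothetical, and the dashboard files may live under a user directory instead, depending on how they were shared):

# copy dashboard XML from the networking team's app into the shared app
cp $SPLUNKHOME/etc/apps/networking/local/data/ui/views/*.xml \
   $SPLUNKHOME/etc/apps/network_helpdesk/local/data/ui/views/
# Splunk picks up the new views after a restart or a refresh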

Then, through this new Saved Search Permission Model, when the help desk views the dashboard, they will be able to see the data.

Alternatives

The primary impact of the above scenario could also be addressed by partitioning the data at a more granular level and granting the help desk access to just what they need, if the dashboard is showing raw data or fields from raw data.

If the data is summarizing things, summary indexing could also be used.

Caution

While I highlighted this concern in my previous post, it bears repeating: now that a saved search can grant new access, I have concerns that an SPL injection class of vulnerability exists where filtering form fields are exploited to gain unintended access. This would be in addition to configuration errors where saved searches now grant unintentional access to data.

Closing Remarks

Since Splunk 6 has expanded the capabilities of simple XML, it's fairly easy to make dashboards with time pickers, filtering forms, and even dashboards that execute one search and then post-process it for different perspectives.

When Splunk confirmed this as an intended feature, we replied asking about other situations outside of dashboards where they think it could be useful; for us, this new dashboard capability is an amazing and unexpected step forward.

Saved Search Permission Model Changes in Splunk 6

Overview of Expected Splunk Permission Behavior

Typically you create a user in Splunk and then assign roles to that user. These roles include capabilities (such as the ability to run real-time searches or to schedule searches) and a list of the data (indexes) the user can access.
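As a concrete (and entirely hypothetical) illustration, a role is just a stanza in authorize.conf that lists capabilities and the indexes the role may search; something along these lines, with made-up role and index names:

# append a hypothetical role to authorize.conf (names are illustrative only)
cat >> $SPLUNKHOME/etc/system/local/authorize.conf <<'EOF'
[role_helpdesk]
importRoles = user
srchIndexesAllowed = network
srchIndexesDefault = network
schedule_search = enabled
EOF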

Unlike most systems, Splunk doesn't have the ability to edit data (at least not through the UI), and typically no user – not even an administrator – has delete access; instead, data rolls off based on the configured retention time. Thus, giving access to the data typically means the user can search, read, and perform analytics on that data.

Pre-Splunk 6 Saved Search Permission Examples

When creating saved searches, you are really just saving the text that gets run; the user can do post-processing for different visualizations and analysis, but the user's own permissions were applied, so they couldn't gain access to any data they couldn't have seen with an ad-hoc query. While there are permissions on the saved searches, they controlled who could utilize the search; it was more of a feature to reduce the "noise" of irrelevant searches (for that user) and improve usability, NOT a security necessity. Users sharing saved searches were not a concern, as any user could only access data they already could access.

How Saved Search Permissions Work in Splunk 6

We had assumed that permissions worked the same in Splunk 6, but today I noticed that while the dashboards for our training data set were still loading, I couldn’t access the raw data.

Since this broke my understanding of permissions, I asked a co-worker to confirm that I wasn't going crazy. While he was stunned at first, he mentioned that he had removed access to the training data last week; now that I had a lead on the change, I started tracking down why the dashboards still worked. In Splunk 6, a saved search executes with the permissions of the search's owner, so while I couldn't execute an ad-hoc search or utilize a dashboard with an inline search, I could access the data via a saved search created by a user who still had access.

Intended Behavior

I have received confirmation from Splunk that this is intended behavior; they compared it to the suid bit which I thought was an excellent way of concisely describing the behavior.

Example

  1. With a user who has access to the data, create a saved search; I'll assume a saved search named foo containing index=bar
  2. Log in as a user who does not have access to the index containing the data
  3. Run the same search by hand (e.g. index=bar); no data will be returned
  4. Run the saved search by typing | savedsearch "foo" and this time data will be returned

Conclusions

While this is very exciting new functionality in terms of the possibilities, I don't feel that it has been clearly communicated to users, nor have the security implications been completely thought through. Previously, when asked to audit permissions, looking at the users and roles was enough to determine who had access to what; now saved search permissions must also be considered. Similar to the assertion that ACLs without deny cannot be properly audited, I'm concerned that this makes the evaluation complex enough that shops are going to inadvertently share data and not catch it even during an audit.

Just like setuid programs open the door to unintended consequences, I'm also concerned that this opens the door to a whole host of vulnerabilities, including injection attacks (similar to a SQL injection of foo' OR 1=1 --); previously there was no concern, as the user always had access to the underlying data anyway, so there was no additional access to gain; that is, until this game changer.

Heartbleed: Splunkweb Certificate Regeneration

Splunk has released Splunk Enterprise 6.0.3 which addresses the heartbleed vulnerability; all users of Splunk Enterprise 6.x should upgrade.

In my opinion you should upgrade your core infrastructure ASAP and then regenerate your private keys, generate new certificates, and once you deploy the new certificates revoke the previous ones.

While this isn't a small task, it's not particularly hard, and most CAs will re-issue certificates for free, so the only cost is the soft cost of your time. While heartbleed attacks are easy to detect now that we know what to look for, exploitation doesn't cause any unusual log entries, so it's virtually impossible to examine logs and tell whether you have been exploited.

Generating a New Key

It is quite easy to generate a new RSA key:

openssl genrsa -out mykey.pem 2048

In the case of Splunk Web, you want to generate the key in $SPLUNKHOME/etc/auth/splunkweb; since Splunk bundles a version of OpenSSL, you can use it directly:

$SPLUNKHOME/bin/openssl genrsa -out 2014Splunk1.key 2048
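If you want, you can sanity-check the new key before doing anything with it; these are standard openssl subcommands, using the key name from above:

# verify the key is well formed and confirm the size (should report 2048 bit)
$SPLUNKHOME/bin/openssl rsa -in 2014Splunk1.key -check -noout
$SPLUNKHOME/bin/openssl rsa -in 2014Splunk1.key -text -noout | head -1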

Building on Previous Work

Luckily our certificates for Splunk Web expired last week, so I'm familiar with what I need to do to re-issue the certificates. The only difference is that I need to generate new keys and then utilize the new keys when generating the CSR. In last week's post, Splunk SSL Chain, I listed a command to easily generate and collect CSRs for your search heads:

for server in splunk1 splunk2 splunk3 splunk4 splunk5; do
  ssh -t -t ${server} "sudo -u splunkuser -- sh -c '
    cd /opt/splunk/etc/auth/splunkweb;
    [ -f ${server}.csr ] && rm ${server}.csr;
    [ -f ${server}.pem -a ! -f ${server}.csr -a -f ${server}.key ] &&
      openssl x509 -x509toreq -in ${server}.pem -out ${server}.csr -signkey ${server}.key;
    cat ${server}.csr'" |
  tail -n 18 > ${server}.csr
done

where splunk1 splunk2 (etc.) is a space-separated list of your search heads.
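Before uploading the collected CSRs, it doesn't hurt to confirm each one parses cleanly and carries the subject you expect; a quick loop like this (same hypothetical server names as above) will do:

# print the subject of each CSR, or an error if the file is mangled
for server in splunk1 splunk2 splunk3 splunk4 splunk5; do
  openssl req -in ${server}.csr -noout -subject
done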

Modify to generate a new key

Before generating a new certificate, add a line that generates the new key:

for server in splunk1 splunk2 splunk3 splunk4 splunk5; do
  ssh -t -t ${server} "sudo -u splunkuser -- sh -c '
    cd /opt/splunk/etc/auth/splunkweb;
    [ -f ${server}.csr ] && rm ${server}.csr;
    [ -f ${server}.pem -a ! -f ${server}.csr ] &&
      openssl genrsa -out ${server}2014.key 2048 &&
      openssl x509 -x509toreq -in ${server}.pem -out ${server}.csr -signkey ${server}2014.key;
    cat ${server}.csr'" |
  tail -n 18 > ${server}.csr
done

I specifically do not delete the old key as it is currently in use with the certificate and I would like to continue utilizing Splunk’s web interface while I wait for our CA to sign the new requests.

To avoid overwriting the old key, I give the new key a different name; in this case I chose to append 2014, assuming I'll remember that 2014 was when heartbleed occurred.

Next Steps

Once the CA signs the certificates, I’ll put them in place (if you have errors, see my previous Splunk SSL Chain post about the order of certificate chains). I’ll then delete the old certificates and the old keys.
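When that time comes, a couple of quick checks can save a restart-and-wonder cycle; the filenames and port below are placeholders for whatever your environment uses:

# the modulus of the signed certificate and of the new key should match
openssl x509 -noout -modulus -in splunk1.pem     | openssl md5
openssl rsa  -noout -modulus -in splunk12014.key | openssl md5
# after restarting splunkweb, inspect the chain it actually serves
openssl s_client -connect splunk1.example.com:443 -showcerts </dev/null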

Splunk SSL Chain

On Monday, March 31, my boss forwarded an e-mail stating that the SSL certificates for our Splunk servers were expiring; my employer participates in InCommon's SSL certificate program, so I thought no big deal: generate a CSR (certificate signing request), fill out the form and upload the CSR, wait for the e-mail, download the certificate, and restart Splunk. Sounds like a lot of work, but no sweat.

Generating CSR

We currently have six search heads, so filling out all the information and ensuring I didn't make a typo seemed like a daunting task (I'm a notoriously bad typist). Luckily, a quick Google search turned up openssl's x509toreq option, which will generate a CSR based on an existing certificate.

Since it's bad practice to browse the web from a server and I needed to upload the CSR to the CA, I wanted to:

  1. Generate a CSR on each search head
  2. Copy it to my local workstation

The CSR doesn't contain the private key, so confidentiality isn't a concern; the primary security concern is protecting the integrity of the request.

We run Splunk as non-root, and the permissions on the private key and current certificate prevent anyone other than that user (except root, of course) from reading those files, so I needed to run sudo; if you've tried, sudo generally prevents you from running multiple commands (there's a good Stack Overflow thread on it).

In the end I used (line breaks added for readability):

for server in splunk1 splunk2 splunk3 splunk4 splunk5; do
  ssh -t -t ${server} "sudo -u splunkuser -- sh -c '
    cd /opt/splunk/etc/auth/splunkweb;
    [ -f ${server}.csr ] && rm ${server}.csr;
    [ -f ${server}.pem -a ! -f ${server}.csr -a -f ${server}.key ] &&
      openssl x509 -x509toreq -in ${server}.pem -out ${server}.csr -signkey ${server}.key;
    cat ${server}.csr'" |
  tail -n 18 > ${server}.csr
done

Note: For each server it will run sudo and you must type your password for that server. You will not see a password prompt, as the output is swallowed by the tail command. If nothing happens, you may have mistyped your password and need to retype it. The first password is for the first server; then wait until you see a message about the connection being closed, and you should be good to type the password for the sudo command on the next server.

ssh -t -t
-t forces pseudo-tty allocation; the second -t forces the allocation even if there is no local tty.
This was required in order to get sudo to prompt for my password; otherwise it sees no tty and won’t prompt
sudo -u splunkuser
runs the commands as splunkuser – you should adjust it to be whatever user you run splunk as
-- sh -c '[…]'
this is the "hack" to run multiple commands in one invocation
-- makes it so sudo stops looking for arguments
sh runs the sh shell
-c tells sh we'd like to run one "command" and exit
'[…]' then needs to contain one "command group" – something that you could type on one line of an interactive shell before hitting enter. Thus you need to join commands with things like `&&` or `;`
[ -f ${server}.csr ] && rm ${server}.csr
Check to see if a csr exists and if so delete it; I added this as my first run had an error and I needed to re-run it for a couple of servers
[ -f ${server}.pem -a ! -f ${server}.csr -a -f ${server}.key ]
ensure that the server’s certificate exists (${server}.pem so splunk0.pem, splunk1.pem as appropriate), that no csr exists, and that the private key file exists
openssl x509 -x509toreq -in ${server}.pem -out ${server}.csr -signkey ${server}.key;
Generate a new certificate signing request (CSR) based on the existing certificate utilizing the key ${server}.key (e.g. splunk0.key, splunk1.key etc. as appropriate)
cat ${server}.csr
This takes the newly generated certificate signing request and sends it to stdout (which we’ll use next)
tail -n 18 > ${server}.csr
This reads stdin (the data from cat's stdout) and sends the last 18 lines to the output.
I'm a notoriously bad typist, so I'm probably going to typo my password. Since the certificate signing request is 18 lines, this throws away those password prompts and only sends the CSR on.
`> ${server}.csr` takes the 18 lines from the tail and puts them in an appropriately named file (e.g. splunk0.csr, splunk1.csr, etc.)

Installing the new certificates

So I uploaded and waited; today I finally got notification that the certificates were ready.

So I clicked the link for the "X509, Base64 encoded" version, copied the certificate over, named it appropriately, and restarted Splunk. Despite Splunk's claim that it started correctly, Splunkweb wasn't responding to requests. Great!

Checking the Splunk wiki, there's an article on 3rd-party CAs, and under the Complex certificate chains section it states:

If you are using a certificate chain, you need to bundle the intermediate and the server certificate into a single certificate, by concatenating the certificates together (the right type, and in the right order)[…]

   -----BEGIN CERTIFICATE-----
   ... (certificate for your server)...
   -----END CERTIFICATE-----
   -----BEGIN CERTIFICATE-----
   ... (the intermediate certificate)...
   -----END CERTIFICATE-----
   -----BEGIN CERTIFICATE-----
   ... (the root certificate for the CA)...
   -----END CERTIFICATE-----

In checking our certificate bundle, it was in the opposite order.
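A quick way to see the order for yourself is to have openssl print the subject and issuer of every certificate in the bundle, in file order; this works with the stock openssl or the one Splunk bundles:

# lists subject/issuer pairs for each certificate, in the order they appear
openssl crl2pkcs7 -nocrl -certfile certificatechain.pem |
  openssl pkcs7 -print_certs -noout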

Correcting the Certificate Chain Order

Luckily, while searching for how to get openssl to display the common name for more than one certificate, I came across a Stack Exchange post which linked to a perl script that lists the common names and then saves the certificates in separate files (http://gagravarr.org/code/cert-split.pl). I started to modify it, but for various reasons switched over to python and wrote a script inspired by it.

Cert-Reorder

Update: On April 23, 2014 I removed the direct listing of the source of the script and moved it into its own github repository. Hopefully this makes it easy to make contributions, log issues (in case you don’t know how to code or don’t have time), and to track updates/changes.

Using Splunk’s Bundled Python

This script utilizes PyOpenSSL; the python bundled with Splunk includes this library, so it should be fairly easy to run. By default the script utilizes the standard python shebang line #!/usr/bin/env python, which should invoke the system's version of python. It also includes an example on the third line of directly utilizing the python bundled with Splunk (assuming $SPLUNKHOME is /opt/splunk): #!/opt/splunk/bin/python. If you remove the first two lines and modify /opt/splunk to point to your $SPLUNKHOME if required, it should work.
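Alternatively, you can leave the shebang lines alone and invoke the script with Splunk's interpreter explicitly; either of the following should work (assuming $SPLUNKHOME is /opt/splunk, and using the -a usage shown below):

# run the script under the Python that ships with Splunk
/opt/splunk/bin/splunk cmd python cert-reorder.py -a certificatechain.pem
# or point at the bundled interpreter directly
/opt/splunk/bin/python cert-reorder.py -a certificatechain.pem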

Usage

To reverse the certificate order in the pem file:

./cert-reorder.py -r certificatechain.pem

While that was my first attempt and worked fine for us — our CA always provides them in server, intermediate, CA order — this seemed potentially error prone. What if it's run multiple times, or the order changes? Since Splunk bundles PyOpenSSL, it was fairly easy to implement a version that finds the root certificate and then moves along the chain to the server certificate.

To analyze the certificates and put them in CA, intermediate, server order regardless of input order:

./cert-reorder.py -a certificatechain.pem

If you want to verify the order of the certificates after a command, you can add the -p option:

./cert-reorder.py -pa certificatechain.pem

Misc. Filesystem Commands

This post is primarily a placeholder so I have some of the commands handy in case we ever need them again.

Background

Approximately half of the logs that are ingested into Splunk come into our environment via syslog. It shouldn't be surprising that the health checks I've been working on were starting to indicate that I/O was becoming a bottleneck on our syslog server, so we re-purposed a fairly new server as a second log server to share the load.

As part of this move, we're utilizing CentOS, as our server team has standardized on RedHat; while we still run our own servers, we wanted to at least be a friendly environment if they were to manage the servers down the road. With that change came the introduction of SELinux, and thus a couple of quick notes.

A student worker (college students we can employ fairly cheaply) did the install, and my boss did most of the initial configuration. I started getting involved when a disk filled up and we found out that the RAID array for the OS install had been created but the larger one for data (logs in this case) hadn't.

We could have thrown the logs under /var/log and everything should have worked automagically, but since this is a dedicated syslog server that's a lot of logs, so we mount the drive in a different location, which we'll call /mnt/logs for the purposes of this article (it's arbitrary and likely to change, so a generic placeholder feels best).

File System Prep

Ext3 to Ext4

The original file system for the /mnt/logs partition was formatted as ext3; while there is nothing wrong with that, most of our other partitions are using ext4 so I went ahead and converted it for consistency’s sake.

sudo tune2fs -O extents,uninit_bg,dir_index /dev/sdb
sudo e2fsck -fDC0 /dev/sdb

Reserve only 5 blocks for the superuser (vs 5%)

By default 5% of the filesystem's blocks are reserved for root; this is a really nice feature, as if the drive "fills up" it ensures the administrator has some wiggle room to use while freeing up space. On a partition dedicated to logs, we don't expect any files to be owned by root (so this amount would likely go unused), and as this is a large partition, 5% is quite significant, so we lower the reserved block count to 5 (tune2fs -r sets the number of reserved blocks).

sudo tune2fs -r 5 /dev/sdb
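You can confirm the change took effect by listing the filesystem parameters:

# "Reserved block count" should now read 5
sudo tune2fs -l /dev/sdb | grep -i 'reserved block count'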

Find the UUID for the drive to create the /etc/fstab entry

I really like the idea of using UUIDs; since we're utilizing drives in bays physically in the server's chassis, I wouldn't expect the order to change, but UUIDs are easy enough.

sudo blkid /dev/sdb
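The resulting fstab line ends up looking something like the sketch below; the UUID is whatever blkid reported, and the mount options are just an example:

# example /etc/fstab entry (placeholder UUID and options)
UUID=<uuid-from-blkid>  /mnt/logs  ext4  defaults,noatime  1 2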

SELinux

Display labels:

ls -dZ /mnt/logs

Change the Label:

sudo chcon -R -t var_log_t /mnt/logs

Change the Stored Label (so relabeling doesn't break syslog):

sudo semanage fcontext -a -t var_log_t "/mnt/logs(/.*)?"

Run a Test Restore to Validate Above Change:

sudo restorecon -R -v /mnt/logs

Updating Files

cd /mnt/logs && find . -type f -ctime -1 -print0 |
  rsync -av --from0 --files-from=- . /mnt/new_logs/

Links

https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Security-Enhanced_Linux/sect-Security-Enhanced_Linux-SELinux_Contexts_Labeling_Files-Persistent_Changes_semanage_fcontext.html

Splunk Indexer Affinity

How much data does each forwarder send to each indexer (i.e. How do we measure indexer affinity)?

Based on our conversations with people in Splunk’s support group, the best way to measure indexer affinity — that is if the forwarders are doing a good job randomly selecting an indexer or if they are “getting stuck” on one or only selecting from a subset — is to look at the volume of data each forwarder sends to each indexer. Over a long period of time — where long is defined by how even the data generation is, how many indexers you have, and how frequently the data is being generated — the distribution of data over the indexers should be fairly even.

Why do we care about indexer affinity?

Obviously the first question when discussing how to measure it is: why is this important?

  1. It affects capacity planning
  2. It affects data retention
  3. It affects how quickly searches return

Capacity Planning

The easiest reason to explain is capacity planning. Indexers are where Splunk stores the data sent to it, so if you have 20 indexers, each with 100 GB for Splunk data, then you assume you can store approximately 2,000 GB of data. If the forwarders are sending the data unevenly, then one indexer will run out of room sooner, causing issues.

Data Retention

Splunk stores data in what it calls buckets; a bucket can only be discarded when its most recent event has exceeded the retention period. If you have uneven data distribution, then the indexer receiving more events will roll buckets more often, causing each bucket to cover a smaller time span, and thus its buckets will expire sooner. The indexer that receives less data will have buckets with longer time spans, and thus it will take longer for the most recent event to pass the retention period. In our case, we have an extremely diverse environment, there are some time issues, etc., so we know we will have some unevenness in our time spans, but we want to minimize it.

Search Speed

Splunk uses map-reduce; basically, when a search is run, the search head parses the search and creates a job which gets distributed to each indexer. Each indexer conducts a search on its data and then sends the results to the search head, which compiles the results and performs post-processing. If one source type is only generated by a few hosts and the forwarders on those hosts only send to a few indexers, then when the job goes out to the indexers, only a few of them can search the data and they have more data to search. More succinctly, Splunk scales horizontally; bad indexer affinity means your horizontal scale is smaller.

Query: How much data does each forwarder send to each indexer?

index=_internal source=*metrics.log group=tcpin_connections |
  rename host as indexer, hostname as fwder |
  stats sum(kb) as total by indexer, fwder |
  sort -total
  1. Search Splunk’s metrics log and filter to only incoming connections.
  2. Rename host to indexer and hostname to fwder just to make the rest of the search easier to read
  3. Use the stats command to sum up the amount of transmitted data per indexer and forwarder
  4. Sort it by total in descending order

The results should look something like this:

indexer  fwder   total
index1   fwder1  1000
index2   fwder3   998
index2   fwder2   997
index1   fwder4   967
index2   fwder5   109
index2   fwder4    13
index1   fwder2     9
index1   fwder3     6
index2   fwder1     4
index1   fwder5     3

First Post

I guess I get to have the first post for the second time in my life — the first being on the original version of this blog. I'm working on reviving my blog after a couple-year hiatus. Unfortunately, I no longer have copies of my previous blog posts, and I've migrated from WordPress to Jekyll, so I still have a bit of work to do before it is in decent shape. With those caveats, I plan on continuing to write primarily about various technologies that I come across in my professional and personal life.

During the day, I'm primarily focused on Splunk, so many of my posts will probably feature issues related to Splunk. I'll try to even it out with other topics and hopefully have a bit of fun and keep things interesting.