Elasticsearch: Round Two

Round two of Elasticsearch fun. My previous attempt was to use users and roles to secure Elasticsearch, but that didn’t work as I’d hoped…

My plan was to use nginx to proxy requests to Elasticsearch. I set up an nginx box with an Elastic IP and then configured Elasticsearch to only allow access from that IP. This worked great, but now I had a single point of failure. So I set up an ELB in front of a pair of nginx boxes and updated my Elasticsearch config with both nginx EIPs. But this was now feeling clunky, and we were going to have several Elasticsearch clusters that would need my new nginx infrastructure.

So I chatted with one of our AWS consultants. He came back to me the next day and told me he had never tried this, but what if I added Elastic IPs to my NAT gateways and then configured Elasticsearch’s access policies to only allow access from those EIPs? This seemed way too simple in my book (after I had spent several hours on my nginx setup), but I just completed my proof of concept and it just worked. So now my test Elasticsearch can only be accessed from my VPC and is secured from the outside world.
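
For reference, here is a rough sketch of what such an IP-restricted access policy could look like, built in Python so it can be printed and pasted into the domain’s access policy. The account ID, domain name and both addresses are placeholders I made up. The shape (an Allow statement with an IpAddress condition on aws:SourceIp) is the usual IAM policy form, but treat it as a starting point rather than the exact policy from my proof of concept.

```python
import json


def build_ip_policy(domain_arn, allowed_ips):
    """Return an access policy dict that allows es:* only from allowed_ips."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "es:*",
                "Resource": domain_arn + "/*",
                "Condition": {"IpAddress": {"aws:SourceIp": list(allowed_ips)}},
            }
        ],
    }


# Hypothetical values: swap in your own domain ARN and NAT gateway EIPs.
policy = build_ip_policy(
    "arn:aws:es:us-west-2:123456789012:domain/my-test-domain",
    ["203.0.113.10", "203.0.113.11"],
)
print(json.dumps(policy, indent=2))
```

Keeping the EIPs in a plain list makes it easy to add another NAT gateway later without touching the rest of the policy.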

Securing AWS ElasticSearch

So this was a fun problem. ElasticSearch on AWS in its current form is completely public and not secured within my VPC. It appears that first you have to put in a deny statement to lock off access and then add in the roles / users that have access. But once you do this you have a box that can’t be accessed by curl… very annoying. So now you need to write a Python (or language of your choice) script to send a signed request. Here’s the code I’m working on:

import getopt
import json
import sys

from boto3 import session
from elasticsearch import Elasticsearch, RequestsHttpConnection
from requests_aws_sign import AWSV4Sign

USAGE = '--host <ES HOST> --index <index name> --json_map <ES Mapping>'

es_host = ''
json_map = ''
index_name = ''

# Parse the command-line options
try:
    opts, args = getopt.getopt(sys.argv[1:], "h", ["host=", "json_map=", "index="])
except getopt.GetoptError:
    print(USAGE)
    sys.exit(2)
for opt, arg in opts:
    if opt == '-h':
        print(USAGE)
        sys.exit()
    elif opt == "--host":
        es_host = arg
        print(es_host)
    elif opt == "--json_map":
        json_map = arg
        print(json_map)
    elif opt == "--index":
        index_name = arg
        print(index_name)

# Load the index mapping from the JSON file
with open(json_map) as data_file:
    request_body = json.load(data_file)

# Establish credentials
boto_session = session.Session()
credentials = boto_session.get_credentials()
region = boto_session.region_name or 'us-west-2'

# Elasticsearch settings
service = 'es'
auth = AWSV4Sign(credentials, region, service)
# The arguments after host are an assumption, following the
# requests_aws_sign README example.
es_client = Elasticsearch(host=es_host,
                          port=443,
                          connection_class=RequestsHttpConnection,
                          http_auth=auth,
                          use_ssl=True,
                          verify_ssl=True)

# Create the index and report what came back
try:
    # print(es_client.info())
    res = es_client.indices.create(index=index_name, body=request_body)
    print(" response: '%s'" % (res))
except Exception as e:
    print("Not working... %s" % e)
My developers can now commit their JSON to git to create their index mappings, and Jenkins can create the indexes automatically.
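
One way to tidy up the option handling in the script above is argparse from the standard library, which builds the usage message and rejects missing options on its own. This is a minimal sketch that assumes the same three options; the load_mapping helper and the host, index and file names in the usage line are made up for illustration.

```python
import argparse
import json


def parse_args(argv=None):
    """Parse the same three options the script above expects."""
    parser = argparse.ArgumentParser(
        description="Create an ES index from a JSON mapping")
    parser.add_argument("--host", required=True, help="ES host")
    parser.add_argument("--index", required=True, help="index name")
    parser.add_argument("--json_map", required=True,
                        help="path to the ES mapping JSON")
    return parser.parse_args(argv)


def load_mapping(path):
    """Load the mapping file and make sure it holds a JSON object."""
    with open(path) as data_file:
        body = json.load(data_file)
    if not isinstance(body, dict):
        raise ValueError("%s does not contain a JSON object" % path)
    return body


# Hypothetical arguments, passed explicitly instead of read from sys.argv.
args = parse_args(["--host", "search.example.com",
                   "--index", "products",
                   "--json_map", "mapping.json"])
print(args.host, args.index, args.json_map)
```

Checking that the mapping is a JSON object before anything is sent means a bad commit fails in Jenkins with a readable message instead of an error from the cluster.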

=== UPDATE ===
It doesn’t really work. It appeared that I had a solution, but I’m finding that it’s only 90% there. Leaving this post here for now in hopes that it might be useful at some point.

The restaurant at the (other) end of the universe — Rusty Experiments

After a long day, there is nothing like a fun cocktail and this is a post I started last year and never posted. I’m a big fan of whisky and a friend of mine sent me his blog posting on a few of his favorite drinks. I’m finally posting this in hopes that I’ll actually make it out to the store to grab what I need to make these cocktails:

The restaurant at the (other) end of the universe — Rusty Experiments.

Groovy Jenkins

So if you are like me, you’ve been searching the web for how to write Jenkins DSL and you’ve realized this is a larger and vaguer task than you expected. Jenkins DSL (Groovy) sucks; make peace with that now and the rest will be easier. You may have stumbled across the Jenkins DSL Plugin and started messing around, and you may have missed (like I did) that there is a DSL Playground that you can paste your code into and test it out. I went a step beyond that and have since built my own Docker container that I can run my code in, as I wasn’t 100% happy with posting my code into someone else’s app. I’ll post that link when I eventually push the Docker container to the repo.

But you’ll eventually hit the roadblock of a plugin not supported by the DSL. This happens fairly quickly, and you have to resort to the “configure block”:

Now the best way to figure this out is to configure the plugin via the Jenkins GUI, then open up the job’s config.xml and reverse engineer the Groovy code from it.

Unfortunately it’s very much trial and error but using the config.xml + the DSL playground you can get about 75% there and then the rest is just iteration.
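
To make the reverse engineering a bit less painful, a small helper can list every element path in a job’s config.xml, which gives a rough map of the nodes a configure block has to create. This is just a sketch using Python’s standard library; the sample XML and the plugin element name in it are made up, so point it at a real config.xml instead.

```python
import xml.etree.ElementTree as ET


def element_paths(xml_text):
    """Return slash-separated paths for every element in a job config.xml."""
    root = ET.fromstring(xml_text)
    paths = []

    def walk(node, prefix):
        path = prefix + "/" + node.tag if prefix else node.tag
        paths.append(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


# Made-up config.xml fragment; the plugin element name is hypothetical.
SAMPLE = """<project>
  <publishers>
    <com.example.SomePublisher plugin="some-plugin@1.0">
      <recipients>team@example.com</recipients>
    </com.example.SomePublisher>
  </publishers>
</project>"""

for path in element_paths(SAMPLE):
    print(path)
```

Running it before and after you configure the plugin in the GUI, then comparing the two lists, shows exactly which nodes the plugin added.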

RDS Aurora Scaling

This week we found ourselves in the wonderful but awful spot of having a lot of traffic hitting our DB servers. It’s good because we are getting our customers moved over to AWS, but bad since that means we need to scale up our RDS instances. After some panicked moments and a couple of hot fixes, we disabled some batches that were pushing our taxed DB servers to the limit and causing micro outages, and I started researching the best way to scale up RDS clusters. Most of the info out there was a little vague, so I contacted AWS support and here is the info they gave me:

Of course after reading through this I realized how simple it was and maybe that’s why there isn’t a more detailed guide on how to do this. Hopefully this post helps the next guy who needs to do this and has the office waiting on him to figure it out.

Apache 2.4 + Opcache + APCu = 502?

So last week we had a puzzling bug relating to PHP and its modules. We have the following setup on AWS:

Ext ELB -> Nginx (rate limiting, multiple SSL termination) -> internal ELB -> Apache / PHP

We have multiple stores on these Apache boxes alongside our admin interface, and what was puzzling was that we started getting 502 errors on just our admin interface as we scaled up the number of EC2 instances behind the internal ELB. After several days of tinkering and trying many different ideas, we stumbled on a blog posting (sorry, I didn’t bookmark it so I can’t give credit) that talked about turning off these two settings in /etc/php.d/opcache.ini:

I don’t exactly know why this worked, but it fixed our issue and I wanted to post it in case someone else was having a rough week with 502 errors.

Zabbix Warning: Less than 25% free in the configuration cache

Because I always forget how to fix this issue and have to research it every time I come across it, I thought it was time to blog about it. So this morning I got the error on our new-ish Zabbix server that our config cache was getting full. This didn’t surprise me since we’ve been adding lots of devices over the past few months. So I jumped into the Zabbix server config file and quickly updated the cache setting to give myself some room. I restarted zabbix-server and… the warning did not go away.

Checking the logs, I see this error:

NOTE: the following guide works on Ubuntu 14.04 server… haven’t tested anywhere else.

It finally dawned on me that I had to change something in the /etc/sysctl.conf:

This gives you 128 MB of shared space.
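
If you want a different size, the value is just the size in bytes. Here is a throwaway helper for the arithmetic, assuming the setting takes bytes the way kernel.shmmax does (that key name is my assumption here, so check which one your config actually uses):

```python
def mb_to_bytes(megabytes):
    """Convert a size in MB (1 MB = 1024 * 1024 bytes) to bytes."""
    return megabytes * 1024 * 1024


# Assumption: the sysctl value is expressed in bytes.
print(mb_to_bytes(128))  # 134217728
```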

To apply changes in the /etc/sysctl.conf immediately, execute:

Now when I restart zabbix-server, everything is happy again and there is plenty of room to grow.

For more info and guides for other OSes, check the official Zabbix page.