
Enabling AWS X-Ray on AWS Lambda

As you have probably noticed, debugging and getting latency data for your microservices can be painful if they interact with multiple distributed services. For these types of microservices, you are usually forced to build your own performance testing application, add an inordinate amount of log statements, or simply cross your fingers and hope for the best. From one of AWS’s posts on the subject:

“Traditional debugging methods don’t work so well for microservice based applications, in which there are multiple, independent components running on different services.” – AWS Lambda Support For AWS X-Ray

As a result, AWS built AWS X-Ray which, according to them, solves this problem:

“AWS X-Ray makes it easy for developers to analyze the behavior of their distributed applications by providing request tracing, exception collection, and profiling capabilities. ” – AWS X-Ray Documentation

Back in December, AWS announced a preview release of AWS X-Ray. While this was welcome news, if you were a serverless shop using AWS Lambda you were still out of luck. Fortunately, in May ’17, AWS Lambda support for AWS X-Ray was released. Instrumenting your app has never been easier.

Below, I will go through the steps to update your CloudFormation template and instrument a Java application. We will then take a quick tour of the reporting and search features of the X-Ray dashboard.

Update CloudFormation

While we could enable X-Ray via the AWS console, it is always better to have your application be fully deployable with a push of a button and a stack definition. On June 6, 2017 AWS CloudFormation released the TracingConfig property, which, along with a permissions change, enables AWS X-Ray on your Lambda function.

Step 1: Enable TracingConfig

In your Lambda resource, you will add a new property called TracingConfig with the mode set to Active.

You will also add a DependsOn field for the execution role as the Lambda service checks permissions as soon as CloudFormation creates the Lambda function.

*Note: The default TracingConfig mode is PassThrough. This means that if your Lambda function is invoked by a service that has Active tracing enabled, it will still send tracing information to X-Ray. But if you invoke your Lambda function directly, or through a service that does not have X-Ray enabled, it will not send tracing information.

  Type: "AWS::Lambda::Function"
  Properties:
    Handler: "demo.XRayLambda::handleRequest"
    Role: !Join ["", ["arn:aws:iam::", !Ref "AWS::AccountId", ":role/", !Ref roleLambdaExecutionPolicy ] ]
    Description: CloudFormation created lambda for demo-xray-lambda
    FunctionName: demo-xray-lambda
    MemorySize: 128
    Timeout: 140
    Code:
      S3Bucket: my.awesome.bucket.lambda.us-west-1
      S3Key: demo-xray-lambda/demo-xray-lambda-1.3.2.zip
    Runtime: java8
    TracingConfig:
      Mode: Active
  DependsOn:
    - roleLambdaExecutionPolicy

Step 2: Add AWS X-Ray permissions

Next we need to give our Lambda function permission to call the xray:PutTraceSegments and xray:PutTelemetryRecords actions. Here I have added a new statement to the inline policy in the execution role.

  Type: "AWS::IAM::Role"
  Properties:
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Action: "sts:AssumeRole"
          Principal:
            Service: lambda.amazonaws.com
          Effect: Allow
    Policies:
      - PolicyName: demo-xray-lambda-policy
        PolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Action:
                - "xray:PutTraceSegments"
                - "xray:PutTelemetryRecords"
              Effect: Allow
              Resource: "*"

There are a couple of issues that can trip you up. First, IAM is a global service. As such, when you create a new role, it needs to be propagated to all regions. There is a possibility that your role has not been propagated to your stack’s region by the time CloudFormation starts to create your Lambda function. The Lambda service will throw an exception, and the stack will fail to create if it doesn’t have xray:PutTraceSegments permission. To get around this, you can either make your policy inline in your role resource, have two separate stacks for execution role/permissions and for your Lambda function, or reference an existing managed policy. I made my policy inline and have yet to run into an issue.

Another issue arises when you have an existing stack/role to which you want to add the X-Ray permissions while enabling TracingConfig in the same change set. This fails 100% of the time. Instead, rename your role resource so that CloudFormation creates a brand new role rather than updating the existing one. As with the previous issue, you need to have your policy inline instead of as a separate resource. You should also add a DependsOn condition to your Lambda function to avoid parallel updates and ensure the role is fully created before the Lambda resource.

Instrument The Application

We will now start instrumenting our application by adding the necessary X-Ray libraries as well as adding a few lines of code to add more color to the traces. These libraries give you the mechanism to create your own custom segments to measure the performance of a subsection of your code. They allow you to add annotations which are indexed and enable you to search for subsets of your traces. They also allow you to add metadata to your subsegments which you can use for further debugging. For more information on how to instrument your application, please review the developer’s guide found here: http://docs.aws.amazon.com/xray/latest/devguide/xray-sdk-java.html

Step 1: Add the AWS SDK to your application

The next thing we need to do is import the AWS X-Ray SDK so that we can start getting traces into our X-Ray service map. Update your build.gradle to add aws-xray-recorder-sdk-core and a few other libraries to your dependencies.

dependencies {
  compile 'com.amazonaws:aws-xray-recorder-sdk-core:1.1.2'
  compile 'com.amazonaws:aws-xray-recorder-sdk-aws-sdk:1.1.2'
  compile 'com.amazonaws:aws-xray-recorder-sdk-aws-sdk-instrumentor:1.1.2'
}

At this point you could theoretically stop. You can push your code and you will begin to see traces in your X-Ray Service Map of your AWS::Lambda and AWS::Lambda::Function with a subsegment of Initialization. Per AWS documentation, this is because “the AWS SDK will dynamically import the X-Ray SDK to emit subsegments for downstream calls made by your function.” But wait, there is so much more we could be doing here.

Step 2: Add Custom Subsegments

Now let’s say that our function does a few things: downloads an image from S3, does some image manipulation, then pushes it back up to S3.

public void handleRequest(String key, Size size) {
  Image image = downloadImage(key);
  Image thumbnail = resizeImage(image, size);
  uploadImage(thumbnail, key);
}

Because you imported the aws-xray-recorder-sdk-aws-sdk-instrumentor you will automagically get subsegments for the S3 API calls. You could, however, create your own custom subsegment for the image manipulation portion. Like so:

import com.amazonaws.xray.AWSXRay;
import com.amazonaws.xray.entities.Subsegment;

  public Image resizeImage(Image image, Size size) throws SessionNotFoundException {
    // wrap in subsegment
    Subsegment subsegment = AWSXRay.beginSubsegment("Resize Image");
    try {
      return image.resizeMagic(size.getWidth(), size.getHeight());
    } catch (Exception e) {
      subsegment.addException(e);
      throw e;
    } finally {
      AWSXRay.endSubsegment();
    }
  }
You will now see a subsegment for resizing the thumbnail.

Step 3: Add Annotations to your Subsegments

So now that you have the custom subsegments, how do you tell them apart for, let’s say, large thumbnails versus small thumbnails? Enter annotations, which allow you to query your reports for a subset of your traffic.

*Note: you can only add annotations to subsegments, not the root segment. I have seen some people create a subsegment spanning the length of their handler, to which they add annotations and metadata, and then create subsegments for the different subsections of that handler.

Updating our code as follows gives us this ability.

  public Image resizeImage(Image image, Size size) throws SessionNotFoundException {
    Subsegment subsegment = AWSXRay.beginSubsegment("Resize Image");
    try {
      subsegment.putAnnotation("Size", size.toString());
      return image.resizeMagic(size.getWidth(), size.getHeight());
    } finally {
      AWSXRay.endSubsegment();
    }
  }

Step 4: Add Metadata to your Subsegments

Another useful addition is metadata on your subsegments. This can help you debug traces that, for example, have exceptions. In our image resizing example we could add things like image source size or file type. That way, when reporting on traces with exceptions, we can drill down and try to narrow down the root cause.

subsegment.putMetadata("source", "size", image.getSize().toString());
subsegment.putMetadata("source", "fileType", image.getFileType());

Reporting on your application

Ok, now that we have enabled X-Ray and instrumented our application it is time to head over to the AWS UI and start learning about our application.

Service Map

After accessing your application a couple times, head over to the X-Ray dashboard in your AWS console. Make sure you are in the region where you deployed your microservice. You will start off on the Service map page. Here you will see something like the below with all the functions that have had hits in the last 5 minutes:


There are a couple things to note on this page. There is a search bar that you can use to filter requests either by service name, annotations or trace id. You can also change your time range to anything from the last 1 minute to the last 6 hours. Or you can put a specific day, a start time and the length of time which again can be anything from 1 minute to 6 hours.

You can also click on a given bubble in your service map and see additional details as well as filter by response type, fault or throttling.


Indexing on annotations

At this point let’s take a look at filtering with annotations. In the search bar let’s type in the below:

service(id(name: "demo-xray-lambda", type: "AWS::Lambda")) { annotation.Size = "small" }

Then let’s change it to large and we will see a slightly higher response time.

service(id(name: "demo-xray-lambda", type: "AWS::Lambda")) { annotation.Size = "large" }

Let’s take that a step further and see source images that are greater than 2 MB.

service(id(name: "demo-xray-lambda", type: "AWS::Lambda")) { annotation.ImageFileSize > 2 }
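These filter expressions are just strings, so if you find yourself querying the same service with different annotation clauses, a small helper can assemble them for you. The function below is a hypothetical convenience, not part of any AWS SDK:

```javascript
// Build an X-Ray filter expression for a service node plus optional
// annotation clauses (joined with AND). Purely string assembly.
function xrayFilter(serviceName, serviceType, clauses) {
  var service = 'service(id(name: "' + serviceName + '", type: "' + serviceType + '"))';
  if (!clauses || clauses.length === 0) {
    return service;
  }
  return service + ' { ' + clauses.join(' AND ') + ' }';
}

console.log(xrayFilter("demo-xray-lambda", "AWS::Lambda", ['annotation.Size = "large"']));
// service(id(name: "demo-xray-lambda", type: "AWS::Lambda")) { annotation.Size = "large" }
```

You could then drop the resulting string straight into the X-Ray search bar.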

As you are probably starting to notice, with this level of instrumentation and granularity you can start to get a better understanding of your application’s response times, where some of your pain points are, and what you can improve on.

Drilling deeper

Now that we’ve taken a bird’s-eye view of the performance of our application as a whole, let’s drill down deeper into individual traces. You can get there by clicking on View Traces in your Service Details panel or by clicking on Traces in the left navigation:


Here you will see all the requests that X-Ray chose to sample. You can click on an individual trace by clicking on its ID. This could look something like the below image depending on your application.


Here you can see each subsegment, its response times, and at what point in your service response time it executed. Also, if you click on a subsegment that, for example, you added annotations or metadata to in your code, you will get a popup panel that will allow you to view that data.


One use case for this is to filter by error or fault and then click into the subsegments where we added source image data to the metadata, to get a better idea of where the source of the problem is.

service(id(name: "demo-xray-lambda", type: "AWS::Lambda::Function")) { error = true }


As you have probably noticed, with very little investment you can get pretty powerful visibility into your distributed application’s performance. AWS has simplified this process to the point where debugging, tracing requests, and viewing the performance of a collection of services in one view can happen with just a few lines of code and a few clicks.

I hope this simple getting started guide gets you up and running. Let us know in the comment section below if you find this helpful and any suggestions or questions you may have.


Disaster Recovery Using Hybrid Cloud


Financial Engines recently celebrated the 20th anniversary of the company’s founding.  Those two decades reflect our growth en route to becoming the largest registered investment advisor in the US.
During those same two decades the technology industry has changed profoundly and we have adjusted along the way.  One change we completed earlier in 2016 was moving our disaster recovery footprint to a hybrid cloud solution using AWS. This document describes that effort in more detail and the results we achieved.

Moving to IaaS

Our offerings have been web-based since inception. For hosting these web experiences we utilized top tier colocation providers. That relieved us from building and operating physical datacenters.
Today enterprise-capable IaaS is available from providers such as AWS.  We are now on a journey to move “up the stack” and adopt IaaS, reducing the burden we bear for things like:

  • hardware (servers, network gear, storage) procurement
  • physical site design and engineering (space, power, cooling, rack design, cable management, etc)
  • hardware maintenance: replacing failed drives, DIMMs, CPUs, NICs, motherboards, switch blades
  • firmware maintenance: qualifying and applying updates/patches across all hardware devices
  • hypervisor work: licensing, installation, tuning, maintenance, patching, upgrades
  • physical storage: design, engineering, and maintenance for iSCSI boot and data disks and NFS/CIFS network attached storage

Moreover, when we are ready to decommission infrastructure we just call an API to terminate/free those resources. This eliminates physical maintenance at the end of hardware lifecycles.

Peak Colo

Our transition to IaaS marks early 2016 as the point of “Peak Colo” for Financial Engines.

Over the coming quarters we expect to:

  • require fewer racks in colocation facilities
  • buy fewer servers from Cisco, Dell, or IBM
  • consume fewer VMware licenses
  • spend more on AWS for IaaS resources
  • achieve a net savings in our infrastructure total cost of ownership (see chart below for details)

Lift and shift for DR

Our rebuild of the DR environment had a fixed timeline due to a colocation contract ending. We therefore focused our effort on a lift-and-shift approach and moved the Linux compute portion of our stack into a VPC. We connected that VPC using Direct Connect to a reduced colo footprint resulting in a seamless LAN spanning our colo space and AWS:


This hybrid posture converts roughly 80% of our servers from on-prem hosted to cloud-hosted.  In doing so we trade capital for expense and ownership for rental.

For disaster recovery this trade is attractive since these resources are rarely needed (our DR utilization is < 10% for testing, drills, etc).

This lift-and-shift hybrid project has a residual footprint in our colo consisting of:

  • backend NetApp storage
  • large database hosts which are more difficult to run on EC2 (due to size, IOPS, and CPU requirements)
  • batch machines which currently run on Windows Server

Future revisions of our hybrid posture should enable more of this infrastructure to run on AWS.

Our previous generation disaster recovery consisted of a colocation-hosted footprint containing:

  • 6 racks
  • vmware compute on IBM blades
  • NetApp storage
  • RHEL subscription fees
  • Load balancers as hardware appliances

The new disaster recovery footprint built on a hybrid cloud consists of:

  • 1 rack of UCS and NetApp (tech refreshed to yield better density and performance)
  • EC2 compute (our upgrade to the latest Xeon E5 v3 hardware was just selecting from the M4/C4 instance families)
  • Ubuntu 14.04 LTS
  • Load balancing on ELBs

In terms of costs here is what the transformation looks like:


Our DR site on AWS uses the pilot light model which incurs a modest monthly expense.

In exchange for that pilot light expense we achieved large reductions in depreciation, engineering time, and colo expense.

Looking Ahead

Following this disaster recovery rebuild we are moving to other re-hosting projects such as:

  • dev and test environments
  • production, starting with cpu-intensive tiers of our footprint

We expect these new environments to utilize the same hybrid cloud architecture with similar results.

Related Work

In addition to our lift-and-shift projects we are also moving to cloud native substrates for net new functionality.

These projects are using high-level primitives in AWS such as:

  • Lambda
  • API Gateway
  • DynamoDB
  • S3
  • Kinesis

Look for future blog posts covering that work.



AWS Lambdas with a static outgoing IP

Take a spin around the technical universe, and you will see that serverless computing is all the rage these days. Serverless computing doesn’t mean that there are no servers running your code. In the most popular use of the word, it simply means that you, the developer, don’t have to worry about them. Someone else will monitor your service and make sure you have the right infrastructure and scalability in place.

Public cloud providers like AWS and Google are simplifying the process for developers to leverage this architectural design concept. Do a quick search on the “serverless” keyword and the most popular related topic is in fact AWS, with IoT being a close second.



For software developers, serverless computing opens a world of possibilities as well as new security concerns. One of those concerns is how to handle whitelisting your service’s IP address for third-party APIs on an infrastructure where IPs change quite frequently. AWS publishes the IP ranges for its services so you can see what they are at a given moment, but it warns:

You can expect it to change several times per week…

So, should we then specify a range of IPs in the API whitelist? Well, that would basically allow all of AWS to hit that third-party API (not to mention some third-party APIs do not allow for a range). Not what you want, right?

Google’s implementation of serverless computing comes in the form of Google Cloud Functions, which was released in February 2016. At the time of this article, it is still an alpha release and there is currently no way to define a static outgoing IP address. AWS’s implementation of serverless computing, called AWS Lambda, has been in the wild for over a year now. As of February 2016, your Lambda functions can access VPC resources. What does that mean for us? Simply put, we can now put them in a private subnet in our VPC and in essence assign static outgoing public IP addresses to them!

As a POC of this feature I decided to have a little fun with my latest game addiction, Clash of Clans. Over the next couple paragraphs I’ll walk you through how I configured my AWS Lambda behind a static public IP address, to then hit Clash of Clans’ public APIs.

Architectural Design

For this project we will need the following resources:

  • A VPC with:
    • Public Subnet
    • Private Subnet
    • NAT Gateway
    • Elastic IP
    • 2 Routes (public/private)
    • Internet Gateway
  • Lambda
  • API Gateway


Following the diagram, here is, at a high level, what we need to do and what these resources will do for us. First we will create a new VPC.

Next we will create 2 subnets. When you initially create subnets they are assigned the default route table, so both are effectively private. Since we want one of these subnets to be public, we will create an Internet Gateway and a new route table that points all traffic to this gateway, which we will then assign to our public subnet. From then on, any resources created in our public subnet will automatically get internet access, and as long as a resource has a public IP it will be reachable from the outside world.

Then we will create a NAT Gateway in the public subnet. Its job is to provide internet access to resources in our private subnet. It will need a public IP which EIP (Elastic IP) will provide us with. At this point we will update our default route table (assigned to our private subnet) to route all web traffic to our NAT Gateway.

The last service we will need to configure is our AWS Lambda service. By using the microservice-http-endpoint blueprint, we will create a function that is publicly accessible with API Gateway. It will live in the private subnet of our newly minted VPC so that we can leverage the outgoing elastic IP address. The code will be very simple. It will make an authenticated HTTPS call to the Clash of Clans API and return the JSON object of the top ten international clans.

Resource Creation

Step 1: Create a new VPC

Head over to your AWS VPC dashboard and click on over to your list of VPCs. If you have never done anything with VPCs you will see a default VPC that AWS gives you out of the box. Click on the Create VPC link and enter a meaningful name for your VPC. For example I used:


Step 2: Create 2 Subnets

Now we are going to go to the Subnets page and create two subnets: one public and one private. (For availability purposes you would want multiple private subnets in different availability zones for your Lambda to run in. For simplicity’s sake we will stick to one here.) In the Subnet tab click on “Create Subnet”. For the name tag, include “Private Subnet” in one and “Public Subnet” in the other, choose our newly created VPC, and select an availability zone (us-west-2c, for example). For the CIDR block, use the same IP range as your other subnets, but increment the third octet by 1 from the highest value already used in the VPC. For example:
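The third-octet rule can be sketched in code. This helper is purely illustrative; it assumes /24 subnets and does not check for overlaps with other subnets in the VPC:

```javascript
// Given the highest subnet CIDR already used in the VPC (e.g. "10.0.1.0/24"),
// suggest the next /24 by incrementing the third octet.
function nextSubnetCidr(cidr) {
  var parts = cidr.split('/');
  var octets = parts[0].split('.').map(Number);
  octets[2] += 1; // bump the third octet
  return octets.join('.') + '/' + parts[1];
}

console.log(nextSubnetCidr("10.0.1.0/24")); // 10.0.2.0/24
```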



Step 3: Create an Internet Gateway

Next we will head over to the Internet Gateway view, click on Create Internet Gateway and tag it with a descriptive tag.


Then, we will click on our new internet gateway, and click on Attach to VPC, to attach it to our newly minted VPC like this:


Step 4: Create a public Route Table and assign it to our public subnet

Now that that is done we can head over to our Route Tables view and click on Create Route Table, giving it a descriptive tag and linking it to our VPC:


Then we need to edit this route table to point at our new internet gateway. Click on the new route table, click on the Routes tab, and click Edit. Then add a new route, set all traffic (0.0.0.0/0) to target our internet gateway, and save it:


Now, click on Subnet Associations tab, click edit and, by ticking the check box by your public subnet and clicking Save, you will associate this new route to your public subnet.


Step 5: Create a NAT Gateway

First, take note of your public subnet’s ID. You can see in my previous screenshot that it is subnet-8225a8da. Head over to the NAT Gateway view and click on Create NAT Gateway. On the creation screen go ahead and paste in your subnet ID and click on “Create New EIP.” For example, here is my new NAT Gateway with its newly assigned public IP:


On the confirmation screen, copy your NAT gateway ID and let’s go back and edit the default route table created along with our VPC. Click on the default route table (you will see the Main column for it says Yes), click on the Routes tab, and click Edit. Then add a new route, set all traffic (0.0.0.0/0) to target our NAT gateway, and save it:


Lambda and API Gateway Configuration

Ok, now that our VPC is configured we can head over to Lambda and create/configure our new function.

Step 1: Create a new Lambda Function

On the Lambda Dashboard, click on Create a Lambda Function. On the first page, called “Select blueprint,” select the microservice-http-endpoint. This will then prompt you for API Gateway configuration options as well as Lambda configuration options.

Clicking next, I then configure the trigger (API Gateway options) giving it an API name of TechBlog-Lambda-IP, a resource name of /top10ClashOfClans, set the method type to GET and deployment to prod. Lastly, for the purposes of this demo, I’m setting the Security to Open. (Note: In the real world you wouldn’t want to do this, instead you would want to use either IAM, Open with access key, or implement CORS).


Step 2: Configure our Lambda Function

On the next page we then configure our Lambda function. First, I give my function a name (e.g. topTenClashOfClans), select Node.js as my runtime and after selecting “Edit code inline” for the code entry type, I paste in the below code (NOTE: ideally your key doesn’t reside as clear text in your code, instead you can leverage KMS encryption, but that’s a post for another day):

'use strict';
var https = require('https');
console.log('Loading function');

var path = "/v1/locations/32000006/rankings/clans?limit=10";

exports.handler = function(event, context) {
  console.log('start request to ' +
    "https://api.clashofclans.com" + path);

  var options = {
    "method": "GET",
    "hostname": "api.clashofclans.com",
    "port": null,
    "path": path,
    "headers": {
      "authorization": "Bearer SUPER_SECRET_KEY",
      "cache-control": "no-cache"
    }
  };

  var req = https.request(options);
  req.on('response', function(res) {
    var chunks = [];

    res.on("data", function (chunk) {
      chunks.push(chunk);
    });

    res.on("end", function () {
      var body = Buffer.concat(chunks);
      console.log("Got response: " + body.toString());
      context.succeed(JSON.parse(body.toString()));
    });
  });

  req.on('error', function(e) {
    console.log("Got error: " + e.message);
    context.done(null, 'FAILURE');
  });

  req.end();

  console.log('end request to ' +
    "https://api.clashofclans.com" + path);
};

Below the code block you will now need to create a role and configure your VPC settings. I selected our newly minted VPC along with our private subnet. For example:


Below that I also selected the default security group. (Note: in production you would want to tighten this down a bit more, for example by only allowing inbound and outbound traffic via HTTPS.) Finally click next, verify your details and click on Create function.

So we did quite a lot of configuration between our Lambda service and our VPC, but it is important to note that this was all done manually to better understand the interconnectivity of each resource. Ideally you would instead use something like AWS CloudFormation or Terraform by HashiCorp, where you can spin up your complete stack, or even subsequently destroy it, with one click.

Clash Of Clans API configuration

Hopping on over to the Clash of Clans developer portal, I now need to tell them about my new IP address as well as download my auth key.

Step 1: Create a Key

To create a key, I need to give my key a name and description and tell them the IP address I’ll be using. (At this point you might want to create separate keys for each environment and use your API Gateway configuration to tell your Lambda service which environment it is running in and therefore which key to use.) For example, their UI looks like this:


Step 2: Get Authentication Token

Upon clicking Create Key I now get my token which I’ll update my lambda code with:


Testing it out

Now that my VPC is configured, my Lambda function is configured, Clash of Clans knows my IP, and I have my super secret key, I can now test out my API. Head over to the Triggers tab of your Lambda function and you will see your API Gateway URL. This is the URL you will call from your application:


If I head over to my browser and paste it in, here is my snazzy JSON response from my Lambda service, courtesy of Clash of Clans:

{
  "items": [
    { "name": "Kings Rock", ... },
    { "name": "MEGA EMPIRE", ... },
    { "name": "HOUSE of CLOUDS", ... },
    { "name": "Come & Take It", ... },
    { "name": "GULF KNIGHTS", ... },
    { "name": "kurdistan is 1", ... },
    { "name": "BRASIL TEAM", ... },
    { "name": "FACÇÃO CENTRAL", ... },
    { "name": "Los Inmortales", ... },
    { "name": "Req and Leave", ... }
  ]
}
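Back in your application, pulling just the clan names out of that response is straightforward. The sample object below is illustrative; the real API returns each clan with many more fields:

```javascript
// Extract clan names from a Clash of Clans rankings response,
// which wraps the clans in an "items" array.
function clanNames(response) {
  return response.items.map(function (clan) {
    return clan.name;
  });
}

var sample = { items: [{ name: "Kings Rock" }, { name: "MEGA EMPIRE" }] };
console.log(clanNames(sample)); // [ 'Kings Rock', 'MEGA EMPIRE' ]
```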

In Conclusion

So there you have it. The fact that our API call returned successfully proves that the Clash of Clans APIs were able to verify that 1) we called from the IP we said we would call from, 2) we used the token they created for us, and 3) we made our call via SSL.

Granted, there are definitely quite a few shortcuts we took in this implementation where security could be tightened up. This is in no way a productized implementation. It is, instead, an oversimplified POC demonstrating the new relationship between AWS Lambda and AWS VPCs. We have proven that we can use AWS VPC infrastructure to configure an AWS Lambda function with a static outgoing IP. This allows for tighter security when locking down who has rights to access your APIs. In our business case, we can now say that a microservice connecting via SSL, using security token X, and calling from IP X.X.X.X is a fully trusted consumer of our financial resources, and any other connection is blocked from accessing those same resources.

Feel free to take a spin with the above instructions and provide any comments or feedback on this implementation.


Integrate with Ease – AWS + Twilio, Slack and IoT

Amazon Web Services (AWS) is changing the way engineers develop solutions.  It is so easy to prototype and have a scalable architecture with very little hand holding from the Office of the CTO or Systems Engineers.  This, in turn, fosters the DevOps culture within the organization.

One of the prototypes we’ve tried is integrating AWS with Twilio, Slack, and an Intel Edison + Grove IoT device.  We cannot take all the credit here, because this was inspired by a recent trip to AWS’s San Francisco pop-up loft.  They had a zombie-apocalypse-themed workshop, but we took it a step further, applied our own company use cases, and dissected each step, since when we were there we were just going through the motions.  We also thought that learning how some of these technology companies work would give us a fresh perspective on what is moving and shaking the software industry.

The discussion below assumes some knowledge of the AWS services used.  It does not elaborate on each service in detail, but here’s a quick refresher taken from AWS themselves:

  1. Lambda – “Lets you run code without provisioning or managing servers. You pay only for the compute time you consume – there is no charge when your code is not running”
  2. API Gateway – “Makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale”
  3. SNS – “Pub-sub Service for Mobile and Enterprise Messaging”
  4. DynamoDB – “Fast and flexible NoSQL database service”

Let’s get right to it, shall we?  First up…


Twilio is a technology company that allows programmable communications.  They are the interface for sending and receiving global SMS and MMS messaging from any app.

Who uses Twilio?

  1. Box
  2. Nordstrom
  3. OpenTable
  4. Intuit
  5. Uber
  6. EMC2
  7. Zendesk
  8. Coca-Cola

Main competitors:

  1. Nexmo
  2. Plivo

Use Case:

Our use case is to send a client their next scheduled appointment to speak with one of our investment advisors if they send a text message of “schedule”.  For this prototype, the date would have been previously set.

Solution Architecture



User-appt-schedule is the API Gateway that fronts the Lambda function named user-schedule-get.  The Lambda function then fetches the schedule date from DynamoDB.


  1. Create your DynamoDB table with a primary key of phone number and a column for appointment date.
  2. Create your Node.js Lambda function that gets the date from the DB given the phone number.

console.log('Loading function');
var aws = require('aws-sdk');
var ddb = new aws.DynamoDB(
  {region: "us-west-2",
   params: {TableName: "participant-schedule-reminder"}});

var theContext;

function dynamoCallback(err, response) {
  if (err) {
    console.log('error' + err, err.stack); // an error occurred
  } else {
    console.log('result: ' + JSON.stringify(response)); // successful response
    console.log('parsed response ' +
      JSON.stringify(response.Item.scheduledDate.S).replace(/"/g, ""));
    theContext.succeed("Your next retirement checkup is scheduled on " +
      JSON.stringify(response.Item.scheduledDate.S).replace(/"/g, ""));
  }
}

exports.handler = function(event, context, callback) {
  theContext = context;
  console.log("Request received:\n", JSON.stringify(event));
  console.log("Context received:\n", JSON.stringify(context));

  // Determine text
  var textBody = event.body;

  // Get the phone number, only the last 10 characters
  var phoneNumber = event.fromNumber.substring(2, 12);
  console.log('phone ' + phoneNumber);

  if (event.body.trim() == "schedule") {
    var params = {
      Key: {"phone-number": { N: phoneNumber }},
      AttributesToGet: ['scheduledDate']
    };
    ddb.getItem(params, dynamoCallback);
  } else {
    theContext.succeed("Text 'schedule' to get your retirement check up date");
  }
};
  3. Create your API Gateway that invokes the Lambda function for each GET request.  The nuance here is that every request from Twilio arrives in TwiML (XML) format, but our Lambda function requires a JSON object, so a conversion has to be done in the integration piece of the API Gateway, and every JSON response from the Lambda function needs to be converted back to TwiML.
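That conversion can be done with mapping templates on the API Gateway integration. Below is a hedged sketch, assuming the Twilio webhook is configured as an HTTP GET (so the Body and From parameters arrive on the query string) and that the method response Content-Type is set to application/xml; the JSON field names match the Lambda code above:

```
## Integration request template: map Twilio's query parameters to the
## JSON event the Lambda function expects.
{
  "body": "$input.params('Body')",
  "fromNumber": "$input.params('From')"
}

## Integration response template: wrap the Lambda's plain-string output
## in TwiML so Twilio can text it back to the user.
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Message>$input.path('$')</Message>
</Response>
```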




  1. You can sign up for a trial account with Twilio and you will get a phone number from them.
  2. You can then program this phone number with web hooks.  The URL below is the ARN URL from AWS for the API Gateway.



Slack

Slack is an instant messaging and collaboration system.  It is organized into teams and channels and has robust search features.  There are programmable commands called “slash commands,” which forward the message typed in the chat box to an external source via a web hook.


Who uses Slack?

  1. Airbnb
  2. CNN
  3. Buzzfeed
  4. EA Sports
  5. Ebay
  6. Harvard University
  7. Samsung
  8. Expedia
  9. Intuit

Main Competitors:

  1. HipChat
  2. Yammer
  3. Google Hangouts
  4. Facebook at work

Use Case:

Our use case is to send a client their next scheduled appointment to speak with one of our investment advisors if they send a slash command in the appropriate FE channel.  The date would have been previously set.

Solution Architecture:



  1. Create a slash command (“fngn” in this example).  You won’t have the URL at this point, since that comes from AWS.  The token below will be copied over to AWS’s Lambda function to verify that the request came from the right channel.



  1. Create a DynamoDB table with the Slack handle as the primary key.
  2. Create a Lambda function that takes the handle and looks it up in the DB.
  3. Create an API Gateway to convert the incoming Slack payload to JSON for the Lambda function, and to convert the response back to the format Slack expects.


Intel Edison + Grove IoT device

Intel Edison is a small compute module, built for wearables and Internet of Things (IoT) devices, to which code can be pushed.  The Grove toolkit contains a variety of widgets that can be attached to the Edison board.


Use case:

When motion is detected by the IoT device, send an email notification.

Solution Architecture:

Grove has a motion detector widget, so that was attached to the Edison board.

  1. Intel XDK IoT Edition is the IDE that provides templates to create Node.js projects and deploy the sensor code to Edison.
  2. From there, it’s just a matter of setting up SNS so that every detected motion triggers a publish to the email topic.
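Step 2 can be sketched as below. The topic ARN is a placeholder, and the throttle window is our own assumption (a motion sensor fires repeatedly, so a little debouncing keeps the inbox sane):

```javascript
// Sketch: publish an SNS notification when the motion sensor fires,
// but at most once per quiet window so a single walk-by does not
// produce dozens of emails.
var QUIET_WINDOW_MS = 60 * 1000; // assumption: at most one alert per minute
var lastAlertAt = 0;

// Pure decision logic: should this motion event trigger an alert?
function shouldAlert(nowMs) {
  if (nowMs - lastAlertAt >= QUIET_WINDOW_MS) {
    lastAlertAt = nowMs;
    return true;
  }
  return false;
}

// Publish to the SNS topic that has the email subscription.
// (Topic ARN is a placeholder; aws-sdk is required lazily so the
// decision logic above can run without AWS credentials.)
function publishMotionAlert() {
  var aws = require('aws-sdk');
  var sns = new aws.SNS({region: 'us-west-2'});
  sns.publish({
    TopicArn: 'arn:aws:sns:us-west-2:123456789012:motion-alerts',
    Subject: 'Motion detected',
    Message: 'Motion detected at ' + new Date().toISOString()
  }, function(err) {
    if (err) console.log('publish failed: ' + err);
  });
}

// In the XDK sensor template, the motion callback would call:
//   if (shouldAlert(Date.now())) publishMotionAlert();
```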


This was a very fun hack.  Not only did I learn more about Twilio, Slack and IoT, but it also made me realize how easy it is to prototype, and possibly productize, solutions using the power of AWS.  We are now limited only by the power of our imagination.


How we improved our EV charging station sharing with HipChat, AWS and ChargePoint API

Have you ever noticed that there are never enough EV charging stations at work?

You probably have if you own an electric car or a hybrid. Electric vehicles and plug-in hybrids have come a long way and are steadily growing in popularity. A combination of government incentives and exemptions from carpool lane rules makes them a great choice for commuters. As a result, it is a common employee perk in Silicon Valley for a company to offer free or subsidized charging at work. However, the number of available charging stations always seems to be dwarfed by the growing number of electric car drivers. Even if there are several chargers, the process around sharing them between employees is never perfect, and someone inevitably gets stuck without enough juice to drive home.

At Financial Engines’ Sunnyvale, CA headquarters, we have four charging stations managed by the ChargePoint network. At the time of this writing, there are roughly 35 EV drivers trying to get a charge on a daily basis. At first, our “sharing” process didn’t quite work. First, there was no visibility into when people plugged in and out. Our offices in Sunnyvale are far enough from the charging stations that you cannot see whether one is available at any given point. Second, not everyone bought into the whole “sharing” idea, and sometimes cars would continue to occupy the precious charging spots long after they were fully charged.

EV Concierge to the rescue

By experimenting with various options, and with a little bit of coding and iterating, we developed a nifty system that is fun to use and works much better – so much so that on most days, by late afternoon we have chargers available and nobody is questioning that “sharing” concept anymore. We call it – EV Concierge.

EV Concierge is an automated assistant that monitors our charging session availability, manages an orderly queue of drivers who need a charge, and occasionally shames you into moving your car in time for others to get a charge.

How it works


At the center of the system, we have HipChat, which is already used by nearly everyone in the company for daily collaboration. We set up a dedicated room, called “EV Charging” and that is where all the magic takes place.

It all starts with an employee (let’s call him Dave) typing an “ev add” command to put himself in the queue. He does it right there in the HipChat window, the same way he would post a short message. In response, the system will put Dave at the bottom of the queue and display a complete listing of the queue in the same HipChat window. Now Dave knows where he is on the list.

In the background, we have a monitoring process running (EV Concierge), which communicates with the ChargePoint network via the XMPP protocol and listens as people plug in and out. In response to these events, EV Concierge notifies everyone in the room about a change in charger availability and also informs the next person in the queue that it is his or her turn. That is how Dave knows it is his turn. EV Concierge will automatically remove Dave from the queue as soon as it detects that he has actually plugged in.

Growing feature list

Adding new features to EV Concierge and using it daily has been equally fun. There is something to be said for being able to interact with your customers and iterate on the solution on a daily basis. Iterative development does not get better than that! Since the basic functionality was put in place, we have enhanced EV Concierge with a number of useful features. Here is the complete list:

  1. ev add/remove/list – driver queue management
  2. ev suspend/resume – keeps your place in the queue but lets the person behind you go ahead. This is useful when you are stuck in a meeting.
  3. ev next – calls out the next person in the queue. This happens automatically when a charger becomes available.
  4. Reminders:
    • When a charger is available for more than 10 minutes, EV Concierge will notify the room about precious time being wasted.
    • When a user has been charging for more than 2.5 hours, EV Concierge will (politely) ask him or her to move their car to let others partake in the experience.
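The queue semantics described above can be sketched as a small class (in Node.js here for brevity; the real EV Concierge is Java, and these method names are illustrative, not the actual implementation):

```javascript
// Sketch of the driver-queue semantics: add/remove/list, suspend/resume
// (keep your place but let others pass), and next (skip suspended drivers).
function DriverQueue() {
  this.entries = []; // [{name, suspended}], in arrival order
}

// Adding an already-queued driver is a no-op.
DriverQueue.prototype.add = function(name) {
  if (!this.entries.some(function(e) { return e.name === name; })) {
    this.entries.push({name: name, suspended: false});
  }
};

DriverQueue.prototype.remove = function(name) {
  this.entries = this.entries.filter(function(e) { return e.name !== name; });
};

DriverQueue.prototype.suspend = function(name) {
  this.entries.forEach(function(e) { if (e.name === name) e.suspended = true; });
};

DriverQueue.prototype.resume = function(name) {
  this.entries.forEach(function(e) { if (e.name === name) e.suspended = false; });
};

// Next driver to notify when a charger frees up: the first
// non-suspended entry, if any.
DriverQueue.prototype.next = function() {
  var e = this.entries.find(function(e) { return !e.suspended; });
  return e ? e.name : null;
};

// Listing shown in the HipChat room after each change.
DriverQueue.prototype.list = function() {
  return this.entries.map(function(e) {
    return e.name + (e.suspended ? ' (suspended)' : '');
  });
};
```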

In fact, EV Concierge has been getting so smart, people sometimes talk back to it, not realizing it is a robot. 🙂

How we built it

If you are intrigued by this creative solution to a common workplace problem, you can build your own EV Concierge relatively easily. We are working to open source our implementation, but in the meantime, here is how to do it.

ChargePoint API

ChargePoint publishes its Web Services API; it can be found here: https://na.chargepoint.com/UI/downloads/en/ChargePoint_Web_Services_API_Guide_Ver4.1_Rev4.pdf

If your company owns the chargers, you most likely have a support agreement with ChargePoint, and your facilities manager has the API key that you will need to integrate with the API. There are two ways to talk to ChargePoint: the SOAP API and the XMPP protocol. We use a combination of both because they expose different levels of information about drivers and their sessions. In fact, the level of detail you get in the API depends heavily on your corporate support agreement with ChargePoint and will control how sophisticated a system you can build. If you are a Java developer, the Smack API is an easy-to-use library for building an XMPP listener.

HipChat Integration

There are many ways to integrate with HipChat. For on-premise installations, you can extend HipChat with HUBOT. It is a tiny process that can listen for special words and execute commands. It uses CoffeeScript and is relatively easy to use. For cloud versions, you can use a feature called “slash” which allows you to map anything that starts with a “slash” (duh!) to a REST API call. Finally, to programmatically post messages to HipChat from EV Concierge, you can use the simple REST interface that HipChat supports out of the box. For extra fun, you can extend HipChat with custom icons and make your EV Concierge messages include some branding (or character).

If your company uses Slack, Yammer, or Google Hangouts instead of HipChat – no worries. The same type of integration can be achieved with any of these systems. The key is to try not to introduce yet another communications interface in your workplace. If you want high adoption, stick with existing tools.

Queue Persistence

This can be done in a thousand different ways, but if you have access to the AWS ecosystem, it is an obvious choice. You can use a document storage DB like AWS Dynamo, put the persistence logic in an AWS Lambda function, slap a REST interface on top of it through AWS API Gateway and be done in a day or less. This was a hackathon project, so I know for a fact that it can be done in less than a day, even if you have never heard of Lambda or Dynamo before.

EV Concierge

Finally, the heart of the logic is implemented as a standalone Java program. It loads the initial charger status via ChargePoint SOAP API on start up and then goes into listening mode for events. With every event, it keeps track of who is plugged in, who is finished and who should be notified. To make HipChat messaging personal, you will need to map ChargePoint user ids that you get through the API to your internal HipChat user ids. This will allow you to automatically manage the queue when a driver is recognized and also send direct messages using HipChat “@” feature.

EV Concierge is deployed on the Amazon cloud via Elastic Beanstalk, which simplifies deployment and management of the process. One caveat: we noticed that the XMPP connection gets stale after prolonged use (12+ hours), so we built an automated restart at 6 AM every morning to renew it. This works fine for us because EV Concierge can sleep at night ahead of a busy working day.


EV Concierge has been a fun little project to work on. It’s great to have a problem that is so well defined and actionable that it begs to be solved. It’s also a great feeling to build something that your fellow colleagues can use and enjoy on a daily basis. As I mentioned, since we adopted EV Concierge, many more drivers get a chance to charge and the process is much smoother and more fun for everyone. The list of proposed features is also steadily growing. Oh, if one can only find enough time…

As part of the hackathon, we put together a little presentation that describes EV Concierge in 3 minutes and shows it in action. Here is the video for you to enjoy.