Managing Customer Releases with Feature Flags instead of Branches

Dev Processes
Productivity

28 January, 2019

Eugene Marinelli, Blend founder and CTO, talks about how his experience working at his previous company informed Blend’s feature flag-based approach to customer configuration management. Instead of managing multiple customer instances that could be on slightly different branches, he pushed for managing changes using feature flags within a single version of code.

Problem

Blend provides a white-labeled consumer lending platform that streamlines the otherwise manual, paper-based, and generally painful borrowing process. One challenge inherent in our business model and industry is that different customers need different functionality and can accept change at different rates. Some customers want the latest functionality as soon as it's available, while others prefer to test every user-facing change in our beta environment for a month or more before allowing it to be promoted to production.

Actions taken

We considered two approaches to this problem.

One approach was to deploy an instance of our core service for each customer, maintaining separate branches as needed to control which functionality was present. This would keep the code cleaner (on a given branch) and not require any new tooling or frameworks. On the other hand, it would make debugging more difficult since different customers would be on different versions with as much as a month of skew among them. It would also make it necessary to manage a linearly growing set of instances of the service, with a nontrivial setup time for each additional instance. Finally, it would require us to maintain a large number of branches in production, making continuous delivery basically impossible.

The alternative was to deploy a single version of code for all customers, but control functionality differences using feature flags. Deploying a single version of code in production would keep debugging and code deployment simple since the team would only have to know about and understand a single recent version of code. It would also make it easy and instantaneous to revert changes that cause problems. The downside is that it would make the code more complex and branchy (each feature flag introduces at least one conditional), and would require new tools to manage flag state and scheduling. Finally, this approach would make it more difficult to fully customize anything for a given customer. Such customization can be very useful in the short term, but is not as scalable in the long run.

My co-founders and I experienced the "branch per customer" approach first-hand at our previous company. Because many of our customers were not willing to use cloud-based services at the time (~2008-2011), we typically hosted the application in customer data centers. This structure permitted us to deploy a different version to each customer instance, which made it more difficult to upgrade customers to the latest version: every live branch had to have the latest changes merged in, and not every customer was upgraded at the same time. We shipped a new version about once a month. As an engineer, I remember the pain of having to figure out how the code worked a month ago, on the version that a given customer happened to be on, in order to debug.

Because of this experience, my co-founders and I have always been adamant about hosting Blend in the cloud, and we had to overcome the objections of many early prospects to proceed with this. Among a multitude of other benefits, cloud hosting has made it unnecessary to host anything in customer data centers, allowing us to consider the feature flagging approach. This approach seemed like a better solution overall, so we went with it.

Today we have almost 200 feature flags in production. We've scaled our ability to manage them using our "Configuration Center" UI, which allows flags to be controlled for cohorts of customers and automatically scheduled for promotion.
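To make the idea of cohort-controlled, scheduled flags more concrete, here is a minimal Python sketch. It is not Blend's Configuration Center: the names `FlagRule`, `FlagStore`, `is_enabled`, the flag `new_income_form`, and the cohorts and customers are all hypothetical, and the promotion logic (a flag turns on for a cohort once a scheduled date is reached) is an assumption about how such scheduling could work.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional


@dataclass
class FlagRule:
    """Per-cohort state for one flag (hypothetical model)."""
    enabled: bool = False
    promote_on: Optional[date] = None  # scheduled promotion date, if any


@dataclass
class FlagStore:
    # flag name -> cohort name -> rule
    rules: Dict[str, Dict[str, FlagRule]] = field(default_factory=dict)
    # customer id -> cohort name
    cohorts: Dict[str, str] = field(default_factory=dict)

    def is_enabled(self, flag: str, customer: str, today: date) -> bool:
        cohort = self.cohorts.get(customer)
        rule = self.rules.get(flag, {}).get(cohort)
        if rule is None:
            return False  # unknown flag, customer, or cohort: default to off
        if rule.enabled:
            return True
        # Assumed scheduling rule: on from the promotion date onward.
        return rule.promote_on is not None and today >= rule.promote_on


store = FlagStore(
    rules={
        "new_income_form": {
            "early_adopters": FlagRule(enabled=True),
            "conservative": FlagRule(promote_on=date(2019, 3, 1)),
        }
    },
    cohorts={"bank_a": "early_adopters", "bank_b": "conservative"},
)


def render_income_step(customer: str, today: date) -> str:
    # A single deployed version; the flag adds one conditional.
    if store.is_enabled("new_income_form", customer, today):
        return "new form"
    return "legacy form"
```

In this sketch every customer runs the same code, and the only per-customer difference is which cohort they belong to. Reverting a problematic change amounts to flipping `enabled` back to `False` for the affected cohorts rather than redeploying.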

Lessons learned

The feature flagging approach has proven to be the right decision. It has scaled past 100 customers so far and allowed us to continue upgrading our core service relatively frequently (~daily) and in a highly automated fashion. Engineers only have to understand a small constant number of code versions at a given time.

While a few bugs have been caused by unanticipated, untested interactions between flags, this has not been a major issue by and large. Still, it is simpler to deal with a smaller number of flag configuration sets. In other words, try to have as many customers as possible on the same settings. To this end, we have a small, constant number of customer cohorts that share the same settings.

Three unanticipated classes of what we call "dead feature flags" have come about:

  • On-everywhere flags: Flags that are enabled everywhere, but linger in the code
  • Off-everywhere flags: Flags that have been in the code for months, but are still not enabled anywhere
  • Custom flags: Flags that are only ever enabled at one customer, or that are enabled at all but one customer

The number of on-everywhere flags tends to grow because pods do not necessarily prioritize their removal immediately.

Off-everywhere flags come about in several cases:

  • A feature is started but deprioritized.
  • A feature is worked on for a long time behind a single flag. This is not ideal because it means that the change is not being released in production iteratively.
  • A feature is finished, but no customer wants to enable it. The flag and the code it controls are kept out of optimism that customers will want it at some point.

The number of custom flags grows because of the permanently unique needs of certain customers. In these cases, the flag needs to be converted to a permanent configuration setting, or we need to work with the customer to remove the need for customization.

Everyone benefits from the deletion of unnecessary code, so we encourage pods to clean up after themselves in common codebases. We've been able to do this effectively using the Technical Health Pod.
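The three classes of dead flags described above can be detected mechanically from flag state. The following Python sketch shows one way to do that. The function `classify_flag`, its parameters, and the 90-day staleness threshold are hypothetical, and it looks only at a current snapshot of which customers have a flag enabled, which is a simplification of "only ever enabled at one customer".

```python
from typing import Set


def classify_flag(
    enabled_for: Set[str],
    all_customers: Set[str],
    age_days: int,
    stale_after_days: int = 90,  # assumed threshold for "in the code for months"
) -> str:
    """Label a flag as one of the dead-flag classes, or 'active'."""
    on = len(enabled_for & all_customers)
    total = len(all_customers)

    if total > 0 and on == total:
        return "on-everywhere"
    if on == 0:
        # Only treat the flag as dead once it has lingered long enough.
        return "off-everywhere" if age_days >= stale_after_days else "active"
    # With only two customers, "one" and "all but one" coincide, so skip that case.
    if total > 2 and (on == 1 or on == total - 1):
        return "custom"
    return "active"


customers = {"bank_a", "bank_b", "bank_c", "bank_d"}
report = {
    "old_rollout": classify_flag(customers, customers, age_days=200),
    "shelved_feature": classify_flag(set(), customers, age_days=120),
    "one_off": classify_flag({"bank_a"}, customers, age_days=30),
    "in_progress": classify_flag({"bank_a", "bank_b"}, customers, age_days=30),
}
```

A report like this could feed a periodic cleanup list: on-everywhere flags are candidates for removing the conditional, off-everywhere flags for deleting the guarded code, and custom flags for conversion to a permanent configuration setting.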
