Implementing a major platform technology change

Dev Processes
Sharing the vision
Deadlines
Managing Expectations

20 April, 2018

Chris Radcliffe, the VP of Engineering at ProQuest, talks about how he successfully implemented a major platform technology change while managing stakeholders’ and customers’ expectations.

Problem

Our platform's full-text search was showing its age and its limits. Our fundamental business model of providing subscription access to e-books via a handful of large collections was being challenged, as customers now wanted to buy individual titles, and wanted to see the changes instantly. However, the existing mechanics of our platform meant that indexing individual titles took hours or even days, as the entire index of all documents had to be regenerated. We needed to improve our search subsystems so that they would allow near-instantaneous addition and search of new documents. And of course, we needed to do this without breaking the system, taking it down, or disenfranchising any customers or end-users. This was a bet-the-company change.

Actions taken

Our objectives were fairly clear, but detailed plans about how to actually achieve our goal were not. It wasn't as easy as just building a completely separate stack, as the content and historical user data had to be migrated in almost real-time, and many of the technologies we have today weren't available then. Together with our Chief Architect, I started out by sponsoring a sequence of internal experiments on every aspect of what was going to change for customers and users, and formed a dedicated Core team to pursue them. This was the "managing down" part of the process. However, I also did some "managing up". It became very clear to me that we needed to go beyond our normal mode of "feature release" communication and broadcast to the company the big changes that would be occurring, and that we needed to provide upper management with choices about tradeoffs in terms of budget and resources. We also felt we needed to communicate the technical risks we were facing and tell the story of how we would manage them.
Once we had said exactly what we were going to do technologically, we also had to decide how we were going to implement the changes. We couldn't just build another stack and permanently switch both customers and users to it, since so much historical data needed to be migrated and transformed to support the features of the new platform. Instead, we engaged in what I refer to as "wing-walking". In the 1920s, there were lots of experiments and stunts with airplanes, as aviation was still a new technology. One popular stunt was to walk across and through the airplane's wings while it was flying, sometimes to demonstrate the stability of the aircraft, sometimes just as a daredevil act. The first rule of wing-walking is, "Don't let go of what you have a firm grip on until you have a firm grip on the next thing".
For our project, we did things in an extremely methodical "wing-walking" way, moving customers, technology and components, and we always gave them a way to go back. We had determined that we could run two complete systems in parallel. That meant building, maintaining and feeding all content into both platforms at once. However, while this allowed incremental migration of a customer's holdings to the new platform, it really constituted throwing a one-way switch in code with respect to recording new purchases and user interaction with their new bookshelves. Because of this, I fostered discussion and garnered acceptance of a set of mechanisms to migrate customers and users back to the old system, even with the new data, in the event of a truly catastrophic failure. I personally did the research and experiments (SQL database work) on syncing user purchases and migrating user bookshelves (bi-directional). Finally, before releasing the new system, we practiced, practiced, practiced our procedures, failback and monitoring, and tested, tested, tested performance, accuracy, content ingest volume, and the new document search capability (instantaneous availability).
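To make the parallel-systems idea concrete, here is a minimal Python sketch of how such routing might look. It is an illustration only, not the code we ran: the names `SearchStack` and `WingWalkRouter`, the in-memory index and the per-customer flag are all assumptions invented for this example, and it leaves out the syncing of purchases and bookshelves entirely.

```python
from dataclasses import dataclass, field


@dataclass
class SearchStack:
    """Hypothetical stand-in for one complete search platform."""
    name: str
    documents: dict = field(default_factory=dict)

    def index(self, doc_id: str, text: str) -> None:
        self.documents[doc_id] = text

    def search(self, term: str) -> list:
        # Naive case-insensitive substring match, for illustration only.
        term = term.lower()
        return sorted(
            doc_id for doc_id, text in self.documents.items()
            if term in text.lower()
        )


class WingWalkRouter:
    """Feeds both stacks, serves each customer from one, allows failback."""

    def __init__(self, old: SearchStack, new: SearchStack) -> None:
        self.old = old
        self.new = new
        self.migrated = set()  # ids of customers served by the new stack

    def ingest(self, doc_id: str, text: str) -> None:
        # All content goes into both platforms, so either one can serve
        # any customer at any time.
        self.old.index(doc_id, text)
        self.new.index(doc_id, text)

    def migrate(self, customer_id: str) -> None:
        self.migrated.add(customer_id)

    def fail_back(self, customer_id: str) -> None:
        # The old system is still there to go back to.
        self.migrated.discard(customer_id)

    def stack_for(self, customer_id: str) -> SearchStack:
        return self.new if customer_id in self.migrated else self.old

    def search(self, customer_id: str, term: str) -> list:
        return self.stack_for(customer_id).search(term)
```

In this sketch, moving a customer is a single call to `migrate`, and `fail_back` undoes it without touching any content, because both stacks were fed the same documents all along.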
I continually communicated all of this to other leaders in the company, presented the story visually at all-hands meetings, and invited everyone to help test. Then we took action. There was no major rollback, and while there were a few problems with admin and prep, we corrected these with each phase before moving to the next. After the initial experimental wave, we "migrated" customers first by only sending their purchases and searches to the new subsystem, to affirm search stack resilience. Then we followed with business logic and user bookshelf migration. Within days we had a verified success. Most of the failbacks were never used, but we were glad to have them at hand.

Lessons learned

We were scared to death that we were going to risk the business, but inaction would have resulted in the same thing. By insisting on taking a very deliberate approach, rather than just rushing it through and "hopefully taking a couple of weeks", as the company initially believed it would, we were able to successfully introduce our new technology. When you, as an engineering manager, are being asked to make a thing happen "for the company", you have to do a lot of legwork yourself to ensure that stakeholders really understand the consequences of what they're asking for. Often, they won't really want to know the details, but presenting a plan in terms of risk will immediately get their attention. However, it is not sufficient to discuss risk without a ready plan to address it. What you are seeking, then, is approval to pursue a plan your stakeholders understand. You can then move to complete the process with budget, manpower and timing adjustments to any previously set expectations. Engage with your stakeholders, early and often, and don't forget the first rule of wing-walking!

