At Gitar, we’re getting ready to release our first product that fully automates feature flag removal from code. This product automatically creates Pull Requests that clean up feature flags across major languages. We’re excited about this product because we’ve experienced how automated flag cleanup can impact developer productivity and code quality, and we’ve met with a number of companies who are currently dealing with this problem.

Feature flag tech debt

Feature flags are an important tool for driving growth. Modern software companies launch and experiment with features frequently. We’ve spoken to companies that introduce hundreds of flags into their code every month and already have thousands of flags in their code base. A number of vendors have emerged that provide feature flags as a service, along with front-end and back-end SDKs that cover all the major languages.

Feature flags introduce technical debt if they are not regularly maintained. Once a feature has launched or an experiment has concluded, the associated flags are no longer needed in code. These “stale flags” pollute your code with dead paths, which has negative consequences: code becomes harder to understand, test, and debug; quality and experimentation metrics become less reliable; and performance and reliability suffer. See our previous blog, “It’s not a feature, it’s a bug”, for more on feature flag debt.

Like most tech debt, stale flags are often deprioritized relative to feature development. Over time the debt piles up and acts as a drag on development. Eventually, it becomes prominent enough that leadership mandates specialized sprints, Fix-It weeks, company initiatives, and even Code Yellows to pay down the debt. We at Gitar experienced all of this firsthand and are now witnessing it across the industry. We've seen companies with thousands of stale flags that have accumulated over time and need to be cleaned up, many of them in older code with no clear ownership.

Managing the debt

Feature flagging vendors have begun to introduce capabilities that make handling this tech debt easier. These capabilities manage the life cycle of feature flags, marking a flag as stale based on reasonable criteria such as how long its value has been stable in production, how recently it was evaluated, and so on. A user can then remove a stale flag from their code base and subsequently delete it from the feature flag database.
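The staleness criteria described above can be sketched in a few lines. This is a minimal illustration, not any vendor's actual logic: the `FlagStatus` fields, the `is_stale` function, and the 30-day and 7-day thresholds are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FlagStatus:
    """Hypothetical per-flag metadata that a flag service might track."""
    key: str
    value_stable_since: datetime  # last time the production value changed
    last_evaluated: datetime      # last time any client evaluated the flag


def is_stale(flag: FlagStatus, now: datetime,
             min_stable: timedelta = timedelta(days=30),
             max_idle: timedelta = timedelta(days=7)) -> bool:
    """Treat a flag as stale if its value has been stable for a long time,
    or if nothing has evaluated it recently. Thresholds are illustrative."""
    stable_long_enough = now - flag.value_stable_since >= min_stable
    not_evaluated_recently = now - flag.last_evaluated >= max_idle
    return stable_long_enough or not_evaluated_recently
```

A real system would likely combine more signals, such as rollout percentage or experiment status, but the shape of the check stays the same: a predicate over flag metadata that marks candidates for cleanup.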

These capabilities help users identify stale flags, but don’t go the last important mile of automatically generating the PRs that clean up flags from code. This last mile consumes expensive development time, requiring developers to find and replace all references to a feature flag and then clean up the resulting code after reasoning about control and data flow through the code. This manual step is tedious and error-prone toil.

Automated flag cleanup

Gitar’s first product completely automates this last mile and creates PRs that clean up stale flags from code. It's built on top of Polyglot Piranha, a powerful code rewrite engine that we’ve described in our blog “Automated Refactoring at Scale” and published at PLDI 2024.

To work reliably and with a great developer experience, it supports a number of important features:

Deep cleanups. It's not enough to just replace the feature flag API call with the literal value representing the final flag value. In all cases, this initial replacement uncovers downstream cleanups that cascade across the code base. These cleanups resemble compiler optimizations, but applied at the source-to-source level: they include copy propagation, function inlining, constant propagation and folding, simplification, dead code elimination, and unreachable code elimination (including test cases that depend on functionality that no longer exists). Here’s an example from our previous blog on Polyglot Piranha:

These optimizations must also understand language frameworks to clean up thoroughly; for example, they need to understand TSX and React to clean up components, testing frameworks to clean up unit tests (e.g., deleting test cases covering removed functionality), and dependency injection to clean up Java or Kotlin code.
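To make the cascade concrete, here is a toy sketch of a deep cleanup. It is not Polyglot Piranha and not how our product is implemented: it uses Python's standard `ast` module (Python 3.9+ for `ast.unparse`) as a stand-in rewrite engine, and the `is_enabled` flag API, the `FlagCleanup` class, and the sample functions are hypothetical. It replaces the flag call with a literal, folds `not <literal>`, and then removes the dead branch of each `if`.

```python
import ast


class FlagCleanup(ast.NodeTransformer):
    """Toy cleanup: replace is_enabled("<key>") with a literal, then fold."""

    def __init__(self, key: str, value: bool):
        self.key = key
        self.value = value

    def generic_visit(self, node):
        super().generic_visit(node)
        # Keep blocks syntactically valid if a cleanup emptied them.
        if getattr(node, "body", None) == []:
            node.body = [ast.Pass()]
        return node

    def visit_Call(self, node: ast.Call):
        self.generic_visit(node)
        if (isinstance(node.func, ast.Name) and node.func.id == "is_enabled"
                and len(node.args) == 1
                and isinstance(node.args[0], ast.Constant)
                and node.args[0].value == self.key):
            # Step 1: the flag call becomes its final value.
            return ast.Constant(value=self.value)
        return node

    def visit_UnaryOp(self, node: ast.UnaryOp):
        self.generic_visit(node)
        if isinstance(node.op, ast.Not) and isinstance(node.operand, ast.Constant):
            # Step 2: constant folding of `not True` / `not False`.
            return ast.Constant(value=not node.operand.value)
        return node

    def visit_If(self, node: ast.If):
        self.generic_visit(node)
        if isinstance(node.test, ast.Constant):
            # Step 3: keep only the live branch; the other one is dead code.
            return node.body if node.test.value else node.orelse
        return node


def clean(source: str, key: str, value: bool) -> str:
    tree = FlagCleanup(key, value).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


SOURCE = '''
def checkout(cart):
    if is_enabled("new_checkout"):
        return new_flow(cart)
    else:
        return legacy_flow(cart)

def banner():
    if not is_enabled("new_checkout"):
        show_legacy_banner()
    render()
'''

print(clean(SOURCE, "new_checkout", True))
```

With the flag fixed to `True`, `checkout` is left with only the `new_flow` call and `banner` only with `render()`. A production cleanup goes much further (for example, deleting `legacy_flow` itself once nothing calls it, along with its tests), which is where the cascading really pays off.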

Feature flag integration. It’s not enough to provide just a manual command that triggers a cleanup of a user-specified flag with an explicit value. A great developer experience requires deep integration with the feature flag tools to trigger cleanups automatically as soon as the system detects a flag has become stale. Ideally, the developer simply sees PRs generated magically as soon as a flag becomes stale. At a minimum, the feature flag tool should provide a button that triggers cleanup from the UI.

Custom API wrappers. It’s not enough to handle just the underlying feature flag system’s SDKs. Engineering teams often wrap underlying vendor SDKs with their own custom APIs and types. Vendor APIs typically take a string flag key and return a value, and require boilerplate to initialize the SDK or pass in additional context. Teams write wrapper APIs for a number of reasons:
- To hide boilerplate.
- To add a layer of type safety, especially to string-based flag keys or values.
- To follow language, framework, or team conventions related to error handling, concurrency, API composition, and other standards in the code base.
- To tailor the flag API to a team’s feature release processes.

In addition, some teams use annotations to generate wrapper APIs automatically or to use dependency injection frameworks. Supporting custom API wrappers requires an easy way to express these patterns via configuration so that flag cleanup can detect and clean up these custom flag APIs.
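As a rough illustration of the kind of wrapper described above, the sketch below wraps a string-keyed client behind a typed API. Everything here is hypothetical: `VendorClient` and its `bool_variation` method are stand-ins written for this example rather than a real vendor SDK, and `Flag` and `is_enabled` are invented names.

```python
from enum import Enum


class VendorClient:
    """Stand-in for a vendor SDK client (hypothetical, not a real SDK)."""

    def __init__(self, sdk_key: str):
        self._sdk_key = sdk_key
        self._values = {"new-checkout": True}  # pretend remote flag state

    def bool_variation(self, key: str, context: dict, default: bool) -> bool:
        # String-keyed lookup, as vendor APIs typically expose.
        return self._values.get(key, default)


class Flag(Enum):
    """Typed flag keys, so call sites cannot misspell a string."""
    NEW_CHECKOUT = "new-checkout"
    DARK_MODE = "dark-mode"


_client = VendorClient(sdk_key="example-key")


def is_enabled(flag: Flag, user_id: str = "anonymous") -> bool:
    """Team wrapper: hides client setup and context boilerplate."""
    return _client.bool_variation(flag.value, {"user": user_id}, default=False)
```

Call sites now read `is_enabled(Flag.NEW_CHECKOUT)`, and the string key `"new-checkout"` no longer appears where the flag is used. A cleanup tool that only searches for the vendor call and the string key would miss these call sites, which is why it has to be told how the wrapper maps back to the underlying flag.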

Polyglot support. It’s not enough to handle just one language or one family of languages that share the same tooling. Modern software applications comprise many components written in different languages – some components even using multiple languages – all using feature flags:
- Mobile apps written in Kotlin and Java for Android, Swift for iOS, or React Native across both platforms.
- Web apps written in a combination of TypeScript, JavaScript, TSX and JSX using a framework such as React.js.
- Backend services written in Node.js, Java, Go, or Python.
- Low-level, performance-sensitive platform services written in Rust or Go.
- Data processing pipelines written in Python, Scala or Java.
- Desktop apps written in C++, C# or other .NET languages.

Flag cleanup must work across all these languages and their frameworks, including components written in multiple languages (e.g., Kotlin and Java).

Developer toolchain integration. It's not enough to provide just a CLI or IDE plugin that generates cleaned-up code. Modern development teams have custom development workflows that integrate tools for code hosting, code review, auto formatting, linting, CI, and so on, all customized via configuration. Automated code rewrites must integrate with these native toolchains and their custom configurations to create automated PRs that conform to standards and pass all quality gates.

Large codebases. It's not enough to demonstrate automated cleanups on playground examples. Production codebases are large, with millions of LOC distributed across languages and repos. It's important to scale to large codebases and, for a magical experience, to run as fast and as reliably as possible.

Automated flag cleanup both saves developer time and improves quality and security, providing a significant ROI. Development costs vary, but anecdotally many medium-sized engineering organizations we’ve talked to create an average of ~2.5 PRs per engineer per week, or roughly 10 per month. Assuming each flag cleanup takes about one PR, keeping up with the pace at which flags are introduced (~100 per month) requires approximately 10 engineers dedicated full time to cleaning up flags across many languages and frameworks!
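The estimate can be reproduced with a short back-of-the-envelope calculation. The one-PR-per-flag ratio and the four-week month are simplifying assumptions added for this sketch, and `engineers_needed` is a hypothetical helper; only the ~2.5 PRs per week and ~100 flags per month come from the figures above.

```python
def engineers_needed(flags_per_month: float,
                     prs_per_engineer_per_week: float,
                     prs_per_flag: float = 1.0,
                     weeks_per_month: float = 4.0) -> float:
    """Full-time engineers required just to keep pace with new flags."""
    prs_needed = flags_per_month * prs_per_flag
    prs_per_engineer = prs_per_engineer_per_week * weeks_per_month
    return prs_needed / prs_per_engineer


print(engineers_needed(flags_per_month=100, prs_per_engineer_per_week=2.5))  # 10.0
```

Plugging in your own organization's numbers gives a quick sense of how much engineering time manual cleanup would consume.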

Automated code maintenance

Gitar’s first product addresses a common problem faced by modern development teams. Our solution easily plugs into existing development workflows and supports a broad set of languages and frameworks. This not only frees up valuable developer time, but also improves code quality and maintainability. This automation empowers developers to focus on building and innovating rather than getting bogged down by technical debt.

We view stale flags as one piece of a bigger problem that we are tackling: Developers spend too much time maintaining code. Based on our experience, we know we can automate most of the toil related to code maintenance using tools that can analyze and transform code automatically.

Join us on this journey to revolutionize code maintenance and experience the transformative impact of automated feature flag cleanup.

Connect with us via Slack to learn more.