
Developing a Java plugin that automatically fixes code

Written by Rick Ossendrijver · Dec 5, 2024 · 10 min read




In a previous blog post we explained the tools Error Prone and Refaster, and how we use them within Picnic. Now we will dive into how we built our many extensions and rules for Error Prone. Building and extending a Java plugin that integrates directly with the compiler comes with its own difficulties, and we’ll also discuss the challenges of developing and maintaining an open source plugin within the Java ecosystem.


For years we’ve been using Error Prone, a static analysis tool that automatically catches and fixes bugs in our Java codebases. What started with simple tweaks to an existing project evolved into our own open source project with various plugins. These plugins run custom checks during the build process, improving code quality by identifying bugs, enhancing style, performing migrations, and enforcing coding standards.


Error Prone’s deep integration with the developer workflow is key — it flags issues during the build process as compiler diagnostics. This means any issue it identifies must be worth emitting compiler errors and warnings for, as it directly impacts the developer’s experience. Ensuring checks are both precise and correct is crucial to maintaining an efficient workflow. If utilized correctly, Error Prone’s automation can significantly improve code quality and save developer hours.


How Did We Get Here? Developing the Knowledge


Our journey with Error Prone began in 2015, three years after its initial release, when we added it to our build configuration. From the start, we weren’t just users of Error Prone — we actively engaged with its community. Whenever we encountered bugs or edge cases, we created GitHub issues and submitted pull requests. This hands-on approach allowed us to incrementally gather knowledge about the tool.


The turning point in our journey came during a routine code review. We noticed that the same issues were manually flagged repeatedly. This realisation sparked an idea: why not automate these checks? Driven by this insight, we began experimenting with defining custom bug checks within Error Prone. Our initial efforts led to the creation of our first two checks, EmptyMethod and PrimitiveComparison: the first to learn the basics of writing a bug check, and the second to address a common issue in our codebase.
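To give a flavour of the kind of issue the latter targets: `Comparator.comparing` with a key extractor that returns a primitive silently boxes every value just to compare it, while `comparingInt` compares primitives directly. A plain-Java sketch of that boxing pitfall (the `Order` record and sample data here are hypothetical; this illustrates the category of issue, not the check’s exact implementation):

```java
import java.util.Comparator;
import java.util.List;

// Plain-Java sketch of a boxing issue a check like PrimitiveComparison
// can flag. The Order record and sample data are hypothetical.
public class PrimitiveComparisonDemo {
    public record Order(String id, int amount) {}

    // Flagged style: the key extractor returns an int, so `comparing`
    // boxes every amount to Integer purely to compare it.
    public static Comparator<Order> boxed() {
        return Comparator.comparing(o -> o.amount());
    }

    // Suggested style: `comparingInt` compares the primitives directly,
    // avoiding the per-element boxing.
    public static Comparator<Order> primitive() {
        return Comparator.comparingInt(Order::amount);
    }

    public static String cheapestId(List<Order> orders) {
        return orders.stream().min(primitive()).orElseThrow().id();
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(new Order("b", 2), new Order("a", 1));
        System.out.println(cheapestId(orders));
    }
}
```

Both comparators produce the same ordering; the suggested form merely avoids needless allocations, which is exactly the kind of mechanical, safe rewrite that lends itself to automation.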


As we explored Error Prone further, we discovered Refaster — a powerful tool that made defining new refactoring rules even easier. We introduced Refaster rules in 2017, but quickly faced challenges in scaling the process of managing and applying them, as each rule file requires an explicit assembly and application step, with application requiring that all code is recompiled. To overcome this, we developed two Refaster modules that streamlined the process, allowing us to assemble and apply rules across entire codebases with ease.
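For a flavour of what such a rule looks like: a Refaster rule pairs a method annotated `@BeforeTemplate` (the expression to match) with one annotated `@AfterTemplate` (its replacement). The sketch below shows that shape in comments and then demonstrates, in plain Java, the behavioural equivalence that a hypothetical `s.length() == 0` → `s.isEmpty()` rule relies on (the rule and class names are illustrative, not a verbatim rule from our repository):

```java
// A Refaster rule pairs a @BeforeTemplate method (the expression to match)
// with an @AfterTemplate method (its replacement). Conceptually:
//
//   final class StringIsEmpty {
//     @BeforeTemplate
//     boolean before(String s) { return s.length() == 0; }
//
//     @AfterTemplate
//     boolean after(String s) { return s.isEmpty(); }
//   }
//
// Such a rewrite is only safe because both expressions agree for every
// non-null String, which this plain-Java sketch checks:
public class RefasterEquivalenceDemo {
    public static boolean before(String s) { return s.length() == 0; }
    public static boolean after(String s) { return s.isEmpty(); }

    public static void main(String[] args) {
        for (String s : new String[] {"", "x", "picnic"}) {
            if (before(s) != after(s)) {
                throw new AssertionError("Rule changes behaviour for: " + s);
            }
        }
        System.out.println("equivalent");
    }
}
```

Because Refaster matches the compiler’s typed AST rather than raw text, such a rule also rewrites occurrences with arbitrary subexpressions in place of `s`, which is what makes it practical at codebase scale.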


This innovation opened up new possibilities. We began using our repository Error Prone Support (EPS) for larger migrations, such as converting TestNG assertions to AssertJ and migrating from RxJava 2 to Reactor. As we expanded our modules and refined our ruleset over the years, these tools became essential for upholding consistency and quality throughout our internal codebase. Eventually we made our Error Prone Support project public [1] to show our commitment to code quality and give back to the developer community.


Ensuring Compatibility with JDKs


One of the key challenges we face regarding the Java ecosystem is ensuring compatibility with various JDK versions. At Picnic, we typically transition to new Long-Term Support (LTS) versions within a month or two of their release. However, when managing an open-source project, we need to consider a broader audience. This means keeping the minimum JDK version “as low as possible” to accommodate different users.


Striking the right balance between maintaining backward compatibility and embracing newer JDK features is crucial. While it’s tempting to raise the baseline JDK version to leverage new language features, doing so could exclude users who haven’t upgraded yet. At the same time, maintaining a low baseline adds complexity and maintenance overhead.


To manage this, we take several precautions to ensure our code runs smoothly across different JDKs. Our Continuous Integration pipeline plays an important role in this process by building and testing the Error Prone Support project on multiple JDK versions across different platforms. Currently, we build on JDK 17, 21, and the latest non-LTS release (22 at the time of writing) [2]. While most tests are executed on Ubuntu, we ensure broader compatibility by running at least one full build with tests on Windows and macOS.


An interesting example of this challenge came up in Moderne’s OpenRewrite project. They had built an integration between OpenRewrite and Refaster, allowing our Refaster rules to run on their platform. As their tooling is required to be Java 8-compatible, we ran into some incompatibility issues. After discussing these challenges with their team, we decided to customise our build such that it produced an OpenRewrite-compatible artifact targeting Java 8, while the remainder of the project produced JARs requiring JDK 17.






The logo of Error Prone Support.

Context-Aware Flagging


After ensuring that our code is compatible across different JDK versions, we encountered the next challenge: context-aware flagging. Each project is distinct, varying in size, goals, dependencies, and coding standards. For example, you might be working on a large legacy codebase or a modern, lightweight one. The difficulty in developing an open-source tool, used by companies or projects you can’t directly observe, lies in creating something that works universally. A best practice in one project might be an antipattern in another. As a result, our bug checks must be adaptable to any codebase without triggering unnecessary errors. If a check produces too many false positives, it often gets disabled or silenced with numerous @SuppressWarnings annotations.


Additionally, not every check is applicable to all codebases. For instance, since at Picnic we don’t use JDK 8, we disable the Jdk8ApiChecker. Another example is the NullableOptional check [5], which flags instances of Optional#ofNullable, because we have settled on representing absent values as an empty Optional. While this aligns with our best practices, it doesn’t work well with jOOQ-deserialized entity types. This demonstrates the need to conditionally enable checks based on the specific context of your project.
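The convention in question can be sketched in plain Java (the method names and data below are hypothetical; this illustrates our in-house convention, not the check’s implementation): if absence is always modelled as an empty `Optional`, then values flowing through the code are never `null`, and wrapping one with `Optional#ofNullable` is a signal that a possibly-`null` value is leaking in.

```java
import java.util.Optional;

// Sketch of the convention: absent values are represented as an empty
// Optional, never as null. Method names and data are hypothetical.
public class AbsentValueDemo {
    // Preferred under the convention: absence is produced explicitly as
    // Optional.empty(), so no null ever enters the picture.
    public static Optional<String> discount(int basketSize) {
        return basketSize >= 10 ? Optional.of("10%") : Optional.empty();
    }

    // Suspicious under the convention: ofNullable implies the argument
    // might be null, which the convention is meant to rule out.
    public static Optional<String> wrap(String maybeNull) {
        return Optional.ofNullable(maybeNull);
    }

    public static void main(String[] args) {
        System.out.println(discount(3).orElse("none"));
    }
}
```

In a codebase where frameworks such as jOOQ materialise entities from external data, that no-null assumption does not hold at the deserialization boundary, which is precisely why the check has to be toggled per context.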


Lombok, an annotation processor that also modifies the bytecode of annotated types, presents another challenge. Suggested fixes emitted by Error Prone are generally defined in terms of AST (abstract syntax tree) replacements that are then translated into the corresponding source code modifications. But in many cases Lombok’s bytecode manipulation yields AST nodes for which no or incorrect source code information is available, yielding incorrect suggestions or even causing exceptions.


While most of our checks are broad enough to apply to various codebases, some don’t fit every scenario. That’s why the ability to configure or disable them is crucial. Users can adjust the severity of a check — setting it to INFO, WARNING, or ERROR — or disable it entirely if it doesn’t fit their needs.
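As a sketch, such overrides are passed to an Error Prone-enabled compiler via `-Xep` flags; assuming a `javac` invocation with Error Prone on the processor path (the JAR and file names below are placeholders), it might look like this:

```shell
# Hypothetical invocation: demote EmptyMethod to a warning and switch
# Jdk8ApiChecker off entirely. On the command line the accepted severity
# values are OFF, WARN, ERROR, and DEFAULT.
javac \
  -XDcompilePolicy=simple \
  -processorpath error_prone_core.jar:error-prone-contrib.jar \
  '-Xplugin:ErrorProne -Xep:EmptyMethod:WARN -Xep:Jdk8ApiChecker:OFF' \
  MyService.java
```

In practice these flags are usually set once in the Maven or Gradle build configuration rather than typed by hand.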


Suggesting Fixes


Once we’ve established that a check is contextually appropriate, the next step is to determine how to implement the suggested fixes. According to Google’s guidelines, the issue should be easy to understand and the fix should be easy to apply. As a result, most of the over 500 checks provided by Error Prone are clear, broadly applicable, and come with straightforward suggested fixes. Developers usually agree on what the fix should be, and it can be automatically applied and merged quickly. The key to a successful suggested fix lies in maintaining a low false positive rate and ensuring that the resulting code remains compilable.


However, challenges arise when the preferred fix isn’t as clear-cut, as it can depend heavily on the context and coding style of the project. This complexity increases when dealing with custom checks that focus not just on fixing bugs, but also on enhancing code style and consistency. To make these style-related checks more usable, they often accept additional arguments to tweak their behaviour, and by extension, the suggested fix. Some checks even offer conservative or aggressive modes, giving users flexibility with a simple boolean toggle.


Consider the UnusedMethod check. The suggested fix is to remove the entire method. However, some methods, though seemingly unused, are annotated with @Override or other annotations which indicate they shouldn’t be deleted. To handle this, the check accepts a list of annotations that mark methods as exempt from deletion, allowing developers to add their own internally-used annotations.


Given our extensive use of Guava, we’ve also created suggested fixes that introduce new Guava usages. However, when introducing Error Prone Support on the open-source Checkstyle repository [3], we quickly realised that this wasn’t desirable for everyone. This led us to file an issue in our own project [4] to make this behaviour configurable, highlighting the importance of considering different preferences and project needs.


In summary, defining suggested fixes requires a deep understanding of the context, style, and preferences unique to each project. The goal is to create fixes that are applicable in 95% or more of cases, even if it means making the bug check more complex.


Validating the Correctness of Changes


The final and most challenging step in developing custom checks is validating the changes before they are released and rolled out. This is a critical phase, because if we don’t thoroughly validate the new checks, we risk encountering issues either during internal rollout or, worse, after external users have already adopted the changes. The later a problem is identified, the more time-consuming and costly it becomes to address it. This might involve creating a fix, revalidating it, and performing a new release — an outcome we strive to avoid.


During the code review process of checks, our focus goes beyond the code quality of the check. We continuously consider the broader picture, beyond the Picnic context. We focus on extensive test coverage, generally aiming to cover every possible (edge) case, including all variants of suggested fixes. As a result, reviewing pull requests can take a significant amount of time.


To improve our standard testing procedures, we use Pitest [6], a mutation testing framework that helps us identify uncovered scenarios. Mutation testing introduces changes (mutations) to the code and ensures that our tests catch these changes (by causing at least one test failure), providing an extra layer of confidence in our validation process. Additionally, experience plays a significant role. Years of writing code and working on custom checks have provided us with an intuition for potential edge cases and a solid foundation for creating reliable checks.
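The idea behind mutation testing can be illustrated with a plain-Java toy example (the method and its threshold are made up): Pitest would, for instance, mutate `>=` into `>`, and only a test that exercises the boundary value kills that mutant.

```java
// Toy example of the scenario mutation testing catches; the method and
// threshold are hypothetical. Pitest might mutate `>=` into `>`. A test
// suite that only checks isEligible(30) passes against both the original
// and the mutant, so the mutant "survives", revealing the missing boundary
// test. Asserting on the boundary value 18 kills the mutant, because the
// mutated method would return false there.
public class MutationDemo {
    public static boolean isEligible(int age) {
        return age >= 18; // mutation candidate: `age > 18`
    }

    public static void main(String[] args) {
        System.out.println(isEligible(18)); // the boundary case that kills the mutant
    }
}
```

A surviving mutant is thus a precise pointer to an untested behaviour, which is exactly the kind of gap that is easy to miss when writing check tests by hand.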


Before any release, we test the release candidate by applying it to some of Picnic’s largest internal repositories. This step often reveals additional insights into the quality or shortcomings of the checks. However, because Picnic maintains a high coding standard and a uniform tech stack, it can be challenging to identify edge cases that might arise in more diverse environments. This internal uniformity, while beneficial in many ways, limits our ability to foresee how these checks will perform outside of Picnic. To bridge this gap, we developed an integration testing framework that applies all our checks on multiple open source repositories. By doing so, we can gain more confidence in the checks and find even more edge cases.


Despite these efforts, it’s important to realise that covering every possible scenario is simply not feasible. Therefore, it is important to have a straightforward and accessible feedback mechanism. For internal use, we rely on a dedicated Slack channel. For external users, we’ve set up GitHub issues and discussions to ensure that the feedback process is as smooth and easy as possible.


This article was written together with Stephan Schroevers.


This article was originally published in the NLJUG Magazine.


Links


[1]: https://github.com/PicnicSupermarket/error-prone-support


[2]: https://github.com/PicnicSupermarket/error-prone-support/blob/master/.github/workflows/build.yml


[3]: https://github.com/checkstyle/checkstyle/pull/14229


[4]: https://github.com/PicnicSupermarket/error-prone-support/issues/954


[5]: https://errorprone.info/bugpattern/NullableOptional


[6]: https://pitest.org




