Define, Extract and Transform inside Picnic’s Page Platform

Written by Phil WerliDec 20, 2024 09:2212 min read

This blog post is part two of a seven-part blog series. We recommend reading the first part where Lars Lockefeer shares the motivation and introduction to Picnic’s Page Platform.

In our first blog post, we introduced Picnic’s journey to optimize our ability to ship changes to our customer-facing apps quickly through Picnic’s Page Platform, which implements the Define, Extract, Transform and Present (DETP) framework. In this blog post, we’ll take a closer look into the processes performed server-side in the backend: Define, Extract and Transform.

Page Platform’s generic definition, extraction and transformation

What if there are no limitations in the Picnic app anymore? What if the app can look different every day? What if every screen is a blank canvas for business teams to independently choose how and what to display on it? What if it’s simply a bunch of building blocks, just like… Lego?

Our answer to this was the Define, Extract, Transform and Present (DETP) framework.

Simplifying data definition

Imagine you are one of our smart analysts at Picnic with a great new idea. You are proficient in SQL or took a coding course at our in-house Tech Academy. Congratulations, you are fully qualified to define data to use within the Page Platform!

In the past, defining data at Picnic followed a common pattern. Developers build a Java service with persistence and a CRUD API and operators provide data for the services. Products, categories, recipes — all fundamental entities have such an underlying service. For some entities this makes sense, but what about less fundamental or personalized data points?

In our constant effort to build generic platforms, we built the Attribute Data Store (ADS) — even before the Page Platform. ADS allows a standardized, type-safe definition and storing of attributes, essentially a key-value store on steroids. For analysts, attributes can represent practically anything you want, for example how much a customer is expected to love a recipe.

As an analyst, you define the attribute in a UI and can populate it from different producers: For example, as a one-time job querying our extensive data warehouse and in (almost) real-time after each order through Picnic’s event-based rule engine, just to name one common option.

Each customer-related attribute is automatically synchronized and made available to be used for extraction in the Page Platform, fully automagically!

Takeaway: ADS allows type-safe definition and population of data to extract inside Page Platform.

Streamlining data extraction

To support Picnic’s ambitions in multiple countries, we need many people to feel comfortable configuring pages for our customer-facing mobile apps. Specifically for data extraction, we required a domain-specific language (DSL) that onboard newcomers quickly and simultaneously have experts feel like there are no limits.

The only logical choice for us was simple: SQL was already extensively used by our analysts, and ID-based queries allowed onboarding newcomers.

Using SQL was great for configuration, but represented a unique challenge to our Java-based Page Platform: How do we resolve those SQL queries?

Mid-term solution: Custom query parsing

We identified our initial use-cases to be quite similar in shape: Retrieve data from an existing service by an identifier, implicitly for the requesting user. We opted for a custom ANTLR-based query parser to allow ID-based queries in the form of:

SELECT * FROM products WHERE ID IN (‘apple’,’banana’)

Such products “table” manually registered a callback which provided a set of IDs and the requesting user, which was all the invoked Product Service required.

Not having feature-complete SQL came with obvious limitations, but the most important one turned out to be missing support for SQL joins. Being too complicated to implement generically using our ANTLR-based setup, this resulted in all the logic of the data extraction process still being performed in Java. And this does not scale well, as each use-case required its own dedicated “table”.

While far away from our North Star, ID-based data extraction allowed us to query our assortment which enabled many initial use-cases. During prototyping the Page Platform, this was a huge step. But we were already trialing a much much better solution. So, welcome to…

Our North Star: Apache Calcite

Calcite is an open-source Java framework which allows building a SQL “database” without persistence. Instead, Calcite tables and their schemas can be programmatically registered in Java code, and Calcite takes care of query planning, optimization and execution by auto-generating and executing Java code, depending on the query.

Calcite has been a difficult dragon of a framework to slay and while this tale has to be fully told in future blog posts: We managed successfully and the yields have been nothing short of spectacular. It completely changed the way we think about new customer-facing functionalities.

While some Calcite tables are simple Java classes delegating a callback to generic underlying services, most tables are actually backed by ADS attributes. Each attribute is automatically available in Calcite as a table with a type-safe schema, and query writers can make use of feature-complete SQL to group, order and most importantly join with other Calcite tables. This makes features which previously cost a development team weeks of work “just a Calcite query”.

Remember our “Favorites” page example from above? Adding a customer’s most loved recipes at the top of the page would previously have required a dedicated service built and maintained by developers. With ADS and Calcite, it’s nothing more than defining the attribute and this Calcite query:

SELECT *
FROM recipes
WHERE id IN (SELECT recipe_id
             FROM customer_loving_recipe_probability
             WHERE probability > 0.5
             ORDER BY probability
             LIMIT 10)

With our analysts onboarding to Calcite, over time more and more complicated use-cases were solved by them using SQL in the Page Platform. Business logic is now in the hands of the respective business owners, sparking a new level of innovation and experimentation.

Takeaway: SQL queries using Apache Calcite allow data extraction for arbitrarily complex use-cases inside Page Platform.

Allow easy data transformation

We earlier mentioned the unbeaten simplicity of having building blocks that fit seamlessly with each other. To have a consistent representation of the Picnic apps, our talented team of UX designers pre-defined a set of visual components which can be composed arbitrarily and contained placeholders for actual content. Each component was configured as a template and could be re-used in configuration of full “pages”, where all the defined and extracted data comes together.

Content can be either statically configured (e.g. the ID of the image to show) or dynamically by Calcite queries. For instance, a component being able to show products can resolve its content through arbitrarily complex queries, as long as said queries return products.

Starting with the home page, component-based pages configured by operators one by one replaced almost all screens within the Picnic app, allowing for constantly changing content without input from developers.

I was closely involved in the migration of existing pages, but my personal highlight was certainly the rollout of a full-fledged recipe category tree, without the development team even knowing about it. The Page Platform was simply sufficiently self-service.

Technology-wise, our initial prototyping resulted in Handlebars.java as a choice for the first production use-cases, as it allowed definition of custom Java-based “helper” methods and controlling the flow of execution through built-in conditional helpers. Two examples of custom helpers were isBefore to allow scheduling of content or template for rendering nested templates.

With built-in and custom helpers, a (heavily simplified) King’s Day themed component showing product tiles may be defined like this:

{{#if (isBefore 'April 27th 10:00')}}
  {{#with "SELECT * FROM products WHERE color = 'orange' LIMIT 20" as | products |}}
    Prepare for King's Day!
    {{#each products}}
      {{#template 'product-tile' id=this campaign='kingsday_preparation'}}{{/template}}
    {{/each}}
  {{/with}}
{{else}}
  {{#with "SELECT * FROM products WHERE nutri_score = 'A' LIMIT 20" as | products |}}
    Recover from King's Day!
    {{#each products}}
      {{#template 'product-tile' id=this campaign='kingsday_recovery'}}{{/template}}
    {{/each}}
  {{/with}}
{{/if}}

While technically working, we identified many limitations of and subsequently reasons to migrate away from Handlebars: We regularly needed to extend the set of Java-based helpers for new use-cases, resulting in some logic in Java and some logic in Handlebars templates.

And after starting to render pages with hundreds of nested templates, rendering performance turned out so bad that even applying a handful of patches to an internal fork could not fix it anymore.

Most importantly however, the developer experience was bad, very bad. Picnic’s code base is generally uniform, well tested and offers lots of static code analysis, such as Error Prone Support. With Handlebars, we had none of that. Missing best practices, and limited support through IDEs made developing Handlebars templates very challenging and we had many short-lived outages as a result. Rendering Handlebars templates basically meant stitching together strings and literally a missing comma could break a page at runtime.

Looking back, we made the mistake of shipping our Peak of Complexity solution and built the first production use-cases on top of it. However, we incorporated all learnings into an exciting frontend framework which replaced our usage of Handlebars. This replacement and the associated process will be explained in more detail later in this blog series, so stay tuned.

Takeaway: Configuring pages with reusable components following UX design guidelines as self-service unlocks operator creativity.

Conclusion

This blog post introduced three of the four cornerstones of Picnic’s Page Platform. We discussed how an easy to configure generic key-value storage integrated into SQL-based data extraction using Apache Calcite combined with a programmable transformation environment unlocked business value previously unimaginable.

These server-side rendering capabilities would however be nothing without Picnic’s mobile apps being able to display the rendered content and customers interacting with them. In one of the next blog posts, our mobile engineers dive into the fascinating challenges of changing Picnic’s mobile apps from primarily React Native screens to rendering generic pages. This remaining cornerstone concludes the foundation of the Page Platform.

This article is part of a series of blogposts on the architecture behind our grocery shopping app. Hungry for more? Chew off another bite with any of our other articles:

In Faster Features, Happier Customers: Introducing The Platform That Transformed Our Grocery App, we share the motivation behind our architecture and give a general introduction.
In Enabling rapid business use case iteration with Apache Calcite, we go into great detail of why we chose Apache Calcite as the tech behind our query engine, and how we are using the technology.
Picnic’s Page Platform from a Mobile perspective: enabling fast updates through server-driven UI details our approach to implementing server-driven UI in our mobile apps.
In Tackling Configuration: creating Lego-Like Flexibility for non developers, we discuss the configuration tools we created on top of this architecture, allowing non-engineers to change the configuration of our Store.

Stay tuned in the coming weeks while we publish more blogposts in this series.

Define, Extract and Transform inside Picnic’s Page Platform was originally published in Picnic Engineering on Medium, where people are continuing the conversation by highlighting and responding to this story.