Creating a Rule Engine for Real-Time Action Automation

Written by Frans Bookholt, Apr 6, 2022

At Picnic, we strive to offer a personal, proactive, helpful, and friendly service. It’s why many of our customers choose to shop with us and it’s essential to our continued growth and long-term success. But our unique service proposition isn’t without its challenges. This is especially true given the size of our Customer Success and Marketing teams, which are relatively small when compared with the size of our customer base. Automation is the answer to this challenge, and this is where the Rule Engine comes in!

In this blog post, we will dive into the domain of business process automation, specifically customer-targeted interactions in our marketing activities, and to what extent traditional marketing CRM solutions are able to solve our problems. Finally, we will explain why Picnic decided to build the Rule Engine, our homebrew automation platform, in this already solution-crowded space 🙂

Automating unique experiences

Our business teams work hard every day to ensure the best possible customer experience. To achieve this, it is necessary to personalize a wide range of topics spanning every part of our proposition. Some of these personalizations include:

Ensuring we choose suitable substitutions in cases of out-of-stock products;
Ensuring we quickly notify customers in the case of delayed delivery, and asking for specific items for return if relevant to the customer’s previous order (e.g., bottles);
Learning from our customers’ shopping behaviour, for example by placing their favourite products at the top, or cross-listing cheese with “broodbeleg” (Dutch for sandwich toppings) if they searched for this previously.

Though many of our daily processes fall under the category of ‘general-purpose automation’, the majority of the actions resulting from our decision processes belong to the ‘customer-targeted action automation’ category. This includes marketing campaigns, transactional communication, and other types of “nudges” we can target customers with.

To handle such actions, we adopted a traditional Customer Relationship Management (CRM) platform. Over the next two sections, we will flesh out our needs for customer-targeted action automation and provide an overview of the strengths and weaknesses of our current 3rd party CRM solution.

Our needs for customer-targeted actions

Here are just a few of our requirements when it comes to customer-targeted actions:

Marketing Communication

Marketing Communication encompasses all proactive outbound communication, ranging from online acquisition performance marketing to customer journey messages and our weekly Promotions Digest email. We want an easy-to-use platform to operate these flows.

Transactional communication

Transactional Communication is all reactive communication we send to our customers that is related to an order they placed, or that is triggered by a direct action by the customer. It includes order confirmation messages, ETA messages, receipts, password reset emails, and registration confirmations. We need low latency and guarantees on uptime here.

Other customer-targeted nudges

We use the generic term “nudge” to describe everything we can personalize or target towards specific customers that is not direct communication. This could include adding a gift to a customer’s basket to mark their one-year anniversary of shopping with Picnic, showcasing Vegan products on top of a theme page or enabling Direct Debit as a payment method for customers with a sufficient credit score. We need a solution that can combine these nudges with the more traditional communication actions to support powerful campaigns.

Selligent for CRM marketing

A considerable chunk of the demand for Business Process Automation comes from the marketing and personalization space, and our marketing and business teams use Selligent for that. Let’s take a look at our implementation to understand what works for us and where we face difficulties.

What works well for us

A handful of factors make our use of Selligent particularly effective:

Data: Two years ago we implemented event-driven data syncing toward Selligent. Our entire marketing data model is now kept up to date with at most 1 minute of latency, including all segmentation processing.
Easy UI for marketers: The UI to build campaigns is relatively easy to use for less technically skilled marketers. Audience selections can also be made using a Drag-n-Drop UI on top of the data model.
Channels and Nudges: Email is handled by Selligent out of the box, so that’s an SMTP nuisance we don’t have to worry about. In addition, we’ve set up integrations with a plethora of internal communication-sending APIs, so sending an in-app message, for example, is just as easy as sending an email. We also have integrations with other internal APIs for customer-targeted nudges like gifts, feature flags, etc.
Transactional communication: Most of our transactional communication is processed by Selligent. It does, however, rely heavily on external triggering logic and data provisioning, implying some vendor-specific integration logic on our side.

What isn’t working so well

As you can imagine, there are also things we struggle with. We’ve done considerable research in this space by now, and this list seems to be true for most 3rd party CRM vendors.

Schedule-based only: Out-of-the-box campaigns in CRM platforms are schedule-based. Scheduling a campaign to run every 10 minutes doesn’t satisfy real-time execution SLAs and heavily increases the operational load on the system. This means that real-time triggering relies on back-end integration and thus requires development, taking away flexibility and usability for business users.
Limited data retention: Marketing relies on data for audience selection, personalization, and segmentation. However, storage capacity in these CRM platforms is limited and extra data retention often comes with a price tag.
Lack of version control and observability: Marketing CRMs are flexible and agile tools for marketers, but they lack the capabilities that we are used to in engineering regarding version control and observability. This is not a problem for one-off bulk marketing campaigns, but it becomes increasingly challenging when you implement more operationally critical flows.
Customer-targeted actions only: CRM Campaigns start by selecting a subset of the main customer list as an audience and follow a flowchart to execute actions towards these audience members. This flow is pretty solidly baked into these platforms, which makes them less flexible for general-purpose action automation.

In conclusion, CRM marketing applications are brilliant at what they were built for: schedule-based marketing communication automation. But when you try to stretch that to real-time or general-purpose action automation, you will soon hit some brick walls.

Real-time business rules

This is the starting point of the Rule Engine project. We had a growing interest in real-time communication and personalization and we saw a growing number of use cases for more general-purpose action automation:

When a delivery slot is 90% full, we want to notify the customer using a push notification;
When a delivery vehicle is 15 minutes late, we want to notify the customer using an SMS;
When we cannot pick an article due to a stock shortage, then we want to offer a choice of substitutes as quickly as possible;
When a customer has a failed payment attempt, we want to instantly lend a helping hand using an in-app message;
When a delivery slot is almost full, we want to balance its capacity with neighbouring slots;
The list goes on 🙂

Note that all of these examples ultimately trigger some sort of time-critical “action”, but they transcend the standard list of transactional communication since they are more flexible than your static password-reset email, or are not even communication-related.

Also, note that the “triggers” for such flows must come from our back-end systems. Those are the only systems that instantly know that the vehicle is delayed.

Lastly, the “if” and the “how” we want to react to these triggers are mostly a concern of the business teams. Do we send that SMS after a 10- or a 15-minute delay? Would a push message suffice? Business teams interested in customer communication have a huge stake in this. However, the back-end engineer tasked with setting up this flow is probably less interested in such implementation details.

It all came together when we noticed that all these examples can be formulated in a standard form —

“when X, if Y, then do Z”

— and we called them real-time business rules. It was at this moment the idea (or at least the name :/) of the Rule Engine was born.
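The “when X, if Y, then do Z” form maps naturally onto a small data structure. The following is a minimal sketch in Python, not our actual implementation: the Rule class, the vehicle_delayed event type, and the send_sms action are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable

Event = dict[str, Any]
Action = dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """A business rule: when `trigger`, if `condition`, then do `actions`."""
    name: str
    trigger: str                              # the "when": event type that starts an evaluation
    condition: Callable[[Event], bool]        # the "if"
    actions: Callable[[Event], list[Action]]  # the "then"

    def evaluate(self, event: Event) -> list[Action]:
        """Return the actions to execute for this event, or an empty list."""
        if event.get("type") != self.trigger:
            return []
        if not self.condition(event):
            return []
        return self.actions(event)


# Hypothetical rule: notify the customer by SMS when a delivery is 15+ minutes late.
late_delivery_sms = Rule(
    name="late-delivery-sms",
    trigger="vehicle_delayed",
    condition=lambda e: e["delay_minutes"] >= 15,
    actions=lambda e: [{"action": "send_sms", "customer_id": e["customer_id"]}],
)

print(late_delivery_sms.evaluate(
    {"type": "vehicle_delayed", "delay_minutes": 20, "customer_id": "c-1"}
))
```

In a structure like this, moving from a 15-minute to a 10-minute threshold is a one-line change to the condition, which is exactly the kind of decision the business teams want to own.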

Project goals

So we want to build a tool to automate business rules. It has to offer way more flexibility than the CRM application, yet be fully built and managed by the business teams interested in operating it.

Great. Now we need to refine this with some functional requirements and organizational goals.

Functional requirements

Real-time automation: For both customer-facing and general-purpose action automation, with high sending SLAs.
Powered by broad data: Decisioning on broad and deep contextual data, enabling personalization and lifecycle segmentation.
Action output control: Supporting all customer targeted and general-purpose APIs with atomic execution for groups of actions.
Rule lifecycle management: Version control for logic, component testing functionality, and the ability to roll back changes.
Real-time monitoring: Instant insight into Rule evaluations, Actions produced, and Actions Executed.
Transparency: Full historic overview of all actions triggered, the rules responsible, and the reasons for triggering them.
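To make the transparency and lifecycle requirements a bit more concrete, here is a rough sketch of what a single evaluation record could look like. All field names are assumptions made for this example, and the rule version is imagined to be something like a commit hash from the rules repository.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class EvaluationRecord:
    """One row in a hypothetical history of rule evaluations."""
    rule_name: str                  # the rule responsible
    rule_version: str               # assumed: a version identifier such as a commit hash
    trigger_event: dict[str, Any]   # the event that kicked off the evaluation
    matched: bool                   # whether the conditions held
    reason: str                     # human-readable explanation of the outcome
    actions: list[dict[str, Any]]   # actions produced (empty if not matched)
    evaluated_at: str = field(default_factory=_utc_now)

    def to_json(self) -> str:
        """Serialize the record so it can be stored and inspected later."""
        return json.dumps(asdict(self), sort_keys=True)


record = EvaluationRecord(
    rule_name="late-delivery-sms",
    rule_version="abc123",
    trigger_event={"type": "vehicle_delayed", "delay_minutes": 20},
    matched=True,
    reason="delay_minutes >= 15",
    actions=[{"action": "send_sms", "customer_id": "c-1"}],
)
print(record.to_json())
```

Storing the rule version next to every evaluation is one way to answer “which logic produced this action?” even after the rule has been changed or rolled back.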

Organizational goals

Business owned: Rules built and configured by business analysts — no back-end development required.
Continuous integration: No dependencies on product team release cycles, resulting in faster iterations.
Overview and auditability: One place to see, inspect, and manage all rules.

With these requirements and goals, we hope to tackle the shortcomings identified in our CRM application, whilst enabling business teams to also begin working on the more general automation topics.

Towards an architecture

As stated earlier, conceptually every Rule is a combination of a Trigger that kicks off an evaluation (the “when”), a set of Conditions to check (the “if”), and a conditional set of Actions to execute (the “then”). We need to find candidates to implement these.

Finally, we need a system to Define the rule logic and manage version control, and an interface to Manage the lifecycle of our rules and provide monitoring.

Triggers

For the real-time triggers, we have a good candidate. Picnic runs a microservices architecture, so we already have a fully-fledged event bus implemented in RabbitMQ, where each service communicates relevant status changes through events. It makes sense to trigger evaluations with these events.
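As an illustration of event-based triggering, the sketch below routes events to the evaluations that subscribed to their type. It is an in-memory stand-in only: in practice the events would arrive from a message broker consumer, and the TriggerRegistry class and the payment_failed and slot_updated event types are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable

Event = dict[str, Any]
Handler = Callable[[Event], None]


class TriggerRegistry:
    """Routes incoming events to the rule evaluations subscribed to their type."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def dispatch(self, event: Event) -> int:
        """Call every handler for the event's type; return how many ran."""
        handlers = self._handlers.get(event["type"], [])
        for handler in handlers:
            handler(event)
        return len(handlers)


registry = TriggerRegistry()
evaluated: list[str] = []

# Hypothetical event type and handler, for illustration only.
registry.subscribe("payment_failed", lambda e: evaluated.append(e["customer_id"]))

registry.dispatch({"type": "payment_failed", "customer_id": "c-42"})
registry.dispatch({"type": "slot_updated", "slot_id": "s-7"})  # no rule listens, nothing runs
print(evaluated)
```

Because rules only subscribe to event types, adding a new rule on an existing event requires no change in the service that emits it.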

Conditions

To check the conditions in our business rule, we need contextual data comparable to the data in our CRM, yet far more extensible for broader use. We also need a DSL to make that data accessible to users and rule evaluations. SQL seems like the natural option here.
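To give a feel for SQL as the condition language, the sketch below checks a condition against an in-memory SQLite database. SQLite is only a stand-in for the contextual data store, and the customers table, its columns, and the condition_holds helper are made up for this example.

```python
import sqlite3

# In-memory stand-in for the contextual data store; table and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "customer_id TEXT PRIMARY KEY, sms_opt_in INTEGER, orders_count INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("c-1", 1, 12), ("c-2", 0, 3)],
)

# The "if" of a rule, expressed as a parameterized SQL query.
CONDITION_SQL = """
    SELECT 1
    FROM customers
    WHERE customer_id = :customer_id
      AND sms_opt_in = 1
      AND orders_count >= :min_orders
"""


def condition_holds(conn: sqlite3.Connection, customer_id: str, min_orders: int = 1) -> bool:
    """Return True if the query yields at least one row for this customer."""
    row = conn.execute(
        CONDITION_SQL, {"customer_id": customer_id, "min_orders": min_orders}
    ).fetchone()
    return row is not None


print(condition_holds(conn, "c-1"))  # customer opted in to SMS
print(condition_holds(conn, "c-2"))  # customer did not opt in
```

Treating “the query returns a row” as “the condition holds” keeps the contract between the data and the rule simple, and the values taken from the triggering event enter the query only as bound parameters.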

Actions

Most of the actions relevant to us are already implemented in the form of REST endpoints exposed by the internal services handling these requests. What remains is to create bindings to each of them and do meaningful input validation.
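A binding with input validation could look roughly like the sketch below. It only prepares a request and leaves the sending to an HTTP client. The send_sms function, the /sms path, the payload fields, and the 160-character limit are all assumptions made for this example, not our real API.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class HttpRequest:
    """A prepared request; actually sending it is left to an HTTP client."""
    method: str
    path: str
    json: dict[str, Any]


MAX_SMS_LENGTH = 160  # assumed limit for this sketch


def send_sms(customer_id: str, text: str) -> HttpRequest:
    """Bind the hypothetical `send_sms` action to a hypothetical REST endpoint."""
    if not customer_id:
        raise ValueError("customer_id must not be empty")
    if not text.strip():
        raise ValueError("text must not be blank")
    if len(text) > MAX_SMS_LENGTH:
        raise ValueError(f"text exceeds {MAX_SMS_LENGTH} characters")
    return HttpRequest(
        method="POST",
        path="/sms",  # hypothetical path
        json={"customer_id": customer_id, "text": text},
    )


request = send_sms("c-1", "Your delivery is running about 15 minutes late.")
print(request.method, request.path)
```

Validating before any request is built means a malformed action fails early, close to the rule that produced it, rather than deep inside the receiving service.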

Defining rule logic

We want to enable our business analysts to define these rules. All of the components above need to be logically glued together by our analysts. We chose to do so in scripting languages that are accessible and familiar enough to them: JavaScript and Python.

Our tech teams already use GitHub extensively, including integrations for CI/CD, linting, and component tests. We need to verify to what extent our business users are willing and able to come to the dark side.

Lifecycle management and monitoring

After rule logic is validated, peer-reviewed, and merged into our rules repository, it’s time to push it into operation. For this, we will need a proprietary frontend to enable, disable, and monitor rules. This could be a topic for a blog post of its own, so for now we’ll leave it at that 🙂

Enabling our teams to do what they do best

The Rule Engine started out as a conceptual wish from business teams to be able to directly contribute to real-time and general-purpose action automation. It evolved into a project that enables analysts to own their own real-time interaction flows while off-loading back-end teams from setting up and iterating on these kinds of flows.

With the Rule Engine, back-end teams can now focus on what they do best:

Emitting rich data events to signal any relevant status change, and
Exposing endpoints to trigger powerful actions.

While business teams can now focus on what they do best:

Defining logic to translate things that happen into actions, and
Iterating and fine-tuning it, to serve customers better every day.

In the next two blog posts on this topic, my colleague Philip Leonard will elaborate on the technical details of our implementation, so stay tuned!

Want to work with us on awesome projects like the Rule Engine? Check out our current vacancies to learn more and apply!