As we step into 2024, let’s look back at the transformative year of 2023, which has been a milestone for machine learning and artificial intelligence. The past year showed significant advancements and shifts in the AI landscape, particularly when it comes to Large Language Models (LLMs).
With the rise of tools like ChatGPT and Midjourney, the role of AI in society and its benefits have become extremely clear to everyone. People can now interact with AI in the most familiar way possible: natural language. This made it more tangible and helped many grasp what AI can do.
At the same time, AI was already everywhere. It’s in Google Maps’ navigation, in Spotify’s ‘Discover weekly’, in the weather forecast and whilst you browse on your phone hundreds of algorithms are used to provide the most optimal experience. So it should not come as a surprise that Picnic is also leveraging AI in many areas to build a seamless shopping experience and super-efficient supply chain. All are powered by AI & machine learning.
Advancements in Large Language Models
2023 was the year LLMs became more accessible than ever. With open-source initiatives and the democratization of AI tools, individuals and smaller entities began harnessing the power of sophisticated models. Cost reductions in training and deploying these models have played a crucial role, but it was really ChatGPT that kicked it off by offering a simple interface that everybody can use.
The capabilities of LLMs have seen exponential growth. From answering complex queries to generating human-like text and code, these models have broadened the horizon of what AI can achieve. At the same time, it’s good to realize that the techniques are not new. The underlying Transformer architecture and attention mechanism paper was already published by researchers at Google in 2017. For reference, nowadays the code for training a model like GPT2 only requires 300 lines of Python code.
The launch of ChatGPT, along with the other transformer models, has been made possible by 3 key ingredients:
- The increase of available computing power
- The increase in available data
- A carefully crafted recipe to select reliable data as input
I have been using ChatGPT and other LLMs on a daily basis, and must admit I can’t live without them. It has become an essential tool in my toolbox. From using it for brainstorming, as a coding buddy, or a mentor to reflect, to help finding the right tone of voice: LLMs are definitely a whole new game.
Challenges in deploying Large Language Models in production
With the ChatGPT hype, many companies around the world became bullish on AI and desired having their own GPT, internally or for their customers. Finally, customers can just talk to us about what they want and we’ll just make it happen! I wish it was that easy. With new technologies come new challenges. Running LLMs in production is hard.
One of the critical challenges faced by LLMs is their gullibility, which is commonly referred to as hallucinations, where the models are often leading to the propagation of misinformation.
Depending on the use-case, this can be reduced by
- Reducing the model’s temperature (creativity)
- Fine-tuning a model, using a foundational model (like GPT4, or an open-source alternative) as a basis
- Leveraging Retrieval-Augmented Generation [RAG]
Hallucinations are voiced as the biggest criticism for deploying large language models in production. Answers are posed with full confidence, even when the model generates false statements. For example, comparing which smartphone or car to buy, it may generate features that are likely, but not correct. This is all due to the probabilistic nature of the model.
Secondly, there’s a risk of prompt injection: the ability to ‘hack’ the model, in order to return answers it was not designed for. A clear case of this went viral not too long ago, when someone asked a global delivery company for a poem highlighting their terrible service. Unless you’ve got big teams of ML engineers, one can best focus on internal use-cases, such that a human will remain in the loop. In the case of Picnic, the best customer experience is key, and not something we want to take any chances on. (There is a great read from QuantumBlack on the complexity and the stack required to deploy a knowledge assistant).
Ethical concerns
The use of unlicensed data for training LLMs has sparked a lot of ethical and legal debates. Most primarily with the New York Times suing OpenAI for training on their copyrighted data.
Whilst LLMs have sparked this debate, this is an old question: can machine generated content (written, visual, audio, video) be copyrighted?
Separately, the EU is pushing hard for AI to be regulated, with the EU AI Act agreed upon at the end of 2023 (though not yet active). It primarily focuses on areas with unacceptable risk, such as healthcare, children’s toys, or social scoring. Fortunately, Picnic’s ML models do not need to make life-or-death decisions. Nevertheless, we take privacy, customer data, and other ethical concerns very seriously when developing our models.
Next to critical use-cases, the AI Act also specifies new regulations for machine generated content. We will see the subtext: “Generated by AI” more and more often. Namely, if the content is streamed directly from an LLM or any other model, without any humans in the loop, companies will be required to convey this. In my view, a big step forward for people to get a better grasp on what content is real, and what is machine generated.
A shift in ML development
Companies like OpenAI, Anthropic, Mixtral, StabilityAI, and many more have caused more than a shift in how humans interact with computers. It has also vastly changed how software, and especially ML products, are developed.
Let’s take the example of text classification. Previously, one had to pre-process the data, organize the infrastructure with large GPU nodes, train a model, evaluate, and iterate until performance was acceptable. Now, classification, moderation, or creating embeddings are a mere API call away (read more on a comparison between DistilBERT and OpenAI).
This drastically reduces the time-to-market through using an existing pre-trained model (remember: it’s called Generative Pre-trained Transformer [GPT]). Internal proof-of-concepts can be built in a matter of days or even hours instead of weeks. Entire start-ups are popping up left and right, capitalizing on this new capability.
To me, this is a shift that happened with — for example — the payment industry in the 2000s. Companies like Stripe and Adyen (and similarly, others like Twilio), wrapped APIs around something that used to require complex systems or heavy infrastructure investments.
This time-to-market does come at a price, though. For starters, it’s a complete black-box model. You’ll have to trust that companies like OpenAI make the right trade-offs, have unbiased models, etc. People like Yann LeCun (Chief Scientist at FAIR/Meta) have been big advocates of open source models. Their performance is improving,closely matching what proprietary models offer while allowing for anyone to take a look under the hood. This becomes especially important when considering the privacy of the data you’re processing as data leaks happen.
As this industry is moving fast, and not always very stable, it will be important to not bet on a single horse. With the ecosystem still in flux, utilizing proxies and leveraging (internal) platforms are key to remaining flexible and reducing lock-in effects.
What to expect in 2024
In 2024, Picnic remains keenly focused on the evolving landscape of machine learning and AI. Our teams of analysts, machine learning engineers, and MLOps engineers closely monitor and assess the most promising advancements in the field. This approach ensures that we stay at the forefront of innovation, ready to adopt and integrate breakthroughs that align with our operational goals and enhance our customer experience. We do however not jump on every hype and remain product focussed.
For starters, I expect the myriad of MLOps and LLMOps tools to settle for more standardization and a few clear winners in this space. Key areas are model observability, feature stores, vector databases, and tools to provide LLM guardrails to safely deploy LLMs in production.
Secondly, older concepts of AI agents and multi-agent systems have found a new life again. Tools like LangGraph (from the author of the wildly popular LangChain) and CrewAI are rising in popularity.
Nevertheless, it remains key to build great products. Ultimately, it’s not about tech, but the value you’re adding to your customers. The best AI products are those where AI hides in plain sight.
Join us
If you share our passion and curiosity about how we plan to tackle this evolving challenge, we invite you to be a part of our team as a Machine Learning Engineer. Join us on this venture, contributing to the advancement of ML at Picnic by taking it to unprecedented heights!