DataStax takes aim at event driven AI with open source LangStream project
Generative AI more often than not works with static sources of data — but what if an organization wants to benefit from real time streaming data? That’s one of the goals underpinning the new LangStream open source project, led by DataStax.
The LangStream project was quietly soft launched by DataStax on Sept. 13 and the effort has iterated rapidly in the weeks since, with a new release out today that expands integration points to make the technology more useful. LangStream initially only worked with DataStax’s AstraDB database and now it supports a series of vector databases including Milvus as well as Pinecone.
The basic idea behind LangStream is to enable developers to more easily work with streaming data sources (sometimes referred to as data in motion), to help build what are known as event driven architectures. In an event driven architecture, an event, which could be a new data point coming in from a stream, is able to trigger or ‘drive’ another action. Event driven architectures are at the foundation of real time applications as well, enabling applications to benefit from data as it comes into a platform. This allows generative models to take the latest contextual data into account when formulating responses or completing tasks.
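To make the event driven idea concrete, here is a minimal Python sketch of an event triggering an action. It is not LangStream code; the `EventBus` class, the `new_data_point` event name and the `refresh_context` handler are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class EventBus:
    """Hypothetical in-process event bus, not part of LangStream."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[Dict[str, Any]], Any]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[Dict[str, Any]], Any]) -> None:
        # Register an action that should run whenever this event type arrives.
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: Dict[str, Any]) -> List[Any]:
        # The incoming event "drives" every subscribed action.
        return [handler(payload) for handler in self._handlers[event_type]]


# Context that a generative model could be given at prompt time.
latest_context: List[str] = []


def refresh_context(payload: Dict[str, Any]) -> str:
    # Action triggered by the event: keep the newest data available.
    latest_context.append(payload["text"])
    return f"context now holds {len(latest_context)} item(s)"


bus = EventBus()
bus.subscribe("new_data_point", refresh_context)
results = bus.publish("new_data_point", {"text": "order 1001 shipped"})
```

In a production system the in-process bus would typically be replaced by a streaming platform, with the handler running as a separate consumer.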
“LangStream is a way to build generative AI applications in an event driven way,” Chris Bartholomew, head of streaming engineering at DataStax, told VentureBeat in an exclusive interview.
Bartholomew is no stranger to the world of streaming data, having previously been the founder and CEO of streaming data vendor Kesque, which was acquired by DataStax in 2021. Kesque developed technology based on the open source Apache Pulsar streaming data project, which has now become the foundation of the DataStax Astra Streaming service.
How LangStream works to enable event driven Generative AI
As it turns out, LangStream currently doesn’t rely on Apache Pulsar. Instead, it makes use of the open source Apache Kafka technology, which is widely used today for event data streaming.
Bartholomew explained that LangStream uses a standard stream processing model where it takes in messages or events, processes them, and sends them out. LangStream is particularly useful in combination with vector database technologies in support of Retrieval Augmented Generation (RAG) operations where generative AI models are able to cite up-to-date data.
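The take in, process, send out model Bartholomew describes can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than LangStream's implementation: standard library queues stand in for the input and output topics, and `run_pipeline` and `normalize` are hypothetical names.

```python
from queue import Queue
from typing import Any, Callable, Dict

Message = Dict[str, Any]


def run_pipeline(source: Queue, sink: Queue, process: Callable[[Message], Message]) -> int:
    """Drain the source, apply one processing step, and emit to the sink."""
    handled = 0
    while not source.empty():
        message = source.get()      # take in a message or event
        sink.put(process(message))  # process it and send it out
        handled += 1
    return handled


def normalize(message: Message) -> Message:
    # Hypothetical processing step: tidy the text before later stages use it.
    return {"id": message["id"], "text": message["text"].strip().lower()}


# Queues stand in for streaming topics (an assumption made for this sketch).
inbound: Queue = Queue()
outbound: Queue = Queue()
inbound.put({"id": 1, "text": "  Price Update: ACME 42.10 "})
inbound.put({"id": 2, "text": "New Support Ticket Opened"})

count = run_pipeline(inbound, outbound, normalize)
first = outbound.get()
```

A real deployment would run continuously against a message broker instead of draining a finite queue, but the shape of each step is the same.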
As data is pulled into a model for RAG, each new piece of data needs to have a vector embedding generated so that it can be used in a vector database. With the real time nature of streaming data, there is a need to have embeddings created in a synchronous data pipeline, which is what LangStream aims to enable. Bartholomew noted that LangStream is agnostic about which particular vector embedding model is used and can support multiple models today including open source models hosted on Hugging Face as well as Google’s Vertex AI.
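The embedding step can be illustrated with a toy example. Everything below is an assumption made for the sketch: `toy_embed` is a character-count stand-in for a real embedding model, and `InMemoryVectorStore` is a hypothetical placeholder for a vector database such as the ones named above. Neither reflects LangStream's API.

```python
import math
import string
from typing import List, Tuple

# One dimension per lowercase letter and digit (toy assumption).
ALPHABET = string.ascii_lowercase + string.digits


def toy_embed(text: str) -> List[float]:
    """Toy embedding: normalized character counts, not a real model."""
    counts = [0.0] * len(ALPHABET)
    for ch in text.lower():
        idx = ALPHABET.find(ch)
        if idx >= 0:
            counts[idx] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database."""

    def __init__(self) -> None:
        self._rows: List[Tuple[List[float], str]] = []

    def upsert(self, text: str) -> None:
        # Embed each event as it arrives, so it is searchable right away.
        self._rows.append((toy_embed(text), text))

    def search(self, query: str, k: int = 1) -> List[str]:
        # Rank stored texts by cosine similarity (vectors are unit length).
        q = toy_embed(query)
        ranked = sorted(
            self._rows,
            key=lambda row: -sum(a * b for a, b in zip(q, row[0])),
        )
        return [text for _, text in ranked[:k]]


store = InMemoryVectorStore()
for event in ["flight 212 delayed by two hours", "gate changed to b7 for flight 98"]:
    store.upsert(event)

# At prompt time, retrieve the most relevant fresh event for the model.
top = store.search("flight 212 delayed", k=1)
```

Swapping `toy_embed` for a hosted embedding model and the in-memory list for a vector database would leave the overall flow unchanged: embed on arrival, store, then retrieve at prompt time.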
“A lot of what we’re doing is taking the pipeline streaming, event driven paradigm and we’re taking it to GenAI applications,” he said.
The future of LangStream
While it’s still early days for LangStream, the project is moving rapidly and there is a lot of potential as the community of users grows.
“LangStream can greatly benefit developers working with generative AI as it helps them to easily build applications and simplifies the process of coordinating data from a variety of sources to enable high-quality prompts for LLMs,” Davor Bonaci, CTO and Executive Vice President of DataStax, told VentureBeat. “This makes it far simpler to build scalable, production-ready, real-world AI applications on a broad range of data types.”
LangStream is being developed as an open source project, which is consistent with how DataStax has worked with other technologies it relies on for its commercial efforts, including Apache Pulsar and the Apache Cassandra database.
“DataStax has a long history of working with open source communities,” Bonaci said. “It only seems fitting to contribute to yet another open source project, especially one that is so relevant to developers working with today’s most popular technologies.”