Pathway launches data processing engine for real-time AI that can ‘unlearn’

The engine unifies workflows for batch data, streaming data, and LLM applications, delivering increased speed, lower latency, and greater ease of use for developers.


Pathway has launched its data processing engine, which benchmarks show to be up to 90x faster than existing streaming solutions. The platform uniquely unifies workflows for batch and streaming data to enable real-time machine learning and, critically, the ability for machines to ‘learn to forget’.

 

Until now, it has been nearly impossible for machines to learn and react to changes in real time the way humans do. Because designing streaming workflows is so complex, intelligent systems are typically trained on static [frozen] data uploads, including large language models like ChatGPT. This means their intelligence is stuck at a moment in time. Unlike humans, machines are not in a continuous state of learning and therefore cannot iteratively ‘unlearn’ information they were previously taught when it is found to be false, inaccurate, or outdated.

 

Pathway overcomes this thanks to its unique ability to mix batch and streaming logic in the same workflow. Systems can be continuously trained with new streaming data, and individual data points can be revised without requiring a full batch data upload. This is comparable to updating the value of one cell in an Excel spreadsheet: the change does not trigger recalculation of the whole document, only of the cells that depend on it. As a result, inaccurate source information can be seamlessly corrected to improve system outputs.
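To make the spreadsheet analogy concrete, the sketch below shows incremental recomputation in plain Python: when one source value changes, only the cells that depend on it are recalculated. It is purely illustrative of the incremental model described above; it is not Pathway's API and makes no claim about how the engine is actually implemented.

class Cell:
    def __init__(self, value=None, formula=None, inputs=()):
        self.value = value          # current value of this cell
        self.formula = formula      # function of the input cells' values
        self.inputs = list(inputs)  # cells this cell depends on
        self.dependents = []        # cells that depend on this cell
        for cell in self.inputs:
            cell.dependents.append(self)

    def set(self, value):
        # Update a source value and propagate only to dependent cells.
        self.value = value
        self._recompute_dependents()

    def _recompute_dependents(self):
        for dep in self.dependents:
            dep.value = dep.formula(*(c.value for c in dep.inputs))
            dep._recompute_dependents()

# Two source cells and one derived cell, like =A1+B1 in a spreadsheet.
a = Cell(value=2)
b = Cell(value=3)
total = Cell(formula=lambda x, y: x + y, inputs=(a, b))
a.set(2)     # initial propagation: total becomes 5
a.set(10)    # revise one data point: only `total` is recomputed, now 13
print(total.value)

Revising a single input leaves the rest of the pipeline untouched, which is the behaviour the paragraph above attributes to Pathway's unified batch-and-streaming workflows.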

 

It has traditionally been extremely hard to design efficient systems that combine batch and streaming workflows. The situation has become even more complex since a third workflow entered the scene: generative AI, which needs fast and secure learning of context to deliver value.

 

Most organisations design two or more separate systems, which are unable to perform incremental updates to revise preliminary results. This has reduced confidence in machine learning systems and stalled the adoption of enterprise AI among organisations that need to make decisions based on accurate real-time data, such as in manufacturing, financial services, and logistics. Bringing batch and streaming data together overcomes this challenge and enables true real-time systems for resource management, observability and monitoring, predictive maintenance, anomaly detection, and strategic decision-making.

 

Pathway enables a paradigm shift towards real-time data 

 

The Pathway data processing engine is enabling organisations to perform real-time data processing at scale. Existing clients include DB Schenker, which has reduced the time-to-market of anomaly-detection analytics projects from three months to one hour, and La Poste, which enabled a fleet CAPEX reduction of 16%.  

 

Unique capabilities of the Pathway data processing engine supporting this shift to real-time include: 

 

Fastest data processing engine on the market – unified batch and streaming. Capable of processing millions of data points per second, it largely surpasses current reference technologies such as Spark (in both batch and streaming), Kafka Streams, and Flink. Benchmarking of WordCount and PageRank against these also found that Pathway supports more advanced operations and is up to 90x faster thanks to its maximised throughput and lower latency. The benchmarks were stress tested by the developer community and are publicly available so the tests can be replicated. A detailed description of the benchmarks is available in the HAL preprint.

 

Facilitates real-time systems – Pathway allows a seamless transition from existing batch systems to real-time and LLM architectures, with real-time machine learning that integrates fully into the enterprise context.

 

Ease of development – Batch and streaming workflows can be designed with the same code logic in Python, which is then executed by Pathway's Rust engine. This opens the design of streaming workflows, which has typically required a specialist skillset, to all developers, and brings together teams that have traditionally been disparate within an organisation. Thanks to this, Pathway becomes the lingua franca of all data pipelines – stream, batch, and generative AI. A minimal sketch of such a unified pipeline follows below.
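To illustrate the unified Python workflow described above, here is a minimal word-count sketch written in the style of Pathway's public Python API. The connector path, schema, and the mode switch between "static" (batch) and "streaming" are assumptions based on Pathway's documentation rather than this announcement, and may differ between versions.

import pathway as pw

class InputSchema(pw.Schema):
    word: str

# Batch: read a static directory of CSV files once.
# Switching mode to "streaming" keeps the same pipeline definition,
# but results are updated continuously as new rows arrive.
words = pw.io.csv.read("./data/", schema=InputSchema, mode="static")

# Identical aggregation logic whether the run is batch or streaming.
counts = words.groupby(pw.this.word).reduce(
    word=pw.this.word,
    count=pw.reducers.count(),
)

pw.io.csv.write(counts, "./word_counts.csv")
pw.run()

The same groupby/reduce definition serves as a one-off batch job or a continuously updated streaming pipeline, which is the unification the announcement highlights.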

 

Zuzanna Stamirowska, CEO & Co-Founder of Pathway, comments: “Until now, the complexity of building batch and streaming architectures has resulted in a division between the two approaches. This has slowed the adoption of data streaming for AI systems and fixed their intelligence at a moment in time. But there is a critical need for real-time to optimise processing and to enable AI to unlearn for improved, continuous accuracy. 

 

“That’s why our mission has been to enable real-time data processing, while giving developers a simple experience regardless of whether they work with batch, streaming, or LLM systems. Pathway is truly facilitating the convergence of historical and real-time data for the first time.” 

 

The general launch of the Pathway platform follows the company’s $4.5m pre-seed round in December 2022, which was led by CEE VCs Inovo and Market One Capital, with angel investors Lukasz Kaiser, co-author of TensorFlow and informally known as the “T” in ChatGPT, and Roger Crook, the former global CEO of German delivery giant DHL.
