LOTUS Makes LLM-Powered Data Processing Fast and Easy

LOTUS is an LLM-powered query engine for processing text, documents, structured and unstructured data with AI.

Get Started in a Few Lines of Code

LOTUS provides an intuitive Python package and familiar Pandas-like API with LLM-powered semantic operators for advanced document processing and data analytics.

        # Assumes LOTUS is imported and configured, and papers_df has a research_paper column.
        (
            papers_df.sem_filter("the {research_paper} has an open source repo")
            .sem_topk("the {research_paper} has the most ground-breaking ideas", K=20)
            .sem_agg("summarize the papers based on their {research_paper}")
        )
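
To make the chain above concrete, here is a toy, standard-library-only sketch of what the three operators do conceptually. It is not LOTUS code: the functions `toy_sem_filter`, `toy_sem_topk`, and `toy_sem_agg` are hypothetical, the example rows are made up, and the keyword check, keyword count, and counting summarizer are stand-ins for the LLM calls a real semantic operator would make.

```python
from typing import Callable

Row = dict[str, str]


def render(template: str, row: Row) -> str:
    """Fill the {column} placeholders of a natural-language expression from one row."""
    return template.format(**row)


def toy_sem_filter(rows: list[Row], template: str, judge: Callable[[str], bool]) -> list[Row]:
    """Keep the rows for which the (stand-in) judge accepts the rendered expression."""
    return [row for row in rows if judge(render(template, row))]


def toy_sem_topk(rows: list[Row], template: str, score: Callable[[str], float], k: int) -> list[Row]:
    """Return the k rows whose rendered expression gets the highest (stand-in) score."""
    return sorted(rows, key=lambda row: score(render(template, row)), reverse=True)[:k]


def toy_sem_agg(rows: list[Row], template: str, summarize: Callable[[list[str]], str]) -> str:
    """Reduce all rendered expressions to one string with a (stand-in) summarizer."""
    return summarize([render(template, row) for row in rows])


# Made-up example rows; the column name matches the snippet above.
papers = [
    {"research_paper": "Paper A: a novel index (code released)"},
    {"research_paper": "Paper B: a survey, nothing released"},
    {"research_paper": "Paper C: a novel model with a novel loss (code released)"},
]

# Each lambda below stands in for an LLM judgment; none of them is a real model call.
kept = toy_sem_filter(papers, "the {research_paper} has an open source repo",
                      judge=lambda text: "code released" in text)
top = toy_sem_topk(kept, "the {research_paper} has the most ground-breaking ideas",
                   score=lambda text: text.count("novel"), k=1)
summary = toy_sem_agg(top, "summarize the papers based on their {research_paper}",
                      summarize=lambda texts: f"{len(texts)} paper(s) summarized")
print(summary)
```

The point of the sketch is the shape of the program: each operator takes a natural-language expression with `{column}` placeholders, and the engine decides how to evaluate it over the rows.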
                      

The Power of Semantic Operators

LOTUS implements the semantic operator model, a powerful and declarative programming model for AI-based document processing and data transformations.

Declarative AI-Based Programming

Specify your data and document processing logic with declarative, high-level LLM-powered operators. Then leave the rest to the query engine!

Highly Optimized LLM Execution

LOTUS automatically optimizes your LLM-powered data processing programs, delivering speedups of up to 400x.

Seamless Data Integration

Plug into your existing database, vector database, or document store. LLM-powered semantic operators seamlessly extend the relational model, making it easy for you to leverage your structured and unstructured document data together.
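
As a rough illustration of structured and unstructured data being queried together, the sketch below combines an ordinary relational predicate with a natural-language one over the same rows. Everything in it is an assumption made for illustration: `mixed_filter` is a hypothetical helper rather than a LOTUS function, the rows are made up, and the keyword check stands in for an LLM judgment. Running the structured predicate first is a design choice of this sketch, not a claim about how LOTUS orders its work.

```python
from typing import Any, Callable

Row = dict[str, Any]


def mixed_filter(
    rows: list[Row],
    structured: Callable[[Row], bool],
    semantic_template: str,
    judge: Callable[[str], bool],
) -> list[Row]:
    """Apply a cheap structured predicate first, then the semantic one on the survivors."""
    survivors = [row for row in rows if structured(row)]
    return [row for row in survivors if judge(semantic_template.format(**row))]


# Made-up rows mixing a structured column (year) with free text (abstract).
rows = [
    {"title": "A", "year": 2021, "abstract": "A learned index for range queries"},
    {"title": "B", "year": 2024, "abstract": "A vector index for embeddings"},
    {"title": "C", "year": 2024, "abstract": "A study of team productivity"},
]

judge_calls: list[str] = []


def stand_in_judge(text: str) -> bool:
    """Stand-in for an LLM call: records the prompt and does a keyword check."""
    judge_calls.append(text)
    return "index" in text


result = mixed_filter(
    rows,
    structured=lambda row: row["year"] >= 2023,
    semantic_template="the {abstract} is about data structures",
    judge=stand_in_judge,
)
print([row["title"] for row in result], len(judge_calls))
```

In this toy run the structured predicate removes one row before the stand-in judge is ever called, so only two of the three rows would cost a model call.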

LLM-Powered Document Processing Use Cases

LOTUS serves a diverse array of applications that need to process documents and data with AI. Here are some examples, each written as a short, intuitive LOTUS program.

Document Fact-Checking

LOTUS LLM-powered document processing programs reproduce and improve upon the accuracy of state-of-the-art fact-checking pipelines on the FEVER dataset, while optimizing execution to achieve 28x speedups.

Document ETL and Classification

LOTUS achieves state-of-the-art accuracy with a single semantic operator on the BioDEX dataset, which presents a complex medical document classification task. Under the hood, the LOTUS query engine automatically explores feasible execution plans to achieve 400x faster performance than the default.
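
The idea of exploring execution plans can be pictured with a generic cost-based selection step: among candidate plans, keep those that meet an accuracy target and pick the cheapest. The sketch below is only that generic idea under stated assumptions. The `Plan` class, the plan names, and every cost and accuracy number are invented for illustration and do not describe the plans LOTUS actually considers or how it estimates them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    est_cost: float      # hypothetical relative cost (for example, LLM time)
    est_accuracy: float  # hypothetical accuracy estimate in [0, 1]


def cheapest_feasible(plans: list[Plan], min_accuracy: float) -> Plan:
    """Return the lowest-cost plan whose estimated accuracy meets the target."""
    feasible = [plan for plan in plans if plan.est_accuracy >= min_accuracy]
    if not feasible:
        raise ValueError("no plan meets the accuracy target")
    return min(feasible, key=lambda plan: plan.est_cost)


# Invented candidates: all names and numbers are placeholders.
candidates = [
    Plan("llm_on_every_row", est_cost=100.0, est_accuracy=0.95),
    Plan("prefilter_then_llm", est_cost=5.0, est_accuracy=0.94),
    Plan("embedding_only", est_cost=1.0, est_accuracy=0.60),
]

chosen = cheapest_feasible(candidates, min_accuracy=0.9)
print(chosen.name)
```

With these placeholder numbers the cheapest plan is rejected because it misses the accuracy target, and the middle plan wins over the exhaustive one on cost.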

Document Search and Ranking

LOTUS LLM-powered programs achieve 200% higher accuracy than state-of-the-art retrieval and re-ranking methods, with up to 10x lower execution time than the LM-based methods used in prior work.

Research Document Insights

Simple LOTUS programs process large sets of recent arXiv papers, allowing you to generate summaries, group the data by topic, and answer complex research questions.

Team

The LOTUS project is ongoing work by researchers at Stanford University and UC Berkeley, developing advanced LLM-powered data processing technology.

Liana Patel

Stanford University, Project Lead

Sid Jha

UC Berkeley, Core Contributor

Parth Asawa

UC Berkeley, Core Contributor

Melissa Pan

UC Berkeley, Core Contributor

Harshit Gupta

Stanford University, Core Contributor

Carlos Guestrin

Stanford University, Faculty Advisor

Matei Zaharia

UC Berkeley, Faculty Advisor