Get Started in a Few Lines of Code
LOTUS provides an intuitive Python package and a familiar Pandas-like API with LLM-powered semantic operators for advanced document processing and data analytics.
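import lotus  # adds the sem_* operators to pandas DataFrames
# assumes papers_df has a "research_paper" column and an LM is configured in lotus.settings
result = (papers_df.sem_filter("the {research_paper} has an open source repo")
    .sem_topk("the {research_paper} has the most ground-breaking ideas", K=20)
    .sem_agg("summarize the papers based on their {research_paper}"))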
The Power of Semantic Operators
LOTUS implements the semantic operator model, a powerful and declarative programming model for AI-based document processing and data transformations.
Declarative AI-Based Programming
Specify your data and document processing logic with declarative, high-level LLM-powered operators. Then leave the rest to the query engine!
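The declarative contract can be illustrated with a toy, stdlib-only sketch of a semantic filter. This is not LOTUS's implementation: toy_sem_filter, judge, and the record layout are hypothetical, and a keyword check stands in for the LLM call. It shows the idea only: a natural-language template with {column} placeholders is filled in from each record, and a model decides whether the resulting claim holds.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def toy_sem_filter(records: List[Record], template: str,
                   judge: Callable[[str], bool]) -> List[Record]:
    """Keep the records whose filled-in claim the judge accepts.

    `judge` is a hypothetical stand-in for an LLM call; a real engine
    would also decide how and when to invoke the model.
    """
    kept = []
    for record in records:
        # Substitute {column} placeholders with this record's values.
        claim = template.format_map(record)
        if judge(claim):
            kept.append(record)
    return kept


# Usage with a keyword check standing in for the model (hypothetical data).
papers = [
    {"title": "Paper A", "research_paper": "Code is at github.com/example/a."},
    {"title": "Paper B", "research_paper": "Code is available upon request."},
]
kept = toy_sem_filter(
    papers,
    "the {research_paper} has an open source repo",
    judge=lambda claim: "github.com" in claim,
)
print([p["title"] for p in kept])  # ['Paper A']
```

Because the program states only which rows to keep, not how to call the model, an engine is free to batch, cache, or reorder the calls hidden behind judge.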
Highly Optimized LLM Execution
LOTUS automatically optimizes your LLM-powered data processing programs, delivering speedups of up to 400x.
Seamless Data Integration
Plug into your existing database, vector database, or document store. LLM-powered semantic operators seamlessly extend the relational model, making it easy for you to leverage your structured and unstructured document data together.
LLM-Powered Document Processing Use Cases
LOTUS serves a diverse array of applications that need to process documents and data with AI. Here are some examples, each written as a short, intuitive LOTUS program.
Document Fact-Checking
LOTUS LLM-powered document processing programs reproduce and improve upon the accuracy of state-of-the-art fact-checking pipelines on the FEVER dataset, while optimizing execution to achieve 28x speedups.
Document ETL and Classification
LOTUS achieves state-of-the-art accuracy with a single semantic operator on the BioDEX dataset, which presents a complex medical document classification task. Under the hood, the LOTUS query engine automatically explores feasible execution plans to achieve 400x faster performance than the default plan.
Document Search and Ranking
LOTUS LLM-powered programs achieve 200% higher accuracy than state-of-the-art retrieval and re-ranking methods, while running with up to 10x lower execution time than the LM-based methods used in prior work.
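One simple way to realize a semantic top-k, like the sem_topk call in the opening example, is to rank documents with pairwise judgments. The sketch below is hypothetical throughout: toy_sem_topk and prefer are illustrative names, a length check stands in for an LLM comparison, and LOTUS's actual ranking algorithm may differ.

```python
from functools import cmp_to_key
from typing import Callable, Dict, List

Record = Dict[str, str]


def toy_sem_topk(records: List[Record], template: str, k: int,
                 prefer: Callable[[str, str], bool]) -> List[Record]:
    """Return the k records ranked highest by pairwise judgments.

    `prefer(a, b)` is a hypothetical stand-in for asking an LLM whether
    filled-in description `a` satisfies the criterion better than `b`.
    """
    def compare(r1: Record, r2: Record) -> int:
        a, b = template.format_map(r1), template.format_map(r2)
        if prefer(a, b):
            return -1  # r1 ranks ahead of r2
        if prefer(b, a):
            return 1  # r2 ranks ahead of r1
        return 0

    return sorted(records, key=cmp_to_key(compare))[:k]


# Usage: prefer longer text, standing in for a model judgment (hypothetical data).
papers = [
    {"title": "A", "research_paper": "short abstract"},
    {"title": "B", "research_paper": "a much longer and more detailed abstract"},
    {"title": "C", "research_paper": "medium length abstract"},
]
top = toy_sem_topk(
    papers,
    "the {research_paper} has the most ground-breaking ideas",
    k=2,
    prefer=lambda a, b: len(a) > len(b),
)
print([p["title"] for p in top])  # ['B', 'C']
```

A full sort spends O(n log n) comparisons, each of which would be a model call; when only the top k results are needed, a selection-based strategy can get by with fewer calls.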
Research Document Insights
Simple LOTUS programs can process large sets of recent arXiv papers to summarize them, group them by topic, and answer complex research questions.
Team
The LOTUS project is ongoing work from researchers at Stanford University and UC Berkeley, developing advanced LLM-powered data processing technology.

Liana Patel
Stanford University, Project Lead
Sid Jha
UC Berkeley, Core Contributor
Parth Asawa
UC Berkeley, Core Contributor
Melissa Pan
UC Berkeley, Core Contributor
Harshit Gupta
Stanford University, Core Contributor
Carlos Guestrin
Stanford University, Faculty Advisor