vLLM (https://github.com/vllm-project/vllm) is a fast and easy-to-use library for LLM inference and serving; its stated goal is to become the fastest and easiest-to-use open-source LLM inference and serving engine.
In 2022, a three-person team from the University of California, Berkeley, started with a demo project for accelerating the training and inference of OPT-175B. Over the course of two years, they built vLLM into the most popular open-source framework worldwide for accelerating large model inference.
Compared with Hugging Face Transformers, it delivers up to 24 times the throughput without requiring any changes to the model architecture. Today, vLLM has surpassed 21.8k stars on GitHub, just a year after its open-source debut in June 2023.
vLLM traces its origins to "Alpa," a demo project for automated model-parallel inference that predates both the launch of ChatGPT and Facebook's rebranding to Meta. During deployment, however, the team found the demo slow, with low GPU utilization. This made them realize that large language model inference was a problem worth serious attention.
At the time, there was no open-source system for optimizing large language model inference, so the team decided to build one from scratch. Confronting bottlenecks in GPU memory management, and after multiple iterations, they proposed PagedAttention, a new attention algorithm inspired by the classic virtual memory and paging techniques of operating systems. On top of it they built vLLM, a high-throughput distributed LLM serving engine that wastes almost no KV cache memory.
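To make the operating-system analogy concrete, here is a minimal, purely illustrative sketch of how a paged KV cache can be managed. The names (BlockManager, BLOCK_SIZE, append_token, free) are hypothetical and do not mirror vLLM's real internals; the point is only that sequences draw fixed-size blocks from a shared pool on demand, so memory is wasted at most in each sequence's last, partially filled block rather than in large contiguous preallocations.

```python
# Illustrative sketch of paged KV-cache allocation in the spirit of
# PagedAttention. All names here are hypothetical, not vLLM's actual API.

BLOCK_SIZE = 16  # number of tokens whose K/V vectors fit in one block


class BlockManager:
    """Hands out fixed-size KV-cache blocks from a shared pool, the way an
    OS hands out physical pages: a sequence's cache need not be contiguous,
    so only the final, partially filled block can hold unused slots."""

    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))  # ids of unused physical blocks
        self.block_tables = {}  # seq_id -> list of block ids (the "page table")
        self.seq_lengths = {}   # seq_id -> number of tokens cached so far

    def append_token(self, seq_id: int) -> None:
        """Account for one newly generated token, allocating a new block
        only when the sequence's last block is already full."""
        length = self.seq_lengths.get(seq_id, 0)
        if length % BLOCK_SIZE == 0:  # last block full, or no block yet
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted; preempt or swap a sequence")
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.seq_lengths[seq_id] = length + 1

    def free(self, seq_id: int) -> None:
        """Return all of a finished sequence's blocks to the shared pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lengths.pop(seq_id, None)


if __name__ == "__main__":
    mgr = BlockManager(num_blocks=4)
    for _ in range(20):              # generate 20 tokens for sequence 0
        mgr.append_token(seq_id=0)
    print(mgr.block_tables[0])       # two blocks cover 20 tokens; 12 slots idle
    mgr.free(seq_id=0)               # blocks return to the pool for other requests
```

Because blocks are allocated lazily and returned as soon as a request finishes, many more concurrent sequences fit in the same GPU memory than with contiguous per-request reservations, which is where the throughput gains come from in this simplified picture.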
In June 2023, the founding team released vLLM's source code and has maintained it ever since. Going forward, the team plans to put the donated funds toward vLLM's development, testing, and performance improvements.
Since its establishment, ZhenFund has been a staunch supporter of new technologies and of the entrepreneurial spirit. We aim to stand firmly by founders' sides, accompanying them as they lead technological innovation and change the world. In 2022, ZhenFund donated to the open-source project ControlNet.
Yusen Dai, Managing Partner of ZhenFund, said, "Our donation to the open-source project vLLM stems from our commitment to making AI broadly accessible, in the hope that new technologies can benefit as many people as possible. Compared with industry, excellent work in academia is often more constrained by cost and computing power. We believe the best way to change the world is to create, and, where possible, to create together with developers worldwide. To the cornerstones that will shape the future, we are happy to contribute our modest efforts."