Bowen Wang (王博文)

I am a senior undergraduate student in the Department of Computer Science and Technology at Tsinghua University. I am currently visiting the Sky Computing Lab at the University of California, Berkeley, where I am advised by Professor Ion Stoica.

My research focuses on machine learning systems, especially inference acceleration for large language models.

Previously, I was advised by Professor Jie Tang and Professor Yuxiao Dong in the Knowledge Engineering Group at Tsinghua University and at Z.ai. In 2024, I took part in the Stanford Undergraduate Visiting Research (UGVR) program, where I was advised by Professor Tsachy Weissman.

CV  /  Email  /  Google Scholar  /  GitHub  /  LinkedIn  /  X

Research

Barbarians at the Gate: How AI is Upending Systems Research


Audrey Cheng*, Shu Liu*, Melissa Pan*, Zhifei Li, Bowen Wang, Alex Krentsel, Tian Xia, Mert Cemri, Jongseok Park, Shuo Yang, Jeff Chen, Lakshya Agrawal, Aditya Desai, Jiarong Xing, Koushik Sen, Matei Zaharia, Ion Stoica
arXiv
[paper] [code] [website]

PrefillOnly: An Inference Engine for Prefill-only Workloads in LLM Applications


Kuntai Du, Bowen Wang, Chen Zhang, Yiming Cheng, Qing Lan, Hejian Sang, Yihua Cheng, Jiayi Yao, Xiaoxuan Liu, Yifan Qiao, Ion Stoica, Junchen Jiang
SOSP 2025
[paper] [code]

APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding


Mingdao Liu*, Aohan Zeng*, Bowen Wang, Peng Zhang, Jie Tang, Yuxiao Dong
arXiv
[paper] [code]

AgentTuning: Enabling Generalized Agent Abilities for LLMs


Aohan Zeng*, Mingdao Liu*, Rui Lu*, Bowen Wang, Xiao Liu, Yuxiao Dong, Jie Tang
ACL 2024 Findings
[paper] [code] [website]




Open-Source Projects

vLLM Expert Parallelism Load Balancer (EPLB)


[repo] [pr]

Implemented a load balancer that rearranges the experts dynamically based on observed expert usage to solve the load imbalance problem of sparse Mixture-of-Experts (MoE) inference.

Supports redundant experts: replicas of the most heavily used experts spread their load across more compute resources.

Achieves up to 30% throughput improvement and 25% latency reduction in sparse MoE inference. It is a core component of large-scale expert-parallel inference.
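
The idea behind the two points above (rebalancing by observed usage, plus redundant experts) can be sketched in a few lines of Python. This is only a simplified illustration, not the vLLM implementation: the function name `balance_experts`, its parameters, and the greedy strategy are all assumptions made for the example. It also ignores constraints a real system would need, such as an equal number of expert slots per GPU and keeping replicas of one expert on different GPUs.

```python
import heapq


def balance_experts(loads, num_gpus, num_redundant):
    """Hypothetical sketch: return (placement, gpu_loads).

    loads[e] is the observed load (e.g. token count) of expert e.
    placement[g] lists the expert ids hosted on GPU g.
    """
    # Step 1: replicate. Every expert starts with one replica; each
    # redundant slot goes to the expert with the highest per-replica load.
    replicas = [1] * len(loads)
    for _ in range(num_redundant):
        hot = max(range(len(loads)), key=lambda e: loads[e] / replicas[e])
        replicas[hot] += 1

    # Assume an expert's load is split evenly across its replicas.
    shards = sorted(
        ((loads[e] / replicas[e], e)
         for e in range(len(loads))
         for _ in range(replicas[e])),
        reverse=True,
    )

    # Step 2: pack. Place shards heaviest-first on the least-loaded GPU.
    heap = [(0.0, g) for g in range(num_gpus)]
    placement = [[] for _ in range(num_gpus)]
    for load, expert in shards:
        total, gpu = heapq.heappop(heap)
        placement[gpu].append(expert)
        heapq.heappush(heap, (total + load, gpu))

    gpu_loads = [0.0] * num_gpus
    for total, gpu in heap:
        gpu_loads[gpu] = total
    return placement, gpu_loads


if __name__ == "__main__":
    observed = [90, 10, 10, 10]  # made-up loads: expert 0 is a heavy hitter
    print(balance_experts(observed, num_gpus=2, num_redundant=0))
    print(balance_experts(observed, num_gpus=2, num_redundant=2))
```

With these made-up loads, the heavy expert alone pins one GPU when there are no redundant slots; giving it extra replicas lets the packing step even out the per-GPU totals.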




Experience

Sky Computing Lab, University of California, Berkeley
2024.12 - Present

Visiting Student Researcher
Advisor: Prof. Ion Stoica

Undergraduate Visiting Research (UGVR), Stanford University
2024.07 - 2024.08

Research Intern
Advisor: Prof. Tsachy Weissman

Knowledge Engineering Group (KEG), Tsinghua University
2023.07 - 2024.06

Research Intern
Advisor: Prof. Jie Tang & Prof. Yuxiao Dong

Z.ai, Beijing, China
2023.07 - 2024.06

Research Intern, Member of the GLM Training Team



Other Projects

NOP-Processor: Out-of-Order LoongArch Core


Competition, NSCSCC 2023 (LoongArch Track)
[video] [code]

Special Prize (National Top 1), 7th "Loongson Cup" CPU Design Competition.

NOP-Processor is a high-performance out-of-order processor core for the LoongArch architecture, used as a strong baseline for the National Student System Capability Challenge (NSCSCC 2023).

The project implements a modern pipeline with speculative execution, branch prediction, and an on-chip memory hierarchy, targeting both high frequency and competitive performance on the LoongArch benchmark suite.

Dino Fit Adventure: Chrome Dino with Full-Body Control


Coursework, Digital Logic Design, Tsinghua University
[video] [code]

Dino Fit Adventure lets you play Chrome Dino in the real world with your body movements. It was built on an FPGA as a digital design course project.

The system reads acceleration data from a wearable sensor to detect jumping and other actions, decodes the data in hardware, and renders the game over VGA. The SystemVerilog design handles clock-domain crossing, buffering, and timing closure.

Simple RDBMS from Scratch


Coursework, Introduction to Database Management Systems, Tsinghua University
[code]

This project implements a simple relational database management system (RDBMS) from scratch, supporting core features such as CRUD operations, indexing, constraints, aggregation, and join queries.

It focuses on building the end-to-end query pipeline, including a storage layer, execution engine, and basic optimization, to expose the full lifecycle of SQL query processing in a compact system.

Wordle in Rust with WebAssembly


Coursework, Programming Training, Tsinghua University
[website] [code]

This project is an implementation of the Wordle word game in Rust, compiled to WebAssembly so that it can run efficiently in the browser.

It explores the ergonomics of Rust for game logic and safe state management, as well as the toolchain for building, optimizing, and deploying Rust+WASM applications to the web.

Miscellaneous

I have a passion for learning languages, both programming and natural.
Some languages I'm using / learning:
    🇨🇳 中文 你好!
    🇺🇸 English Hello!
    🇯🇵 日本語 こんにちは!
    🇰🇷 한국어 안녕하세요!
    🇪🇸 Español ¡Hola!
    🐍 Python print('Hello, world!')
    🦀 Rust println!("Hello, world!");

Design and source code from Jon Barron's website