- 🎓 Ph.D. in Electrical Engineering from Stanford University.
- 🔭 I'm interested in LLM Inference & Serving, with a focus on Quantization and Parallelism (e.g., Parallel Decoding, Speculative Decoding).
- 🌱 Currently focused on:
- CUDA Kernel Optimization
- Model Deployment & Serving Infrastructure (Paged KV Cache, Continuous Batching; see the sketch after this list)
- Post-training (RLHF, Distillation, Flow-matching)
- 📫 How to reach me: linglingfan.cnn@gmail.com
- 😄 Pronouns: She/Her
This repo doesn't represent LF's affiliations with any company or school.
- Stanford University, Stanford, California
- Google Scholar: https://scholar.google.co.in/citations?hl=en&user=Ft7VbWcAAAAJ
- LinkedIn: in/lingling-fan-light-field
Pinned repositories:
- SageAttention (forked from thu-ml/SageAttention; CUDA): Quantized Attention achieves speedups of 2-3x and 3-5x compared to FlashAttention and xformers, respectively, without losing end-to-end metrics across language, image, and video models.
- gemini-cli (forked from google-gemini/gemini-cli; TypeScript): An open-source AI agent that brings the power of Gemini directly into your terminal.