Zijian Wang

AI Research Scientist Manager · LLM Pre-training Data · Meta Superintelligence Labs

I lead the data research team at Meta Superintelligence Labs. We build data across pre-training, mid-training, and post-training for frontier Muse models. Meet Muse Spark!

Previously, I built code LLMs at AWS AI Labs (Kiro) and studied at Stanford in the Stanford NLP Group (w/ Chris Potts), the University of Michigan (w/ David Jurgens and Kevyn Collins-Thompson), and Shanghai Jiao Tong University.

I value contributing to the research community. I lead the Deep Learning for Code workshop series (ICLR’23, ICLR’25, NeurIPS’25, ICML’26) and serve as area chair, workshop co-organizer, and tutorial presenter at major venues.

Selected Publications

  1. Reuse your FLOPs: Scaling RL on Hard Problems by Conditioning on Very Off-Policy Prefixes
    arXiv, 2026
  2. Cyber-Zero: Training Cybersecurity Agents without Runtime
    ICLR, 2026
    Won first prize in the CSAW 2025 agentic automated CTF challenge
  3. Empowering Multi-Turn Tool-Integrated Reasoning with Group Turn Policy Optimization
    Yifeng Ding, Hung Le, Songyang Han, Kangrui Ruan, Zhenghui Jin, Varun Kumar, Zijian Wang, and Anoop Deoras
    ACL, 2026
  4. BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
    The BigCodeBench Team
    ICLR (Oral), 2025
  5. Fewer Truncations Improve Language Modeling
    ICML, 2024
    Adopted by leading models such as DeepSeek-V3 and GLM-4.5; covered by 机器之心
  6. CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion
    NeurIPS Datasets and Benchmarks, 2023
    Adopted by DeepSeek-Coder, Qwen2.5-Coder, StarCoder, and Augment Code
  7. Multi-lingual Evaluation of Code Generation Models
    ICLR, 2023
  8. Demographic Inference and Representative Population Estimates from Social Media Data
    Zijian Wang, Scott A. Hale, David Adelani, Przemyslaw A. Grabowicz, Timo Hartmann, Fabian Flöck, and David Jurgens
    WWW, 2019
    Best Poster Award (1/324)

Services

Organizer/Program Committee/Reviewer

  • Lead organizer of the Deep Learning for Code (DL4C) workshop at ICLR'23, ICLR'25, NeurIPS'25, and ICML'26
  • Co-organizer of the second LLM4Code workshop at ICSE'25
  • Area Chair for ARR
  • Outstanding Reviewer at ACL'21
  • Current or past reviewer for NeurIPS, ICML, ICLR, ARR/*ACL, COLM, ICWSM, WebSci, AAAI, and many workshops

Teaching Assistant