Job Title:

Research Scientist, Vision Foundation Model

Company: ByteDance

Location: San Jose, CA

Created: 2026-04-19

Job Type: Full Time

Job Description:

About the Team

Established in 2023, the ByteDance Seed team is dedicated to pioneering new paths toward artificial general intelligence. We aspire to advance the frontier of intelligence to drive progress for both technology and society. With a long-term vision for the AI sector, the Seed team's research spans MLLM, GenMedia, AI for Science, and Robotics. We maintain a global presence with laboratories and career opportunities across China, Singapore, and the United States.

To date, we have launched industry-leading general foundation models and cutting-edge multimodal capabilities. Our technology powers over 50 application scenarios - including Doubao, Jimeng, TRAE, Dola and Dreamnia - and serves enterprise customers through Volcano Engine and BytePlus. Third-party data shows that the Doubao App ranks first in user volume in the Chinese market, while Doubao foundation models lead the industry in average daily token consumption.

The Seed Vision team focuses on foundational models for visual generation, developing multimodal generative models, and carrying out leading research and application development to solve fundamental computer vision challenges.

Responsibilities

- Conduct research and development in visual foundation generative models.
- Develop foundation models to enhance the strategic advantages of ByteDance products.

Minimum Qualifications

- Master's or PhD in Software Development, Computer Science, Computer Engineering, or a related technical discipline.
- Experience in research and practical applications in one or more areas of computer vision and machine learning.
- Hands-on coding experience in deep learning frameworks (e.g., PyTorch) and large-scale training experience is preferred. Highly competent in algorithms and programming; strong coding skills in Python.
- Work and collaborate well with team members.

Preferred Qualifications

- Publications in accredited venues such as CVPR, ECCV, ICCV, NeurIPS, ICLR, ICML, SIGGRAPH, or Multimedia.
- Experience in solving real-world machine learning technical problems.
- Experience in large-scale image and video training is preferred, particularly when it involves extensive work with foundation models.
