1. Assist in researching the model's ability to understand generated content, including parsing semantics, objects, relationships, and spatial structures.
2. Help implement state tracking and evaluate consistency modeling for generated videos.
3. Participate in exploring "unified generation-understanding" model architectures. Assist in evaluating causal consistency control and physical constraints for "action input → video output" pipelines.
4. Contribute to researching hybrid architectures aimed at achieving low-latency feedback for real-time interactive generation.
5. Work with the team to test and refine the end-to-end closed loop of "generation → understanding → control → feedback."
6. Assist in validating interactive capabilities in simulated/game scenarios and help build evaluation metrics.
7. Track the latest academic papers, industry developments, and open-source projects related to video understanding and generation.
1. Currently pursuing a Ph.D. or Master's degree in AI-related fields (video understanding, video prediction, reinforcement learning, or multimodal generation).
2. Good understanding of the internal mechanisms of video generation models and vision-language models (VLMs), as well as the principles of diffusion models.
3. Academic or project experience in interactive video generation, controllable generation, or multimodal understanding.
4. Solid coding and engineering skills; able to assist in building and debugging model training pipelines.
5. Proficient in Python/PyTorch; publications in related fields are a strong plus.
6. Familiarity with game AI, simulation environments, or reinforcement learning frameworks is preferred.
As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.



