Table of contents
- (Paper Summary) AnyMAL; An Efficient and Scalable Any-Modality Augmented Language Model
- (Paper Summary) Chameleon; Mixed-Modal Early-Fusion Foundation Models
- (Paper Summary) Chart-based Reasoning; Transferring Capabilities from LLMs to VLMs
- (Paper Summary) InternLM-Math; Open Math Large Language Models Toward Verifiable Reasoning
- (Paper Summary) Janus; Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
- (Paper Summary) Mixture-of-Agents Enhances Large Language Model Capabilities
- (Paper Summary) Molmo and PixMo; Open Weights and Open Data for State-of-the-Art Multimodal Models
- (Paper Summary) Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers
- (Paper Summary) Transfusion; Predict the Next Token and Diffuse Images with One Multi-Modal Model
- (Paper Summary) What matters when building vision-language models?
- (Paper Summary) xGen-MM (BLIP-3); A Family of Open Large Multimodal Models
- (Model Summary) Llama 3.2; Revolutionizing edge AI and vision with open, customizable models
- (Model Summary) Qwen2.5 VL