(Paper Summary) Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget

Key Points

  • randomly masks up to 75% of the image patches during training
  • deferred masking strategy: preprocesses all patches with a patch-mixer (a few transformer layers) before masking, so masked patches still contribute information to the retained ones
  • data: 37M publicly available real and synthetic images
  • model: 1.16B-parameter sparse transformer
  • training cost: $1,890 (118× lower than stable diffusion models, 14× lower than the current SOTA approach)
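The deferred masking idea above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `patch_mixer` stands in for the few lightweight transformer layers with a simple random linear projection, and the patch count, embedding dimension, and function names are assumptions.

```python
import numpy as np

def patch_mixer(patches, rng):
    # Stand-in for a few lightweight transformer layers: a fixed random
    # linear projection keeps the example self-contained.
    d = patches.shape[-1]
    w = rng.standard_normal((d, d)) / np.sqrt(d)
    return patches @ w

def deferred_masking(patches, mask_ratio=0.75, seed=0):
    """Run ALL patches through the patch-mixer first, then drop a random
    subset, so information from masked patches is mixed into kept ones."""
    rng = np.random.default_rng(seed)
    mixed = patch_mixer(patches, rng)      # every patch sees the mixer
    n = mixed.shape[0]
    keep = max(1, int(round(n * (1 - mask_ratio))))
    idx = rng.permutation(n)[:keep]        # indices of unmasked patches
    return mixed[idx], np.sort(idx)

patches = np.random.default_rng(1).standard_normal((256, 64))  # 256 patches, dim 64
kept, idx = deferred_masking(patches)
print(kept.shape)  # (64, 64): only 25% of the 256 patches survive masking
```

The key design choice is ordering: the (cheap) patch-mixer processes the full patch sequence, while the (expensive) main transformer only ever sees the 25% of patches that survive masking.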

Experimental Results

  • cost-effectiveness
  • generation examples