Describe Anything: Detailed Localized Image and Video Captioning Paper • 2504.16072 • Published Apr 22, 2025 • 62
Atlas: Multi-Scale Attention Improves Long Context Image Modeling Paper • 2503.12355 • Published Mar 16, 2025 • 12
Bootstrapping Objectness from Videos by Relaxed Common Fate and Visual Grouping Paper • 2304.08025 • Published Apr 17, 2023 • 2
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models Paper • 2305.13655 • Published May 23, 2023 • 7