After finding an unprecedented treasure 💰 of soooo many gems 💎, I'm creating the biggest megathread in the comments of this post, showcasing the full range of GPT-4o's native image generation capabilities while pushing it to its absolute limits 🤙🏻🔥
It will cover GPT-4o's capabilities and limitations. Capabilities include:
context-aware images✅
modeling the relationships between text and visual data✅
precise multi-turn visual/multimodal communication ✅
accurate text rendering ✅
Character, style, and geometric consistency ✅🔥
Single-prompt and multi-prompt world and story expansion ✅🚀💥
Limitations include 👇🏻:
tight cropping of longer images❌
hallucinations in low-context prompts❌
limited editing precision (highlighting regions and editing turn by turn can greatly improve accuracy without a new model iteration) ❌
inaccuracies in multilingual text rendering❌
difficulty rendering dense information at small text sizes ❌
Feel free to contribute your own discoveries to the thread
Now let's begin in the comments 😎🔥🌋🎇💥