OpenAI’s new AI image generator pushes the limits in detail and prompt fidelity

Voyager@psychedelia.ink · 11 months ago

simple@lemm.ee · 11 months ago

That avocado image is insane. I’ve yet to see any image prompt AI get text and composition anywhere near this level. Mindblowing to know there were zero edits. I really want to try this now.

PopOfAfrica@lemmy.world · 11 months ago

I’ve already accepted that my graphic design degree is worthless.

2nsfw2furious@lemmynsfw.com · 11 months ago

This is just automatic Photoshop - if all you were doing with graphic design was pasting blond hair onto a brunette, yes, this has really screwed you (or made your job a lot easier). If you’re actually doing any level of design… you’re safe for now

PlexSheep@feddit.de · 11 months ago

closedAI still sucks, even if their (closed!!) tools are powerful.

AlmightySnoo 🐢🇮🇱🇺🇦@lemmy.world · edit-2 11 months ago

I will be convinced when they learn to draw hands correctly, which they seem to boast about here.

Devorlon@lemmy.zip · 11 months ago

Here’s an example image from the article.

https://cdn.arstechnica.net/wp-content/uploads/2023/09/plategirl-980x560.jpg

Chozo@kbin.social · 11 months ago

from the article

Well no wonder they couldn’t find this example.

fmstrat@lemmy.nowsci.com · 11 months ago

For a system where the intent is to read, learn, or be entertained (and kill time), people seem unwilling to do the first to accomplish the latter.

Quicky@lemm.ee · edit-2 11 months ago

Was the prompt “Woman from China”?

Edit: I feel like the nuance of this joke may have been lost on some. Whether or not I read the article is irrelevant, since this was not a genuine question, rather a play on words of the double meaning of “china” as in “A woman from (the country) China” and “A woman (emerging) from china (porcelain)”.

I’ll get my coat.

Chariotwheel@kbin.social · 11 months ago

The prompt is on the picture in the article:

A DALL-E 3 image provided by OpenAI with the prompt: “A middle-aged woman of Asian descent, her dark hair streaked with silver, appears fractured and splintered, intricately embedded within a sea of broken porcelain. The porcelain glistens with splatter paint patterns in a harmonious blend of glossy and matte blues, greens, oranges, and reds, capturing her dance in a surreal juxtaposition of movement and stillness. Her skin tone, a light hue like the porcelain, adds an almost mystical quality to her form.”

Why do we need AI creating text, when nobody is reading?

Quicky@lemm.ee · 11 months ago

Whoosh

Aatube@kbin.social · 11 months ago

You might want to put it all lowercase next time

Quicky@lemm.ee · 11 months ago

The next time I make the same joke?

I reckon I’ll just keep it to myself instead. I already feel ridiculous for having to explain it. Lemmy is harder than real life.

ZILtoid1991@kbin.social · 11 months ago

Making the context window larger likely helps with stuff; however, it still has the issue of “background breaking”.

lloram239@feddit.de · edit-2 10 months ago

Seems to be about 50/50: quite a few good-looking hands, but still plenty of crooked fingers with some prompts. I think they might need training on video or 3D models; the structure of hands is probably difficult to figure out just from 2D images.

Tony Bark@pawb.social · edit-2 11 months ago

The reason AI struggles with hands is because real artists struggle with them too.

thbb@kbin.social · edit-2 11 months ago

While there is some truth in this, humans and AI do not make the same type of mistakes with hands.

Humans will rebuild the topological structure of the hand: 5 fingers protruding from a base, and get the proportions wrong…while the topology is credible.

AI will rebuild the image of a hand from the 2D appearance of a hand: a variable number of flesh-colored, parallel stripes, and improvise from that.

While both can get it wrong, the errors are not similar.

AutoTL;DR@lemmings.world · 11 months ago

This is the best summary I could come up with:

On Wednesday, OpenAI announced DALL-E 3, the latest version of its AI image synthesis model that features full integration with ChatGPT.

DALL-E 3 renders images by closely following complex descriptions and handling in-image text generation (such as labels and signs), which challenged earlier models.

Judging by the samples provided by OpenAI on its promotional blog, DALL-E 3 appears to be a radically more capable image synthesis model than anything else available in terms of following prompts.

While OpenAI’s examples have been cherry-picked for their effectiveness, they appear to follow the prompt instructions faithfully and convincingly render objects with minimal deformations.

DALL-E 3 also appears to handle text within images in a way that its predecessor couldn’t (some competing models like Stable Diffusion XL and DeepFloyd are getting better at it).

Microsoft’s Bing Chat AI assistant, also built on technology from OpenAI, has been able to generate images in conversation since March.

The original article contains 420 words, the summary contains 151 words. Saved 64%. I’m a bot and I’m open source!

avater@lemmy.world · 11 months ago

always wanted to try it out but no way I’m giving my phone number to them, although I understand their approach to reduce bot accounts.

Dkarma@lemmy.world · 11 months ago

Try Stable Diffusion

Echo Dot@feddit.uk · 11 months ago

What a time to be alive!

Voyajer@lemmy.world · 11 months ago

Nice username!