fariszr a day ago

This is the GPT-4 moment for image editing models. Nano Banana, aka Gemini 2.5 Flash, is insanely good. It made a 171-point Elo jump on LMArena!

Just search "nano banana" on Twitter to see the crazy results. An example: https://x.com/D_studioproject/status/1958019251178267111

  • qingcharles a day ago

    I've been testing it for several weeks. It can produce results that are truly epic, but it's still a case of rerolling the prompt a dozen times to get an image you can use. It's not God. It's definitely an enormous step though, and totally SOTA.

    • spaceman_2020 a day ago

      If you compare it to the amount of effort required in Photoshop to achieve the same results, it's still a vast improvement

      • qingcharles a day ago

        I work in Photoshop all day, and I 100% agree. Also, I just retried a task that wouldn't work last night on nano-banana and it worked first time on the released model, so I'm wondering if there were some changes to the released version?

        • spaceman_2020 21 hours ago

          We had an exhibition some time back where I used AI to generate the posters for our product. This is a side project and not something we do seriously, but the results were outstanding - better than what the majority of much bigger exhibitors had.

          It took me a LOT of time to get things right, but if I were to get an actual studio to make those images, it would have cost me thousands of dollars

          • Bombthecat 7 hours ago

            Yeah, played around with it. It created an amazing poster for the Starfinder TTRPG (something like D&D), with species that looked really good! Usually stuff like this fails hard, since there isn't much training data of unique fantasy creatures.

            But flash 2.5? Worked! It did it, crazy stuff

        • Bombthecat 7 hours ago

          How many times did you try? I uploaded a black and white photo and had it colourize it; something like 20 percent of the attempts were still black and white.

      • petralithic 7 hours ago

        Why would you compare it to Photoshop? If you compare it to other tools in the same category of image generation, you will find models like Flux and Qwen do much better.

      • echelon a day ago

        Vibe coding might not be real, but vibe graphics design certainly is.

        https://imgur.com/a/internet-DWzJ26B

        Anyone can make images and video now.

        • lebimas a day ago

          What tools did you use to make those videos from the PG image?

          • echelon 21 hours ago

            I used a bunch of models in conjunction:

            - Midjourney (background)

            - Qwen Image (restyle PG)

            - Gemini 2.5 Flash (editing in PG)

            - Gemini 2.5 Flash (adding YC logo)

            - Kling Pro (animation)

            I didn't spend too much time correcting mistakes.

            I used a desktop model aggregation and canvas tool that I wrote [1] to iterate and structure the work. I'll be open sourcing it soon.

            [1] https://getartcraft.com

            • kstenerud 19 hours ago

              The app looks interesting, but I think it needs some documentation. I think I generated something? Maybe? I saw a spinny thing for a while, but then nothing.

              I couldn't get the 3d thing to do much. I had assets in the scene but I couldn't for the life of me figure out how to use the move, rotate or scale tools. And the people just had their arms pointing outward. Are you supposed to pose them somehow? Maybe I'm supposed to ask the AI to pose them?

              Inpainting I couldn't figure out either... It's for drawing things into an existing image (I think?), but it doesn't seem to do anything other than show a spinny thing for a while...

              I didn't test the video tool because I don't have a midjourney account.

        • spaceman_2020 21 hours ago

          Midjourney with style references is just about the easiest way right now for an absolute noob to get good aesthetics

        • benreesman 4 hours ago

          I think much like coding, the top of the game is all the old stuff and a bunch of new stuff that is impossible to master without some real math or at least outlier mathematical intuition.

          The old top of the game is available to more people (though mid level people trying to level up now face a headwind in a further decoupling of easily read signals and true taste, making the old way of developing good taste harder).

          This stuff takes people who were already "master rate" (and who are, at minimum, nontrivially sophisticated machine learning hobbyists), drives their peak and frontier out, and drives break-even collaboration overhead down.

          It's always been possible to DIY code or graphic design, it's always been possible to tell the efforts of dabblers and pros apart, and unlike many commodities? There is rarely a "good enough". In software this is because compute is finite and getting more out of it pays huge, uneven returns; in graphic design it's because extreme-quality work is both aesthetically pleasing and a mark of quality (imperfect, but a statement that someone will commit resources).

          And it's just hard to see it being different in any field. Lawyers? Opposing counsel has the best AI, your lawyer better have it too. Doctors? No amount of health is "enough" (in general).

          I really think HN in particular but to some extent all CNBC-adjacent news (CEO OnlyFans stuff of all categories) completely misses the forest (the gap between intermediate and advanced just skyrocketed) for the trees (space-filling commodity knowledge work just plummeted in price).

          But "commodity knowledge work" was always kind of an oxymoron, David Graeber called such work "bullshit jobs". You kinda need it to run a massive deficit in an over-the-hill neoliberal society, it's part of the " shift from production to consumption" shell game. But it's a very recent, very brief thing that's already looking more than wobbly. Outside of that? Apprentices, journeymen, masters is the model that built the world.

          AI enables a new even more extreme form of mastery, blurs the line between journeyman and dabbler, and makes taking on apprentices a much longer-term investment (one of many reasons the PRC seems poised to enjoy a brief hegemony before demographics do in the Middle Kingdom for good, in China, all the GPUs run Opus, none run GPT-5 or LLaMA Behemoth).

          The thing I really don't get is why CEOs are so excited about this and I really begin to suspect they haven't as a group thought it through (Zuckerberg maybe has, he's offering Tulloch a billion): the kind of CEO that manages a big pile of "bullshit jobs"?

          AI can do most of their job today. Claude Opus 4.1? It sounds like a mid-range CEO who was exhaustively researched and made gaffe-immune. Ditto career machine politicians. AI non-practitioner prognosticators. That crowd.

          But the top graphic communications people and CUDA kernel authors? Now they have to master ComfyUI or whatever and the color theory to get anything from it that stands out.

          This is not a democratizing thing. And I cannot see it accruing to the Zuckerberg side of the labor/capital divvy up without a truly durable police state. Zuck offering my old chums nation state salaries is an extreme and likely transitory thing, but we know exactly how software professional economics work when it buckets as "sorcery" and "don't bother": that's 1950 to whenever we mark the start of the nepohacker Altman Era, call it 2015. In that world good hackers can do whatever they want, whenever they want, and the money guys grit their teeth. The non-sorcery bucket has paper mache hack-magnet hackathon projects in it at a fraction of the old price. So disruption, wow.

          Whether that's good or bad is a value judgement I'll save for another blog post (thank you for attending my TED Talk).

        • captnFwiffo a day ago

          Sure, now the client wants 130 edits without losing coherency with the original. What does a vibe designer do? Just keep re-prompting and re-generating until it works? Sounds hard to me.

          • Filligree 7 hours ago

            They use Kontext, Qwen-Edit or Gemini.

    • vitorgrs 13 hours ago

      The model seems good, but it has huge issues: it produces garbage most of the time lol.

      I guess it still needs more RLHF tuning, as the previous version was even worse.

    • druskacik a day ago

      Is it because the model is not good enough at following the prompt, or because the prompt is unclear?

      Something similar has been the case with text models. People write vague instructions and are dissatisfied when the model does not correctly guess their intentions. With image models it's even harder for the model to guess right without enough detail.

      • toddmorey a day ago

        Remember in image editing, the source image itself is a huge part of the prompt, and that's often the source of the ambiguity. The model may clearly understand your prompt to change the color of a shirt, but struggle to understand the boundaries of the shirt. I was just struggling to use AI to edit an image where the model really wanted the hat in the image to be the hair of the person wearing it. My guess for that bias is that it had just been trained on more faces without hats than with them on.

      • qingcharles a day ago

        No, my prompts are very, very clear. It just won't follow them sometimes. Also this model seems to prefer shorter prompts, in my experience.

    • ericlang a day ago

      How did you get early access? Thanks.

      • Thorrez 21 hours ago

        I believe lmarena.

  • hapticmonkey 21 hours ago

    Before AI, people complained that Google was taking world class engineering talent and using it for little more than selling people ads.

    But look at that example. With this new frontier of AI, that world class engineering talent can finally be put to use…for product placement. We’ve come so far.

    • vineyardmike 12 hours ago

      > finally be put to use…for product placement.

      Did you think that Google would just casually allow their business to be disrupted, without using the technology to improve the business and also protect their revenue?

      Both Meta and Google have indicated that they see generative AI as a way to vertically integrate within the ad space, disrupting marketing teams, copywriters, and other roles that monitor or improve ad performance.

      Also FWIW, I would suspect that the majority of Google engineers don't work on an ad system, and probably don't even work on a profitable product line.

    • johnfn 6 hours ago

      Oh come on - you have this incredible technology at your disposal and all you can think to use it for is product placement?

  • torginus a day ago

    Another nitpick - the pink puffer jacket that got edited into the picture is not the same as the one in the reference image. It's very similar, but if I were to use this model for product placement, or cared about this sort of detail, I'd definitely have issues with it.

    • drmath 16 hours ago

      Even in the just-photoshop-not-ai days product photos had become pretty unreliable as a means of understanding what you're buying. Of course it's much worse now.

      • ethbr1 16 hours ago

        Note: Please understand that monitor may color different. If image does not match product received then kindly your monitor calibration. Seller not responsible. /ebay&amazon

        • wiz21c 12 hours ago

          Look at the bottom of the sleeves: they don't match. The bottom of the jacket doesn't match either.

          I didn't see it at first glance, but it is certainly not the same jacket. If you use that as an advertisement, people can sue you for lying about the product.

  • dcre a day ago

    Alarming hands on the third one: it can't decide which way they're facing. But Gemini didn't introduce that, it's there in the base image.

    • 725686 a day ago

      Yes, the base image's hands are creepy.

      • meatmanek 20 hours ago

        I noticed the AI pattern on the sunglasses first. I guess all of the source images are AI-generated? In a sense, that makes the result slightly less impressive -- is it going to be as faithful to the original image when the input isn't already a highly likely output for an AI model? Were the input images generated with the same model that's being used to manipulate them?

        • dcre 3 hours ago

          It doesn't seem to matter: people have posted tons of examples on social media of non-AI base images that it was equally able to hold steady while making edits.

  • ceroxylon a day ago

    It seems like every combination of "nano banana" is registered as a domain with their own unique UI for image generation... are these all middle actors playing credit arbitrage using a popular model name?

    • bonoboTP a day ago

      I'd assume they are just fake, take your money and use a different model under the hood. Because they already existed before the public release. I doubt that their backend rolled the dice on LMArena until nano-banana popped up. And that was the only way to use it until today.

      • ceroxylon a day ago

        Agreed, I didn't mean to imply that they were even attempting to run the actual nano banana, even through LMarena.

        There is a whole spectrum of potential sketchiness to explore with these, since I see a few "sign in with Google" buttons that remind me of phishing landing pages.

    • vunderba a day ago

      They're almost all scams. Nano banana AI image generator sites were showing up when this model was still only available in LM Arena.

  • summerlight a day ago

    I wonder what the creative workflow will look like when these kinds of models are natively integrated into digital image tools. Imagine fine-grained controls on each layer and their composition, with semantic understanding of the full picture.

  • koakuma-chan a day ago

    Why is it called nano banana?

    • ehsankia a day ago

      Before a model is announced, they use codenames on the arenas. If you look online, you can see people posting about new secret models and people trying to guess whose model it is.

      • mvdtnz a day ago

        What are "the arenas"?

        • patates a day ago

          Blind rating battlegrounds, one is https://lmarena.ai/ (first google result)

          • kstenerud 18 hours ago

            I don't quite get what this is? I asked the AI on the site "What is imarena.ai?" and it just gave some hallucinated answer that made no sense.

            • adventured 17 hours ago

              People vote on the performance of AI, generating ranking boards.

              • kstenerud 13 hours ago

                Ah, that was the missing piece of information! Thanks!

    • Jensson a day ago

      Engineers often have silly project names internally, then some marketing team rewrites the name for public release.

    • ZephyrBlu a day ago

      I'm pretty sure it's because an image of a banana under a microscope generated by the model went super viral

  • rplnt a day ago

    Oh no, even more mis-scaled product images.

  • torginus 21 hours ago

    No, it's not really that much of an improvement. Once you start coming up with specific tasks, it fails just like the others.

  • polishdude20 17 hours ago

    The fingernails on one of them. Ohhh nooo

    • ethbr1 16 hours ago

      Image genai made me realize just how inattentive to detail a lot of people are.

  • goosejuice 16 hours ago

    Yet it's failed spectacularly at almost everything I've given it.

  • 93po a day ago

    Completely agree - I make logos for my github projects for fun, and the last time I tried SOTA image generation for logos, it was consistently ignoring instructions and not doing anything close to what I was asking for. Google's new release today did it near flawlessly, exactly how I wanted it, in a single prompt. A couple more prompts for tweaking (centering it, rotating it slightly) got it perfect. This is awesome.

  • ivape a day ago

    Regardless, it seems Google is on the frontier of every type of model and robotics (cars). It’s nutty how we forget what an intellectual juggernaut they are.

    • fariszr 21 hours ago

      Tool use and sycophancy are still big issues in Gemini 2.5 models.

  • r33b33 10 hours ago

    nano banana is good, but not insanely good

  • Viaya 12 hours ago

    [dead]

  • fHr a day ago

    cope

  • echelon a day ago

    > This is the GPT-4 moment for image editing models.

    No it's not.

    We've had rich editing capabilities since gpt-image-1; this is just faster and looks better than the (endearingly?) named "piss filter".

    Flux Kontext, SeedEdit, and Qwen Edit are all also image editing models that are robustly capable. Qwen Edit especially.

    Flux Kontext and Qwen are also possible to fine tune and run locally.

    Qwen (and its video gen sister Wan) are also Apache licensed. It's hard not to cheer Alibaba on given how open they are compared to their competitors.

    We've left the days of Dall-E, Stable Diffusion, and Midjourney of "prompt-only" text to image generation.

    It's also looking like tools like ComfyUI are less and less necessary as those capabilities are moving into the model layer itself.

    • raincole a day ago

      In other words, this is the GPT-4 moment for image editing models.

      GPT-4 isn't "fundamentally different" from GPT-3.5. It's just better. That's the exact point the parent commenter was trying to make.

      • jug a day ago

        I'd say it's more like comparing Sonnet 3.5 to Sonnet 4. GPT-4 was a rather fundamental improvement: it jumped to professional applications, compared to the only casual use you could put ChatGPT 3.5 to.

      • retinaros a day ago

        Did you see the generated pic Demis posted on X? It looks like slop from 2 years ago. https://x.com/demishassabis/status/1960355658059891018

        • raincole a day ago

          I've tested it on Google AI Studio since it's available to me (which is just a few hours so take it with a grain of salt). The prompt comprehension is uncannily good.

          My test is going to https://unsplash.com/s/photos/random, picking two random images, and sending them both with "integrate the subject from the second image into the first image" as the prompt. I think Gemini 2.5 is doing far better than ChatGPT (admittedly ChatGPT was the trailblazer on this path). Flux Kontext seems unable to do that at all. Not sure if I was using it wrong, but it always only considers one image at a time for me.

          Edit: Honestly it might not be the "GPT-4 moment." It's better at combining multiple images, but now I don't think it's better at understanding elaborate text prompts than ChatGPT.

          • echelon 17 hours ago

            > Flux Kontext

            Flux Kontext is an editing model, but the set of things it can do is incredibly limited. The style of prompting is very bare bones. Qwen (Alibaba) and SeedEdit (ByteDance) are a little better, but they themselves are nowhere near as smart as Gemini 2.5 Flash or gpt-image-1.

            Gemini 2.5 Flash and gpt-image-1 are in a class of their own. Very powerful instructive image editing with the ability to understand multiple reference images.

            > Edit: Honestly it might not be the "GPT-4 moment." It's better at combining multiple images, but now I don't think it's better at understanding elaborate text prompts than ChatGPT.

            Both gpt-image-1 and Gemini 2.5 Flash feel like "Comfy UI in a prompt", but they're still nascent capabilities that get a lot wrong.

            When we get a gpt-image-1 with Midjourney aesthetics, better adherence and latency, then we'll have our "GPT 4" moment. It's coming, but we're not there yet.

            They need to learn more image editing tricks.

    • krackers a day ago

      I'm confused as well. I thought gpt-image could already do most of these things, but I guess the key difference is that gpt-image is not good for single-point edits. In terms of "wow" factor it doesn't feel as big as GPT-3 -> GPT-4 though, since it sure _felt_ like models could already do this.

      • echelon a day ago

        People really slept on gpt-image-1 and were too busy making Miyazaki/Ghibli images.

        I feel like most of the people on HN are paying attention to LLMs and missing out on all the crazy stuff happening with images and videos.

        LLMs might be a bubble, but images and video are not. We're going to have entire world simulation in a few years.

    • bsenftner 5 hours ago

      I'm totally with you. Dismayed by all these fanbois.

vunderba a day ago

I've updated the GenAI Image comparison site (which focuses heavily on strict text-to-image prompt adherence) to reflect the new Google Gemini 2.5 Flash model (aka nano-banana).

https://genai-showdown.specr.net

This model gets 8 of the 12 prompts correct, easily coming within striking distance of the best-in-class models (Imagen and gpt-image-1), and is a significant upgrade over the old Gemini 2.0 Flash model. The reigning champ, gpt-image-1, only manages to edge out Flash 2.5 on the maze and the 9-pointed star.

What's honestly most astonishing to me is how long gpt-image-1 has remained at the top of the class - closing in on half a year which is basically a lifetime in this field. Though fair warning, gpt-image-1 is borderline useless as an "editor" since it almost always changes the whole image instead of doing localized inpainting-style edits like Kontext, Qwen, or Nano-Banana.

Comparison of gpt-image-1, flash, and imagen.

https://genai-showdown.specr.net?models=OPENAI_4O%2CIMAGEN_4...

  • bla3 a day ago

    Why do Hunyuan, OpenAI 4o and Qwen get a pass for the octopus test? They don't cover "each tentacle", just some. And Midjourney covers 9 of 8 arms with sock puppets.

    • vunderba a day ago

      Good point. I probably need to adjust the success pass ratios to be a bit stricter, especially as the models get better.

      > midjourney covers 9 of 8 arms with sock puppets.

      Midjourney is shown as a fail so I'm not sure what your point is. And those don't even look remotely close to sock puppets, they resemble stockings at best.

  • MrOrelliOReilly 10 hours ago

    This is incredibly useful! I was manually generating my own model comparisons last night, so great to see this :)

    I will note that, personally, while adherence is a useful measure, it does miss some of the qualitative differences between models. For your "spheron" test for example, you note that "4o absolutely dominated this test," but the image exhibits all the hallmarks of a ChatGPT-generated image that I personally dislike (yellow, with veiny, almost impasto brush strokes). I have stopped using ChatGPT for image generation altogether because I find the style so awful. I wonder what objective measures one could track for "style"?

    It reminds me a bit of ChatGPT vs Claude for software development... Regardless of how each scores on benchmarks, Claude has been a clear winner in terms of actual results.

  • bn-l a day ago

    You need a separate benchmark for editing of course

  • mrcwinn 3 hours ago

    I really enjoyed reviewing this! Good work.

  • gundmc a day ago

    > Though fair warning, gpt-image-1 is borderline useless as an "editor" since it almost always changes the whole image instead of doing localized inpainting-style edits like Kontext, Qwen, or Nano-Banana.

    Came into this thread looking for this post. It's a great way to compare prompt adherence across models. Have you considered adding editing capabilities in a similar way given the recent trend of inpainting-style prompting?

    • vunderba a day ago

      Adding a separate section for image editing capabilities is a great idea.

      I've done some experimentation with Qwen and Kontext and been pretty impressed, but it would be nice to see some side by sides now that we have essentially three models that are capable of highly localized in-painting without affecting the rest of the image.

      https://mordenstar.com/blog/edits-with-kontext

  • cubefox 19 hours ago

    What's interesting is that Imagen 4 and Gemini 2.5 Flash Image look suspiciously similar in several of these test cases. Maybe Gemini 2.5 Flash first calls Imagen in the background to get a detailed baseline image (diffusion models are good at this) and then Gemini edits the resulting image for better prompt adherence.

    • pkach 7 hours ago

      Yes, I saw on Reddit an employee confirming this is the case (at least in the Gemini app): a request for an image from scratch is routed to Imagen, and the follow-up edits are done using Gemini.

  • jay_kyburz 20 hours ago

    I really like your site.

    Do you know of any similar sites that compare how well the various models can adhere to a style guide? Perhaps you could add this?

    I.e. provide the model with a collection of drawings in a single style, then follow prompts and generate images in the same style?

    For example, if you wanted to illustrate a book and have all the illustrations look like they were from the same artist.

carlosbaraza 21 hours ago

Unfortunately, it suffers from the same safetyism as many other releases. Half of the prompts get rejected. How can you have character consistency if the model is forbidden from editing any human? Most of my photo editing involves humans, so this is basically a useless product for me. I get that Google doesn't want to be responsible for deepfake advances, but that seems inevitable, so this is just slightly delaying progress. Eventually we will have to face it and allow society to adapt.

This trend of tools that point a finger at you and set guardrails is quite frustrating. We might need a new OSS movement to regain our freedom.

  • Workaccount2 20 hours ago

    I have an old photo of my girlfriend with her cousin when they were young, wearing Christmas dresses in front of the tree, not long before they were separated to opposite sides of the world, for decades now. The photo is low quality, on top of being physically beat up.

    So far no model is willing to clean it up :/

    • boppo1 3 hours ago

      If you are not personally offended by looking at CRAZY pornography, you could start digging into the ComfyUI ecosystem. It's not all porn; there are lots of pro photo-manipulators doing SFW stuff, but the community overlap with NSFW is basically borderless, so you'll probably bump into it.

      However, the results the ComfyUI people get are light-years ahead of any oneshot-prompt model. Either you can find someone to do the cleanup for you (should be trivial; I wouldn't pay more than $10-15), or, if you have good specs for inference, you could learn to do it yourself.

    • gaudystead 19 hours ago

      There are reddit communities (I admittedly don't remember which, but could probably be found from a simple search) where people will offer their photo editing skills to touch up the photo, often for free. Could be worth trying a real human if the robots are going full HAL 9000 and telling you they can't do it.

    • AuryGlenz 14 hours ago

      If you have a decent GPU Qwen Edit can probably do it and certainly won’t refuse.

      Keep in mind no editing model is magic and if the pixels just aren’t there for their faces it’s essentially going to be making stuff up.

    • yfontana 10 hours ago

      Open source models like Flux Kontext or Qwen image edit wouldn't refuse, but you need to either have a sufficiently strong GPU or get one in the cloud (not difficult nor expensive with services like runpod), then set up your own processing pipeline (again, not too difficult if you use ComfyUI). Results won't be SOTA, but they shouldn't be too far off.

  • danpalmer 14 hours ago

    I've done ~20 prompts and not had one be rejected so far. What sort of things are you asking it to do? I've tried things like changing clothing and accessories on people.

    • carlosbaraza 11 hours ago

      Basic things like: "{uploaded image of a man} can you remove the glasses?" or "make everyone in the picture smile" or "open the eyes of everyone in the photo". Nothing that a human would consider "unsafe". I am based in the EU and using Google AI Studio with all safety toggles set to "Off".

      • simedw 9 hours ago

        I noticed that I get far fewer refusals when I set my VPN to the USA.

  • mudkipdev 20 hours ago

    I was using Veo two days ago when video generations were free. I removed all words that sounded even remotely bad, but it still refused. Eventually I gave up, but now I'm thinking it's because I tried to generate myself.

minimaxir 20 hours ago

There is one thing Gemini 2.5 Flash Image can do that no other edit model can: incorporate multiple images simultaneously, without shenanigans, thanks to its multimodality. With Flux Kontext, e.g., if you want to "put the person in the first image into the second image", you have to concatenate them pre-VAE, which can be unwieldy; this model doesn't have that issue. You can even incorporate more than two images, but that may cause too much chaos.
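
For the curious, the multi-image pattern is literally just a list of parts. A minimal sketch with the google-genai Python SDK (the preview model name and file names are my assumptions; check the docs):

    # pip install google-genai pillow
    from google import genai
    from PIL import Image

    client = genai.Client()  # reads GEMINI_API_KEY from the environment

    person = Image.open("person.jpg")  # hypothetical inputs
    scene = Image.open("scene.jpg")

    # Both images plus the instruction go in a single contents list;
    # no pre-VAE concatenation or other preprocessing is needed.
    response = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",
        contents=[person, scene,
                  "Put the person in the first image into the second image."],
    )

    # The response interleaves text and image parts; save the image ones.
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            with open("composite.png", "wb") as f:
                f.write(part.inline_data.data)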

In quick testing, prompt adherence does appear to be much better for massive prompts, and the syntactic sugar does appear to be more effective. And there are other tricks not covered here which I suspect may allow more control, but I'm still testing.

Given that generations are at the same price as its competitors, this model will shake things up.

  • blinding-streak 20 hours ago

    I very much enjoy this feature. My next door neighbor is on vacation, and I'm feeding his fish for him. I took a picture of the fish tank and asked Gemini to put the fish tank at various local tourist attractions in my city, as if we're going on day trips.

    I send him one photo a day and he's been loving it. Just a fun little thing to put a smile on his face (and mine).

    • AuryGlenz 14 hours ago

      Fun fact - I trained a LoRA on our almost-toddler at the time on SDXL and generated images of her doing dangerous things to send to my wife the first day she had a trip away from us.

      It was all fun and games until the little shit crawled out of our doggy door for the first and only time when I was going to the bathroom. As I was looking for her I got a notification we were in a tornado warning.

      Luckily the dog knew where she had gone and led me to her; she had crawled down our (3-step) deck, across our yard, and was standing looking up at the angry clouds.

  • ojr 16 hours ago

    It can't put two images of people together in one photo; this model still has that issue. Also, I have seen cases where Flux Kontext works better at things like removing objects.

  • dsrtslnd23 12 hours ago

    gpt-image-1 works with multiple input images. I even had good success with >4 images.

atleastoptimal 17 hours ago

I can imagine an automated blackmail bot that scrapes image, video, and voice samples from anyone with even the most meagre online presence, then creates high-resolution videos of that person doing the most horrid acts, then threatens to share those videos with that person's family, friends and business contacts unless it is paid $5000 in a cryptocurrency to an anonymous address.

And further, I can imagine some person actually having real footage of themselves being threatened with release, then using the former narrative as a cover story were it to come out. Is there anything preventing AI-generated images, video, etc. from always being detectable by software that can intuit whether something is AI? What if random noise is added: would the "is AI" signal persist just as much as the indication to a human that the footage seems real?

  • shibeprime 16 hours ago

    I’m more bullish on cryptographic receipts than on AI detectors. Capture signing (C2PA) plus an identity bind could give verifiable origin. The hard parts, in my view, are adoption and platform plumbing.

    If we have a trust worthy way to verify proof-of-human made content than anything missing those creds would be red flags.

    https://iptc.org/news/googles-pixel-10-phone-supports-c2pa-u...
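
    To make "capture signing" concrete, here is a toy sketch of the primitive underneath it, using Python's cryptography package; real C2PA wraps this in manifests and certificate chains rather than a bare key pair:

        # pip install cryptography
        from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
        from cryptography.exceptions import InvalidSignature

        device_key = Ed25519PrivateKey.generate()  # would live in the camera's secure enclave
        public_key = device_key.public_key()       # published/certified by the vendor

        image_bytes = open("capture.jpg", "rb").read()
        signature = device_key.sign(image_bytes)   # attached as metadata at capture time

        try:
            public_key.verify(signature, image_bytes)
            print("bytes match the signed capture")
        except InvalidSignature:
            print("modified or re-encoded since capture")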

    • arsome 3 hours ago

      This seems absolutely silly; it's not hard to take a photo of a photo, and there are both analog (building a lightbox) and digital (modifying the sensor input) means that would make this entirely trivial to spoof.

  • goosejuice 16 hours ago

    SynthID claims to be designed to persist through several methods of modification. I suspect the attacks you mention will happen, but by those with deep pockets, like a nation-state actor with access to models that don't produce watermarks.

  • UltraSane 14 hours ago

    But these new amazing AI image generators let you just say "It wasn't me, it's an AI fake". Long term they will seriously devalue blackmail material.

    I read a sci-fi novel where they invented a wormhole that only light could pass through, but which could be used as a camera that could go anywhere, and eventually anywhen, with absolutely no way to block it. So some people adapted to this fact by not wearing clothes anymore.

    • SirFredman 11 hours ago

      The Light of Other Days, by Arthur C. Clarke and Stephen Baxter. Really cool book.

    • Revisional_Sin 12 hours ago

      > So some people adapted to this fact by not wearing clothes anymore.

      Erm... What?

      • UltraSane 6 hours ago

        Because anyone could use the wormhole camera to see anyone naked. It made modesty effectively impossible.

  • m3kw9 3 hours ago

    If you are willing to pay once, you will be re-targeted. Just like Facebook ads

notsylver a day ago

I digitised our family photos, but a lot of them were damaged (shifted colours, spills, fingerprints on film, spots) in ways that are difficult to correct across so many images. I've been waiting for image gen to catch up enough to repair them all in bulk without changing details, especially faces. This looks very good at restoring images without altering details or adding them where they are missing, so it might finally be time.

  • Almondsetat a day ago

    All of the defects you have listed can be automatically fixed by using a film scanner with ICE and software that automatically performs the scan and the restoration, like VueScan. Feeding hundreds (thousands?) of photos to an experimental proprietary cloud AI that will give you back subpar compressed pictures with who knows how many strange artifacts seems unnecessary.

    • notsylver a day ago

      I scanned everything into 48-bit RAW and treat those as the originals, including the IR scan for ICE and a lower-quality scan of the metadata. The problem is sharing them: important images I manually repair and export as JPEG, which is time-consuming (15-30 minutes per image, and there are about 14,000 total), so if it's "generic family gathering picture #8228" I would rather let AI repair it, assuming it doesn't butcher faces and other important details. Until then, I made a script that exports the raws with basic cropping and colour correction, but it can't fix the colours, which is the biggest issue.
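
      For anyone curious, that script is roughly this shape: a crude sketch assuming 16-bit TIFF scans, with a per-channel levels stretch standing in for the colour correction (paths and percentiles are made up):

          # pip install numpy imageio
          import pathlib
          import numpy as np
          import imageio.v3 as iio

          SRC = pathlib.Path("scans")    # 48-bit RGB TIFFs from the scanner
          DST = pathlib.Path("exports")
          DST.mkdir(exist_ok=True)

          for path in sorted(SRC.glob("*.tif")):
              img = iio.imread(path).astype(np.float64)
              # stretch each channel between its 0.5th and 99.5th percentiles;
              # this tames global casts but can't fix badly shifted dyes
              lo = np.percentile(img, 0.5, axis=(0, 1))
              hi = np.percentile(img, 99.5, axis=(0, 1))
              img = np.clip((img - lo) / (hi - lo), 0.0, 1.0)
              iio.imwrite(DST / (path.stem + ".jpg"), (img * 255).astype(np.uint8))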

      • wingworks a day ago

        How did you get the 48-bit and ICE data separately? Did you double-scan everything?

        I'm scanning my parents' photos at the moment.

      • exe34 a day ago

        this reminds me of a joke we used to tell as kids when there was a new Photoshop version coming out - "this one will remove the cow from the picture and we'll finally see what great-grandpa looked like!"

    • wingworks a day ago

      VueScan is terrible. SilverFast has better defaults. But nothing beats the original Nikon Scan software when using ICE; it does a great job of removing dust, fingerprints, etc., even when you zoom in, versus what iSRD does in SilverFast. If you zoom in and compare the two, iSRD kind of smudges/blurs the infrared defects, whereas Nikon Scan clones the surrounding parts, which usually looks very good when zoomed in.

      Both SilverFast's and Nikon Scan's methods look great when zoomed out. I never tried VueScan's infrared option; I just felt the positive colors it produced looked wrong/"dead".

  • bjackman a day ago

    I don't really understand the point of this use case. Like, can't you also imagine what the photos might look like without the damage? Same with AI upscaling in phone cameras... if I want a hypothetical idea of what something in the distance might look like, I can just... imagine it?

    I think we will eventually have AI-based tools that are just doing what a skilled human user would do in Photoshop, via tool use. That would make sense to me. But just having AI generate a new image with imagined details seems like a waste of time.

    • gretch 2 hours ago

      If you want 2 people to look at the same photo and share the same experience, you have to fix the photo.

      If you leave to imagination, it's likely they each imagine something different.

    • bibabaloo 19 hours ago

      Why take photos at all if you can just imagine them?

      • bjackman 12 hours ago

        Well, that goes to the heart of my point. I take pictures because I value how literal they are. I enjoy the fact that they directly capture the arrangement of light in the moment I took them.

        So yeah, if I'm gonna then upscale them or "repair" them using generative AI, then it's a bit pointless to take them in the first place.

    • w4yai a day ago

      Not everyone has a great imagination.

  • zwog a day ago

    Do you happen to know some software to repair/improve video files? I'm in the process of digitizing a couple of Video 2000 and VHS cassettes of childhood memories for my mom, who is starting to suffer from dementia. I have a pretty streamlined setup for digitizing the videos, but I'd like to improve the quality a bit.

    • nycdatasci a day ago

      I've used products from topazlabs.com for the same problem and have generally been happy with them.

      • qingcharles a day ago

        Topaz is probably the SOTA in video restoration, but it can definitely fuck shit up. Use carefully and sparingly and check all the output for weird AI glitches.

    • notsylver a day ago

      I didn't do any videos, just pictures, but considering how little I found for pictures I doubt you'll find much

  • Barbing a day ago

    Hope it works well for you!

    In my eyes, one specific example they show (“Prompt: Restore photo”) deeply AI-ifies the woman’s face. Sure it’ll improve over time of course.

    • notsylver a day ago

      I tried a dozen or so images. For some it definitely failed (altering details, leaving damage behind, needing a second attempt to get a better result) but on others it did great. With a human in the loop approving the AI version or marking it for manual correction I think it would save a lot of time.

      This is the first image I tried:

      https://i.imgur.com/MXgthty.jpeg (before)

      https://i.imgur.com/Y5lGcnx.png (after)

      Sure, I could manually correct that quite easily and would do a better job, but that image is not important to us, it would just be nicer to have it than not.

      I'll probably wait for the next version of this model before committing to doing it, but it's exciting that we're almost there.

      • qingcharles a day ago

        Being pragmatic, the after is a good restoration. There is nothing really lost (except some sharpness that could be put back). The main failing of AI is on faces because our brains are so hardwired to see any changes or weirdness. This is the sort of image that is perfect for AI because the subject's face is already occluded.

    • indigodaddy a day ago

      Another question/concern for me: if I restore an old picture of my Gramma, will my Gramma (or a Gramma that looks strikingly similar) ever pop up on other people's "give me a random Gramma" prompts?

      • Barbing 4 hours ago

        It might show her for prompts of “show me the world’s best grandma” :)

        On the free tier, I'd essentially believe that to be the default behavior. In reality they might simply use your feedback and your text prompts instead. We certainly know that free Google/OpenAI LLM usage entails prompts being used for research.

        Edit: decent chance it would NOT directly integrate grandma into its training, but would try hard to use an offline model for any privacy concerns

  • reaperducer a day ago

    > I've been waiting for image gen to catch up enough to be able to repair them all in bulk without changing details, especially faces.

    I've been waiting for that, too. But I'm also not interested in feeding my entire extended family's visual history into Google for it to monetize. It's wrong for me to violate their privacy that way, and also creepy to me.

    Am I correct to worry that any pictures I send into this system will be used for "training?" Is my concern overblown, or should I keep waiting for AI on local hardware to get better?

    • Zopieux 21 hours ago

      You're looking for Flux Kontext, a model you can run yourself offline on a high end consumer GPU. Performance and accuracy are okay, not groundbreaking, but probably enough for many needs.

crustaceansoup a day ago

I tried to reproduce the fork/spaghetti example and the fashion bubble example, and neither looks anything like what they present. The outputs are very consistent, too. I am copying/pasting the images out of the advertisement page so they may be lower resolution than the original inputs, but otherwise I'm using the same prompts and getting a wildly different result.

It does look like I'm using the new model, though. I'm getting image editing results that are well beyond what the old stuff was capable of.

  • mortenjorck a day ago

    The output consistency is interesting. I just went through half a dozen generations of my standard image model challenge (to date, I have yet to see a model that can render piano keyboard octaves correctly, and Gemini 2.5 Flash Image is no different in that regard), and as best I can tell, there are no changes at all between successive attempts: https://g.co/gemini/share/a0e1e264b5e9

    This is in stark contrast to ChatGPT, where an edit prompt typically yields both requested and unrequested changes to the image; here it seems to be neither.

    • BoorishBears a day ago

      Flash 2.0 Image had the same issue: it does better than gpt-image for maintaining consistency in edits, but that also introduces a gap where sometimes it gets "locked in" on a particular reference image and will struggle to make changes to it.

      In some cases you'll pass in multiple images + a prompt and get back something that's almost visually indistinguishable from just one of the images and nothing from the prompt.

  • crustaceansoup a day ago

    Wildly different and subjectively less "presentable", to be clear. The fashion bubble just generates a vague bubble shape with the subject inside it, instead of the "subject flying through the sky inside a bubble" presented on the site. The other case just adds the fork to the bowl of spaghetti. Both are reproducible.

    Arguably they follow the prompt better than what Google is showing off, but at the same time look less impressive.

skybrian a day ago

Like most image generators, it didn’t pass the piano keyboard test. (Black keys are wrong.)

https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...

  • joombaga a day ago

    What is the piano keyboard test? Your link requires granting AI Studio access to Google Drive, which I do not want to do.

    • raincole a day ago

      Just ask it to generate a correct piano keyboard. It's something the current gen of image generator AIs fail at.

      • ZiiS a day ago

        Do most humans pass?

        • raincole a day ago

          Most humans fail at 4-digit multiplication, or at drawing a cube in perspective.

        • phainopepla2 a day ago

          Presumably most humans with a camera do

        • adzm a day ago

          2-2-1-2-2-2-1

          • polynomial a day ago

            I still feel like most humans would fail, haha.

            • twodave 16 hours ago

              Maybe, but anyone who knows what a chromatic scale is should be able to reason it out. E# == F, B# == C, so no black keys between those.
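
              The reasoning fits in a few lines of Python if you want to sanity-check it:

                  # Black keys are exactly the sharps; E# and B# aren't separate
                  # keys because E# == F and B# == C.
                  NOTES = ["C", "C#", "D", "D#", "E", "F",
                           "F#", "G", "G#", "A", "A#", "B"]
                  layout = ["black" if "#" in n else "white" for n in NOTES]
                  print(layout)
                  # white keys sit 2-2-1-2-2-2-1 semitones apart, giving the
                  # familiar 2-then-3 grouping of black keys in every octave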

  • pbhjpbhj a day ago

    Are there models that have a vector space that includes ideas: not just words/media, but aspects that aren't entirely corporeal?

    So when generating a video of someone playing a keyboard the model would incorporate the idea of repeating groups of 8 tones, which is a fixed ideational aspect which might not be strongly represented in words adjacent to "piano".

    It seems like models need help with knowing what should be static, or homomorphic, across or within images associated with the same word vectors and that words alone don't provide a strong enough basis [*1] for this.

    *1 - it's so hard to find non-conflicting words, obviously I don't mean basis as in basis vectors, though there is some weak analogy.

    • heyjamesknight a day ago

      How would you encode those ideas?

      • pbhjpbhj 5 hours ago

        I don't know, in part that's why I asked ... I wonder if there's a way to provide a loosely-defined space.

        Perhaps it's a second word-vector space that allows context defined associations? Maybe it just needs tighter association of piano_keyboard with 8-step_repetition??

  • mikepurvis a day ago

    Interesting! I feel like that's maybe similar to the business of being able to correctly generate images of text— it looks like the idea of a keyboard to a non-musician, but is immediately wrong to someone who is actually familiar with it at all.

    I wonder if the bot is forced to generate something new; certainly for a prompt like that it would be acceptable to just pick the first result off a Google image search and be like "there, there's your picture of a piano keyboard".

  • vunderba a day ago

    Anything that is heavily periodic can definitely trip up image gen. That said, I just used Flux Kontext T2I and got pretty close (disregard the hammers though, since that's a right mess). Only towards the upper register did it start to make mistakes.

    https://imgur.com/a/fyX42my

  • psbp a day ago

    Doesn't pass the analog clock test either.

  • conception 21 hours ago

    Failed my horizontal text test as well.

  • carimura a day ago

    or my "hands with palms facing down" test.... no matter how hard I try it just can't get open hands, palms down.

    • pbhjpbhj a day ago

      I guess the vast majority of images have the palms the other way, that this biases the output. It's like how we misinterpret images to generate optical illusions, because we're expecting valid 3D structures (Escher's staircases, say).

      • vunderba a day ago

        Yes - it's the same reason generating a 5-leaf clover fails - massive amounts of training data that predisposes the model against it.

  • cubefox a day ago

    Like most image models, except GPT-4o, it also didn't pass the wooden Penrose triangle test. (It creates normal triangles.)

torginus a day ago

A bit mixed opinions - I tried colorizing manga pages with it, and the results were perfect.

Interestingly, it can change pages with tons of text on them without any problem, but it cannot seem to do translation: if I ask it to translate a French comic page, the text ends up garbled (even though it can perfectly read and translate the text by itself).

I tried with another page, and it copypasted the same character (in different poses!) all over the panels. Absolutely uncanny!

However when I asked to remake a Western comic book in a manga style (provided a very similar manga page to the comic one), it totally failed.

Also, about 50% of the time it just tells me it'll generate the image but doesn't actually do it. Not sure what's going on; a retry fixes it, but it's annoying.

  • anyg 8 hours ago

    I had a similar experience.

    It could not change the text on a hat (it ended up changing only 1 of 3 words).

    On one occasion it regenerated the same image again, ignoring my instructions to edit.

    I get the feeling that this model is optimised more for images with people in them than for objects, drawings, etc.

matsemann a day ago

Half the time I ask Gemini to generate some image, it claims it doesn't have the capability. And in general I've felt it's so hard to actually use the features Google announces. Like, a third of them are in one product, some are in another which I can't use, and I have no idea what or where I should pay to get access. So confusing.

  • IanCal 11 hours ago

    Google have been terrible at every single rollout I’ve ever seen them do.

    I see an announcement and it’s a waitlist. It says I can use it right now and I get a 404, or a waitlist, or it doesn’t work in my country. With the AI stuff, more often it takes me to a place where I can do something, but not what they say, and I have zero information about whether I’m using the new thing or not.

    Like, this is "flash image preview", but I have "flash", which is also a thing, so is it the new one or not? The UI hasn’t changed, but now it can do pictures, so has my Flash model moved from a GA model to a preview one? Probably! Or maybe it gets routed? Who knows!

  • Al-Khwarizmi a day ago

    Yeah, in fact the website says "Try it in Gemini" and I'm not sure if I'm already trying it or not. If I choose Gemini 2.5 Flash in the regular Gemini UI, am I using this?

    • throwup238 a day ago

      It’s going to be a messy rollout as usual. The web app (gemini.google.com) shows “Images with Imagen” for me under tools for 2.5 flash but I just tried a few image edits and remixes in the iOS app and it looks like it’s been updated to this model.

    • oliwary a day ago

      Also very confused at this... It told me "I'm unable to create images of specific individuals in different settings." I wish it would at least say somewhere which model we are using at the moment.

    • sega_sai a day ago

      I think not, because at least in AI Studio there is a dedicated gemini-2.5-flash-image-preview model. So I am assuming it is not available in the standard Gemini chat window.
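
      If in doubt, you can list what your API key can see; a quick sketch with the google-genai SDK (assuming it's installed and a key is configured):

          from google import genai

          client = genai.Client()  # uses GEMINI_API_KEY
          for m in client.models.list():
              if "image" in m.name:
                  print(m.name)  # look for gemini-2.5-flash-image-preview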

  • jeffbee a day ago

    It's not in the Gemini app or site at all. You have to use AI Studio or another means. Yes, this is all very confusing on Google's part.

    • IanCal 11 hours ago

      Hmm, could the old models generate images before? Had they hooked up Imagen or something? I can make images on the Gemini site.

      • bonoboTP 7 hours ago

        Yes, there was gemini-2.0-flash-preview-image-generation before, which could generate and edit also. But weaker than the new one.

        • IanCal 6 hours ago

          Thanks, I'd not realised that, which means I have no idea if the things I've done outside of the API are this new one or not. That does feel like classic Google.

          • bonoboTP 3 hours ago

            Yes, there's a conflict between wanting to just provide the good stuff by default under a unified Gemini brand where you don't have to worry about model names, it just works, versus building hype for a specific model and then being unclear about whether you're using that one or not. The nano-banana name is unique and fun, and got some recognition on social media already, they should just make a page with that heading and a chatbox. But again, that would focus on the new image editor thing only, and they probably want to lure people into their whole ecosystem, to switch to Gemini in general, from competitors like ChatGPT.

__rito__ a day ago

I am glad that I never decided to become a Photoshop pro. I always contemplated it; it seemed attractive for a while, but I'm glad I decided against it. RIP r/photoshopbattles.

It was on the endless list of new shiny 'skills' that feel good to have. Now I can use nano-banana instead. Other models will soon follow, I am sure.

  • esafak a day ago

    Retouching is an art. To the pro, this is just another tool to increase efficiency. You pay them not just for knowing how to use Photoshop, but for exercising good judgement. That said, I imagine this will shrink the field, since fewer retouchers will be able to do the same work, unless the amount of work goes up commensurately. Will people get more retouching done if the price goes down? Not sure.

    • __rito__ 2 hours ago

      I didn't say Lightroom; I said Photoshop, and I mentioned that subreddit for a reason.

    • neom a day ago

      Especially colouring. In college I worked for a dude who would re-colour old B&Ws for people; 60% of the work (the work he enjoyed) was researching enough to know reasonably well what colour something actually ought to be, not just what we thought looked good.

  • ctippett a day ago

    Interesting take. I'm a programmer, but learned Photoshop in the early 2000s and had a blast making and editing images for fun. Sure, the generative models today can do a far better job than anything I could come up with, but that doesn't detract from the experience and skills I picked up over the years.

    If anything, knowing Photoshop (I use Affinity Designer/Photo these days) is actually incredibly useful to finesse the output produced by AI. No regrets.

    • __rito__ an hour ago

      > learned Photoshop in the early 2000s and had a blast making and editing images for fun

      > "had a blast"

      One can have blasts in many things nowadays. Like playing Factorio, writing functional code for recreational problem solving, playing Chess, making SBC/Microprocessor projects for fun, doing Math for fun, and so on...

      Photoshop just couldn’t compete with the existing blasts in my life, and I felt a little bad for not learning it. But that teeny, tiny bad feeling has been wiped away by nano-banana.

    • polynomial a day ago

      Photoshop was hella fun; it turned out that programming paid more. And now AI pays much more.

  • SoKamil a day ago

    If you had commented this a decade ago, I would have said that at least you own the program and the skills, in case Google decides to turn off the lights or charge a prohibitive price. Now you need to pay a subscription for PS, and maybe some decent open-weight model will be released.

    • stefs a day ago

      Qwen3 is open weights and offers passable image generation

  • CuriouslyC a day ago

    It's still a useful skill to know Photoshop. AI images can be great, but you are almost always going to want to (a) create the base composition yourself, (b) clean up artifacts in the AI generation, and (c) layer AI compositions into a final work.

  • echelon a day ago

    Programming and everything else will eventually fall to automation, too. It's just a matter of time.

    Engineering probably takes a while (5 years? 10 years?) because errors multiply and technical debt stacks up.

    In images, that's not so much of a big deal. You can re-roll. The context and consequences are small. In programs, bad code leads to an unmaintainable mess and you're stuck with it.

    But eventually this will catch up with us too.

    • quantumHazer a day ago

      Both of you are wrong, and this is not a good level of discussion for HN.

      • casey2 a day ago

        If being wrong isn't good discussion for HN then they should delete the site

      • echelon a day ago

        I'm unclear as to which side of the argument you're taking.

        If you think that these tools don't automate most existing graphic design work, you're gravely mistaken.

        The question is whether this increases the amount of work to be done because more people suddenly need these skills. I'm of the opinion that this does in fact increase demand. Suddenly your mom and pop plumbing business will want Hollywood level VFX for their ads, and that's just the start.

adidoit a day ago

Very impressive.

I have to say while I'm deeply impressed by these text to image models, there's a part of me that's also wary of their impact. Just look at the comments beneath the average Facebook post.

  • postalcoder a day ago

    I have been testing Google's SynthID for images, and while it isn't perfect, it is very good, insofar as I felt some relief from that same creeping dread over what these images will do to perceived reality.

    It survives a lot of transformations, like compression, cropping, and resizing. It even survives alterations like color filtering and overpainting.

    • sigmar a day ago

      Facebook isn't going to implement detection though. Many (if not most) of the viral pictures are AI-generated, and Facebook is incentivized to let their users get fooled to generate endless scrolling.

      • qingcharles a day ago

        They already did. Certainly on the backend. For a while they were surfacing it, but I think it's gone again. But Meta is definitely onto this.

      • paul7986 a day ago

        Along with those being fooled, there are many comments saying "this is fake", "AI trash", etc. That portion of the commenters is teaching the ignorant, and soon no one will believe that what they see on the Internet is real.

        • bonsai_bar a day ago

          > soon no one will believe what they see on the Internet as real.

          Now is that so bad?

  • betterhealth12 15 hours ago

    I think it's time to build a new system - something that can annotate the post a user is viewing whenever at least one savvy user (or AI system) picks up on the uncanny signals. This YouTube video about the "Walker Family" sham on Facebook is particularly relevant here:

    Don’t Pay This AI Family To Write You a Song - https://www.youtube.com/watch?v=u-DDHSfBBeo

  • MitPitt a day ago

    Facebook comments are obviously botted too

    • bee_rider a day ago

      I dunno, I thought so for a while, but I’m beginning to suspect this is a very optimistic view of humanity.

      • artursapek 8 hours ago

        Why do HN commenters all act like they're in the top 1% of intellectuals?

  • nikanj a day ago

    The comments are probably AI-generated too, because a site that seems to have lots of other people on it is more appealing than an empty wasteland

  • knicholes a day ago

    [flagged]

    • yifanl a day ago

      This presumes that you're okay with giving the real Elon your wallet but not a fake Elon, but why?

      • knicholes a day ago

        It was very convincing. We thought it was a YouTube stream of the Starship launch. It paused with 40 seconds remaining, and "Musk" came on offering to reward those who support innovation and technology (BTC, in this case). All info here: https://docs.google.com/document/d/1lRbApgKT4U95zN0AYsPQqsLR...

        • yifanl a day ago

          My problem with your statement isn't whether it's believable that Elon came on stage; my problem is why you would trust Elon to pay your money back, whether it's the authentic or the imposter Musk.

        • Jordan-117 a day ago

          Kind of missing their point there. Giving Elon Musk $15k in crypto based on some vague too-good-to-be-true "trust me bro" pitch is embarrassing even if the video turned out to be real.

          • FergusArgyll 20 hours ago

            The guy lost $15k; there's no need to rub it in!

      • Jensson a day ago

        Because it isn't worth real Elon's time to run these scams.

    • pil0u a day ago

      I got scammed similarly (although for $10, because I tested first), because 1. it was on YouTube, on a channel called "SpaceX" with a verified logo, 2. it had hundreds of thousands of viewers live, and 3. it featured a believable speech from Mr. Musk standing next to his rockets (and I knew of his interest in cryptocurrencies).

      This happened as I was genuinely searching for the actual live stream of SpaceX.

      I am ashamed, even more so because I even posted the live stream link on Hacker News (!). Fortunately it was flagged early and I apologized personally to dang.

      This was a terrible experience for me, on many levels. I never thought I would fall into such a trap, being very aware of the tech, reading about similar stories, etc.

      • dvh a day ago

        I am flabbergasted that you both got scammed. I would understand if this were two years ago, but now? Do people really not know about these scams? I can already see the downvotes coming for victim blaming, but to me this is really shocking. Notice that there isn't a "Tell HN: don't get scammed by deepfake crypto Elon", because the people who usually post consider it general knowledge. That's why it's so effective, I guess. In the same manner, there will never be a "Tell HN: don't drink acid, it will burn your intestines" - the danger is so obvious that nobody feels the need to post it, and because nobody posts it, people get scammed. I don't know what the solution is. How do you tell people what everybody should already know?

        I remember being at a machining workshop where the instructor kept saying the most obvious things. Obvious things are obvious until they aren't, and then somebody gets hurt.

        • knicholes a day ago

          Yes, I've heard about these scams. I've made deepfakes myself in the past. I've openly mocked people who have fallen for these scams. But this was sophisticated. Perfectly timed, very convincing deepfake, popular YouTube channels showing this stream during the launch, as if it were legit. The website was branded as SpaceX (the domain was obviously not, but I wasn't vigilant in the exciting hullabaloo of the impending launch). The instructions to participate were clear and easy to use.

        • bn-l a day ago

          Hey, it takes courage to admit it. That's admirable.

          • SXX a day ago

            This. Don't be ashamed. Anyone can get scammed.

            The reason people do is that we don't talk about the risks often enough.

          • Kurtz79 13 hours ago

            Yes, thanks OP for sharing. I check the HN front page most days and had no clue such sophisticated scams existed (I pretty much don't use social media).

            It’s easy to think “eh, it will never happen to me”, but hindsight is 20/20. I have impulse-donated to things like Wikipedia in the past, and I’m as susceptible to FOMO as most people.

        • pil0u a day ago

          To be fair, if it was only $10, that's because it was more of a "let's see if that works". It was believable enough to try out.

          The point of my message was to "tell hn: it could happen to people in this community".

      • knicholes a day ago

        Yes, this is the exact same scam.

        • betterhealth12 15 hours ago

          @knicholes & @pil0u - I am working on a system that would prevent this exact scenario. I appreciate the doc write-up; given that you were personally impacted by this and are passionate about it, I'd love to speak.

          I feel like the scale at which this is happening across the internet must be staggering, but because it's small-scale, essentially un-reportable theft - who would the average person even go to if they willingly sent the money? - victims would also have to get over the embarrassment of having fallen for it.

          What really got me thinking about the scale of this is watching the deepfake discussion at 1:51:46 in this video (at 1:52:00 he says his team spends 30% of their time sorting through deepfake ads, to the extent he had to hire someone whose exclusive job is to spot these scam videos and report them to FB etc):

          https://youtu.be/JMYQmGfTltY?si=ntuDgXuhMYj2fh5z&t=6706

    • fxtentacle a day ago

      Plot twist: It wasn't a deepfake.

      You sent your wallet to the real Elon and he used it as he saw fit. ;)

      • pjerem a day ago

        That’s what they said: they have been scammed!

    • kamranjon a day ago

      Would you consider writing a blog post about this experience? I'm incredibly interested in learning more details about how this unfolded.

      • qingcharles a day ago

        I think the comment is a joke. Their bio is satirical at least :)

        • nerevarthelame a day ago

          Their bio mentions their actual job and one project that is verifiably real. I think that the elements that seem satirical are real projects they're working on.

        • lucasmullens a day ago

          I'm pretty sure the comment wasn't a joke? I saw the stream last week; it was a very impressive use of AI. I didn't realize it was AI until he started talking about doubling crypto.

          What about the bio is satirical? I'm pretty sure that's sincere too.

          • qingcharles a day ago

            User has edited their bio now :)

            • knicholes a day ago

              I didn't edit my bio. My projects are not satire. I'm just less ashamed than most, so I work on more "exciting" projects. I've worked extensively with generative AI myself, including video. It was just that convincing in the moment. My regret knows no bounds. Luckily I earn enough that this doesn't devastate me, but I really could have done some good with that money.

              • qingcharles a day ago

                Yikes. In that case, please accept my apology. Your bio disappeared for a while off your page, but it's back as it was now.

      • paul7986 a day ago

        Well just go on this guy's lawn and you will find your answer lol

    • Imustaskforhelp a day ago

      Please pardon me, since I don't know whether this is satirical or not. I wish you would clarify.

      Because if this is real, then the world is cooked.

      If it's not, then it's telling that I think it might be real. The only reason I believe it's a joke is that you are on Hacker News, so either you are joking, or the tech has gotten so convincing that even people on Hacker News (whom I hold to a fair standard) are getting scammed.

      I have a lot of questions if this is true, and I am sorry for your loss if it is and this isn't satire, but I'd love it if you could tell me whether it's a satirical joke or not.

      • bauruine a day ago

        I guess it was something like [0]. The Nigerian prince is now a deepfake Elon, but the concept is the same: you need to send some money to get way more back.

        [0]: https://www.ncsc.admin.ch/ncsc/en/home/aktuell/im-fokus/2023...

        • Imustaskforhelp a day ago

          Hm, but isn't it wild to think that Elon is talking to you and asking you for $15k? Like, bro has the money of a lifetime - why would he ask you?

          It doesn't make that much sense, idk.

          • atrus a day ago

            I remember watching the "SpaceX" channel on YouTube, which wasn't a legit source. AI Elon basically says, "I want to help make bitcoin more popular, let me show you how easy it is to transfer money around with BTC. Send me $X and I'll send you back $2X!" It's very in line with a typical Elon message (I'll give you a million to vote R), and it's on a channel called SpaceX. It's pretty believable.

            Granted, I played RuneScape and EVE as a kid, so any double-ISK scams are immediate red flags.

            • Imustaskforhelp a day ago

              Now, I have never played RuneScape, but I have heard of this legendary game in references.

              For some reason, my mind confused RuneScape with Neopets from the TheOdd1sOut video, which I think is a good watch.

              Scams That Should be Illegal : https://www.youtube.com/watch?v=XyoBNHqah30

            • empath75 a day ago

              It's only believable to the extent that I believe that Musk would actually run such a transparently obvious scam.

          • Jensson a day ago

            Even Elon could lose his credit card or something; the story they spin is always like "I am rich but in a pickle, please send some money here and I'll send you back 10x as much tomorrow when I get back into my account" - but of course they never send it back.

            Edit: But of course the real Elon would call someone he knows rather than a stranger; rich people know a lot of people, so they would never contact you about this.

      • knicholes a day ago

        Not satire. He made a big speech about rewarding those who invested early in tech to move humanity forward and the benefits of the blockchain. It was extremely convincing. Three college grads and a medical doctor were all convinced.

      • runarberg a day ago

        There are a lot of people on the internet, and every individual on the internet is in a unique situation. Chances are some of them are likely to be persuaded by a scam which seems obvious to you.

        Parent's story is very believable; even if parent made this particular story up (which I personally don't think is the case), this has probably happened to somebody.

        • Imustaskforhelp a day ago

          Yeah, maybe I didn't read their tone correctly, which is why I seriously asked whether they were joking or not.

          If they aren't joking, I apologize.

    • jaredklewis a day ago

      This comment is perfect.

      • latchkey a day ago

        As always, it is the replies that make it worth it. GopherGeyser strikes again!

        • knicholes a day ago

          You don't like the idea of GopherGeyser?

          • latchkey a day ago

            What are you talking about? I ordered 10 of them.

            • knicholes a day ago

              You couldn't have -- we sold out and are out of stock while we redesign the board to be more usable during configuration and radio control.

              • latchkey 21 hours ago

                Oh shit, it is a real product? That's amazing.

    • michelb a day ago

      These SpaceX scams are rampant on YouTube and highly, highly lucrative. It’s crazy, and you have to be very vigilant, as whatever is promised lines up with Elon’s MO.

      • rangerelf a day ago

        Why would anyone give them any money AT ALL?

        It's not like they're poor or struggling.

        Am I missing something?

      • ablation 10 hours ago

        My mind is blown that people would engage in it at all, let alone need to be "vigilant". Amazing.

      • nickthegreek a day ago

        It requires zero vigilance if you don't play the game.

    • lionkor a day ago

      Not to victim-shame or anything, but it sounds like more than one safety mechanism failed, with the convincing tech being only a rather small part of it?

      • knicholes a day ago

        Yes, more than one safety mechanism failed. Coinbase actually flagged the transaction, but I was so desperate to get it to go through that I completed their facial validation process to expedite it. If I had held off for just a couple more minutes, I'd have realized it was a scam.

        • prawn 15 hours ago

          Scams usually have an element of urgency so you don't stop to think.

          Why did you and your graduate friends think an insanely rich man with a huge number of staff needed your financial help in testing transactions? This reminds me of those people that fall for celebrity love scams, where a rich celebrity needs their money - just baffling.

      • hansonkd a day ago

        I think the biggest failure is on the part of the companies hosting these streams.

        It's been a while, but I remember seeing streams of Elon offering to "double your bitcoin"; the reasoning was that he wanted to increase adoption and load-test the network. Just send some bitcoin to an address and he would send it back doubled!

        But the thing was, it was on YouTube, hosted on an imposter Tesla page. The stream had been going on for hours and had over ten thousand people watching live. If you searched "Elon Musk Bitcoin" on Google during the stream, Google actually pushed that video as the first result.

        Say what you want about the victims of the scam, but it should be pretty easy for YouTube and other streaming companies to have a simple rule that filters all live streams with Elon Musk + (Crypto|BTC|etc) in the title, and all channels with "Tesla", "SpaceX", etc. in the name.
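
        A first pass could be as dumb as a keyword pairing. Purely illustrative Python - the keyword lists are obvious guesses, not anything tuned:

            import re

            # Flag streams that pair a Musk-brand name with
            # crypto-giveaway keywords, for human review.
            BRANDS = re.compile(r"\b(tesla|spacex|elon\s*musk)\b", re.I)
            CRYPTO = re.compile(r"\b(crypto|btc|bitcoin|eth|giveaway|double)\b", re.I)

            def needs_review(channel: str, title: str) -> bool:
                return bool(BRANDS.search(channel + " " + title)
                            and CRYPTO.search(title))

            print(needs_review("SpaceX", "Elon: we will DOUBLE your BTC"))  # True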

        • betterhealth12 15 hours ago

          What are your thoughts on solving this from the opposite direction? Instead of having to vet every single stream, tweet, etc. to check whether it's legit, the idea is that you shouldn't "trust" what you are seeing unless it's explicitly endorsed via a signature from the original creator.

          Obviously, if it's coming from their official channels the "signature" can be more obvious, but a layer that facilitates this could do a lot of good imo.
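
          The signing mechanics themselves are well-trodden - e.g. an Ed25519 signature over a hash of the content. A minimal sketch with the Python cryptography package (key distribution and the actual hashing are glossed over):

              from cryptography.hazmat.primitives.asymmetric.ed25519 import (
                  Ed25519PrivateKey,
              )

              creator_key = Ed25519PrivateKey.generate()
              stream_digest = b"sha256-of-the-official-stream"

              sig = creator_key.sign(stream_digest)  # published by the creator
              # Anyone can verify; raises InvalidSignature if forged.
              creator_key.public_key().verify(sig, stream_digest)

          The hard part is adoption and key distribution, not the crypto.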

        • lionkor a day ago

          I feel like somehow that would lessen it, but not really help much? There are obviously people with too much money in BTC who are trying to take any gamble to increase its value. It sounds like a deeper societal issue.

          • jfoster a day ago

            You are right that they might never be able to get it to 0, but shouldn't they lessen it if a simple measure like the one described can prevent a bunch of people from getting fooled by the scam?

    • AbraKdabra a day ago

      I don't mean to be rude, but this sounds like natural selection doing its work.

      • umbra07 a day ago

        That's the sort of statement that remains extremely rude even if you try and prefix it with "I don't mean to be rude".

        • AbraKdabra a day ago

          It's not rude if it's the truth.

          Also he's a troll so...

      • knicholes a day ago

        I'm pretty successful, with an above-average IQ. It was convincing enough to fool me along with three other college grads (one of them a medical doctor).

    • amatajohn a day ago

      The modern Turing test:

      Am I getting scammed by a billionaire or an AI billionaire?

    • UltraSane a day ago

      On the balance of probabilities it being a scam is vastly more likely than Elon actually wanting to contact you. Why would Elon need $15k in bitcoin?

      It seems like money naturally flows from the gullible to the Machiavellian.

    • pennaMan a day ago

      hey, I got a bridge to sell you, was $20k but we can lower it to $15k if you pay in BTC

      • testplzignore a day ago

        You're paying too much for your bridges man. Who's your bridge guy?

        • dkiebd a day ago

          That wasn’t a bridge.

      • prawn 14 hours ago

        He would sell tunnels, not bridges!

      • 77pt77 a day ago

        Was the bridge built by a genius like Elon though?

    • DonHopkins a day ago

      [flagged]

      • lucasmullens a day ago

        Come on, don't be mean. Imagine saying this in person to someone who just told you they got scammed. "You're just extremely gullible" is just so mean...show some empathy.

        • DonHopkins 4 hours ago

          Anyone who trusts Musk enough to send him $15k doesn't deserve a bit of empathy.

lifthrasiir a day ago

FYI, this is the famed nano-banana model, which has now been renamed to gemini-2.5-flash-image-preview in LMArena.

  • Mistletoe a day ago

    https://medium.com/data-science-in-your-pocket/what-is-googl...

    For people like me that don’t know what nano-banana is.

    • mock-possum a day ago

      Wow I hate the ‘voice’ in that article - big if true though.

      • daemonologist a day ago

        I suspect the "voice" is a language model with a bad system prompt. (Possibly the author's own words run through an LLM, to be charitable.)

        • 3036e4 a day ago

          It's medium.com: YouTube-comment-quality text packaged as clickbait articles for a revenue share. It was always slop, even without LLMs. Do they even bother paying human authors now, or is the entire site just generated? That would probably be cheaper and improve quality.

          • debugnik a day ago

            > Do they even bother with paying human authors now

            I thought Medium was a stuck-up blogging platform. Other than through paid subscriptions, why would they pay bloggers? Are they trying to become the next HuffPost or something?

  • seydor a day ago

    I mean, they are going to have to rename their AI because gemini.com is going to IPO soon.

    "Banana" would be a nice name for their AI, and they could freely claim it's bananas.

    • PunchTornado 10 hours ago

      Why do you think they have to rename it because of some company's IPO?

  • postscapes1 a day ago

    This is what i came here to find out. Thanks.

mkl a day ago

That lamp example is pretty impressive (though it's hard to know how cherry-picked it is). The lamp is plugged in, it's lighting the things in the scene, it's casting shadows.

boyka 3 hours ago

> When we first launched native image generation in Gemini 2.0 Flash earlier this year, you told us you loved its low latency, cost-effectiveness, and ease of use.

I wasn't aware there is a channel where Google asks for feedback or where you are able to tell them that you love it. I only see a "report a problem button". Which channel is it?

  • theptip 3 hours ago

    Let me tell you about this app called “X”, formerly known as “Twitter”. People use it to post their opinions about products, among other things.

greatgib 20 hours ago

   All images created or edited with Gemini 2.5 Flash Image will include an invisible SynthID digital watermark, so they can be identified as AI-generated or edited.
Obviously I understand the purpose and the good intention, but I think it's sad to see that we are no longer treated as responsible adults; instead, big corps decide for us what we can and cannot do, snitching behind our backs.
  • brokencode 20 hours ago

    At what point in time were we ever responsible adults with technology?

    Deepfakes have the potential to totally destroy the average person’s already tenuous connection with reality outside of their immediate vicinity.

    Some will be fooled by an endless stream of fakes and others will never believe anything again.

    Politicians will dismiss footage of them doing or saying something bad as fake. And many times, they’ll be telling the truth.

    We are already living in a post-fact world to some extent, but buckle up, because it’s about to get a lot worse.

  • jedimastert 18 hours ago

    I'm generally against "if you have nothing to hide you have nothing to fear" arguments, but I'm curious what your argument is here for why it would be a problem that AI-generated and -edited images can be recognized as such.

    Edit: I should probably say, for full transparency, that I am strongly FOR watermarks on AI imagery.

    • greatgib 11 hours ago

      My problem is more the general idea that nowadays tech is hostile to the user. Before, when you paid for something, it was fully yours to use in a good or in a bad way.

      Imagine, for example, that in the future, with improved tech, manufacturers of knives were to embed a GPS chip in every knife sold because it might be used for dangerous things.

      The worst part of all of this is that big tech does it based on its own "moral" compass and not based on a legal requirement.

      Regarding the watermark, in theory the same applies to generated text: imagine that you ask AI to rework a job application letter or a letter to your landlord, and Google snitches on you with a watermark showing you used AI for it.

  • LZ_Khan 20 hours ago

    I don't see the problem as it's not like you're forced to use their image generation model.

  • jedimastert 18 hours ago

    Also, it's not really your image. If an artist puts a watermark on a commissioned piece, it's not really a good argument to call the artist a "snitch" for indicating the art was done by them rather than by you trying to pass it off as your own...

    I don't know if that's the argument you're trying to make, but I think it's worth considering

  • cl0ckt0wer 20 hours ago

    Don't worry, you can just screenshot the image to get rid of the watermark

    • fwip 17 hours ago

      That's not correct. The watermark is robust to screenshots, file format changes (saving as JPEG/PNG), and at least light transformations (cropping, saturation adjustment, etc.).

  • Quiark 14 hours ago

    Me and you are but that guy over there isn't

abdusco a day ago

I love that it's substantially faster than ChatGPT's image generation. That one takes ages - so slow that the app tells you not to wait and sends you a notification when the generation finishes.

  • andrewinardeer a day ago

    "Generate an image of OpenAI investors after using Gemini 2.5 Flash Image"

orloffm 8 hours ago

So it doesn't allow you to do anything with photos containing kids, right? Isn't that too aggressive a filter for such a thing? ChatGPT thankfully created Ghibli versions of everything I gave it.

  • dsrtslnd23 5 hours ago

    Yes - any images of children seem to be banned. You can't create a children's book with a reference image.

mortsnort a day ago

At $0.02 per image, it's prohibitively expensive for many use-cases. For comparison, the cheapest Flux model (Schnell) is $0.003 per image.

  • zeknife a day ago

    How many images do you need? What are the use-cases that need a bunch of artificial yet photoreal images produced or altered without human supervision?

    • skybrian a day ago

      I think people still expect a lot of trial and error before getting a usable image. At 2 cents per pull of the slot machine lever, it would still take a while, though.
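
      (For scale: a dozen pulls at 2 cents is $0.24, or about $0.47 at the $0.039/image figure quoted elsewhere in the thread - cheap per usable image, but it adds up across thousands of images.)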

  • bn-l a day ago

    Schnell isn’t autoregressive and doesn’t do editing.

    • mortsnort a day ago

      Fair but the Gemini "flash" branding implies it's their model for speed/scale in my mind.

  • sergiotapia a day ago

    yes, too expensive for my use case.

        Service            Cost per Image   Cost per 1,000 Images
        Flux Schnell       $0.003           $3.00
        Gemini 2.5 Flash   $0.039           $39.00

starchild3001 15 hours ago

This feels like a real inflection point for image editing models. What stood out to me isn’t just the raw generative quality, but the consistency across edits and the ability to blend multiple references without falling apart. That’s something people have been hacking around with pipelines (Midjourney → Photoshop → Inpainting tool), but seeing it consolidated in one model/API makes workflows dramatically simpler.

That said, I think we’re still in the “GPT-3.5” phase of image editing: amazing compared to what came before, but still tripping over repeated patterns (keyboards, clocks, Go boards, hands) and sometimes refusing edits due to safety policies. The gap between hype demos and reproducible results is also very real; I’ve seen outputs veer from flawless to poor with just a tiny prompt tweak.

qoez a day ago

Anyone know how it handles '1920s nazi officer'? They stopped doing humans for a while but now I see they're back so I wonder how they're handling the criticism they got from that

  • napo a day ago

    it said: "I can create images about lots of things but not that. Can I try a different one for you?"

    • napo a day ago

      when giving more context it replied:

      """ Unfortunately, I can't generate images of people. My purpose is to be helpful and harmless, and creating realistic images of humans can be misused in ways that are harmful. This is a safety policy that helps prevent the generation of deepfakes, non-consensual imagery, and other problematic content.

      If you'd like to try a different image prompt, I can help you create images of a wide range of other subjects, such as animals, landscapes, objects, or abstract concepts. """

      • bastawhiz a day ago

        What a weird rejection. You have to scroll pretty far in the article to see an example output that doesn't have a realistic depiction of a person.

        • geysersam a day ago

          It's unfortunate they can't just explain the real reason they don't want to generate the image:

          "Unfortunately I'm not able to generate images that might cause bad PR for Alphabet(tm) or subsidiaries. Is there anything else I can generate for you?"

        • tanaros a day ago

          The rejection message doesn’t seem to be accurate. I tried “happy person” as a prompt in AI Studio and it generated a happy human without any complaints.

          It’s possible that they relaxed the safety filtering to allow humans but forgot to update the error message.

  • Der_Einzige a day ago

    The moment the weights are on Hugging Face, someone will orthogonalize/abliterate the model and make it uncensored.

    • rvnx a day ago

      BigBanana would be a good name for that future OnlyFans model

ahmeni 14 hours ago

Really like how they were so excited to release this that they managed to break existing SKUs and cause a bunch of GCP customers to have their text tokens billed at 100x rates, as Flash image-generation tokens.

  • UltraSane 14 hours ago

    Ouch. that is not going to be fun to clean up.

radarsat1 a day ago

I've had a task in mind for a while now that I've wanted to do with this latest crop of very capable instruction-following image editors.

Without going into detail, basically the task boils down to, "generate exactly image 1, but replace object A with the object depicted in image 2."

Where image 2 is some front-facing, generic version of the object. Ideally I want the model to place this object perfectly in the scene, replacing the existing object - which I'd ideally identify exactly by specifying its position, but otherwise by just describing very well what to do.

For models that can't accept multiple images, I've tried a variation where I put a blue box around the object that I want to replace, and paste the object that I want it to put there at the bottom of the image on its own.

I've tried some older models, ChatGPT, qwen-image last week, and just now, this one. They all fail at it. To be fair, this model got pretty damn close: it replaced the wrong object in the scene, but it was close to the right position, and the object was perfectly oriented and lit. But it was wrong. (Using the bounding-box method, it should have been able to identify exactly what I wanted. Instead it removed the bounding box and replaced a different object in a different but nearby position.)

Are there any models that have been specifically trained to be able to infill or replace specific locations in an image with reference to an example image? Or is this just like a really esoteric task?

So far all the in-filling models I've found are only based on text inputs.
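
For reference, the shape of the multi-image call I've been testing looks like the sketch below (google-genai Python SDK; the model name is from this launch, and the prompt and filenames are placeholders for my actual task, so treat it as illustrative):

    # Minimal two-image edit sketch (google-genai SDK).
    from google import genai
    from PIL import Image

    client = genai.Client()  # picks up GEMINI_API_KEY from the environment
    scene = Image.open("scene.png")  # image 1: scene to reproduce
    ref = Image.open("object.png")   # image 2: replacement object

    resp = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",
        contents=[
            "Reproduce the first image exactly, but replace the object "
            "in the blue box with the object shown in the second image.",
            scene,
            ref,
        ],
    )
    for part in resp.candidates[0].content.parts:
        if part.inline_data:  # image parts come back as inline bytes
            open("out.png", "wb").write(part.inline_data.data)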

  • rushingcreek a day ago

    Yes! There is a model called ACE++ from Alibaba that is specifically trained to replace masked areas with a reference image. We use it in https://phind.design. It does seem like a very esoteric and uncommon task though.

    • ceroxylon a day ago

      I don't think it is that esoteric, that sounds like deepfake 101. If you don't mind answering, does Phind do anything to prevent / mitigate this?

    • radarsat1 a day ago

      Oh cool thanks I haven't come across that one, I'll give it a shot.

  • Valk3_ a day ago

    Not sure what your exact task is, but I have a similar goal. I haven't had time to try a lot of different models or ideas yet because I got busy with other stuff, but have you tried this: https://youtu.be/dQ-4LASopoM?si=e33FQd5f4fYr4J5L&t=299

    where you stitch two images together - one the working image (the one you want to modify), the other the reference image - and then instruct the model what to do. I'm guessing this approach is as brittle as the others you've tried so far, but it seemed interesting.
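
    The stitching itself is trivial with PIL, for what it's worth - something like this (filenames illustrative), after which you send the combined image as a single input:

        # Place working image and reference side by side.
        from PIL import Image

        work = Image.open("working.png")
        ref = Image.open("reference.png").resize((work.width, work.height))

        combo = Image.new("RGB", (work.width * 2, work.height))
        combo.paste(work, (0, 0))
        combo.paste(ref, (work.width, 0))
        combo.save("stitched.png")
        # Prompt: "Apply the object on the right to the scene on the left."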

kemyd a day ago

I don't get the hype. Tested it with the same prompts I used with Midjourney, and the results are worse than in Midjourney a year ago. What am I missing?

  • bonoboTP a day ago

    The hype is about image editing, not pure text-to-image. Upload an input image, say what you want changed, get the output. That's the idea. Much better preservation of characters and objects.

    • appenz a day ago

      I tested it against Flux Pro Kontext (also image editing), and while it's a very different style and approach, I overall like Flux better: more focus on image consistency, it adjusts the lighting correctly, and it fixes contradictions in the image.

      • qingcharles a day ago

        I've been testing it against Flux Pro Kontext for several weeks. I would say it beats Flux in a majority of tests, but Flux still surprises from time to time. Banana definitely isn't the best 100% of the time -- it falls a bit short of that. Evolution, not revolution.

        • vunderba a day ago

          Agreed. I find myself alternating between Qwen Image Edit 20B, Kontext, and now Flash 2.5 depending on the situation and style. And of course, Flash isn't open-weights, so if you need more control / less censorship then you're SOL.

          • frank_nitti a day ago

            Has there been a sufficient indication to conclude these weights will not (now or ever) be released?

            • vunderba a day ago

              Are any of Google's generative models besides Alphafold open weight? (Veo, Imagen, etc.)

              I don't think we can really answer the question if Flash will ever be released.

          • Melchizedek 13 hours ago

            It’s good but holy shit is it censored! Try generating any kind of scene on a beach…

    • SirMaster a day ago

      Can it edit the photo at the original resolution?

      Most of my photos these days are 48MP and I don't want to lose a ton of resolution just to edit them.

      • vunderba a day ago

        Great question. I really doubt it supports arbitrary resolutions. I'm sure that behind the scenes it scales the image down to somewhere around 1 MP before processing, even if they decide to upscale and return it at the original resolution.

        • SirMaster a day ago

          So this doesn't really replace traditional Photoshop editing of my photos, I guess.

      • qingcharles a day ago

        I don't know. All the testing I've done has output the standard 1024x1024 that all these models are set to output. You might be able to alter the output params on the API or AI Studio.

    • kemyd a day ago

      Thanks for clarifying this. That makes a lot more sense.

  • vunderba a day ago

    Midjourney hasn't been SOTA for over a year. Even the latest release, version 7, scores extremely low on prompt adherence, only managing to get 2 out of 12 prompts correct. Even Flux Dev running locally consistently outperforms it.

    Here's a comparison of Flux Dev, MJ, Imagen, and Flash 2.5.

    https://genai-showdown.specr.net/?models=FLUX_1D%2CMIDJOURNE...

    That being said, if image fidelity is absolutely paramount and/or your prompts are relatively simple - Midjourney can still be fun to experiment with particularly if you crank up the weirdness / chaos parameters.

  • cdrini a day ago

    Hmm, I think the hype is mainly for image editing, not generating. Although note I haven't used it! How are you testing it?

    • kemyd a day ago

      I tested it with two prompts:

      // In this one, Gemini doesn't understand what "cinematic" is

      "A cinematic underwater shot of a turtle gracefully swimming in crystal-clear water [...]"

      // In this one, the reflection in the water in the background has different buildings

      "A modern city where raindrops fall upward into the clouds instead of down, pedestrians calmly walking [...]"

      Midjourney created both perfectly.

      • echelon a day ago

        As others have said, this is an image editing model.

        Editing models do not excel at aesthetics, but they can take your Midjourney image, adjust the composition, and make it perfect.

        These types of models are the Adobe killer.

        • kemyd a day ago

          Noted that! The editing capabilities are impressive. I was excited for image gen because of the API (Midjourney doesn't have it yet).

          • echelon a day ago

            David Holz mentioned on Twitter that he was considering a Midjourney API. They're obviously providing it to Meta now, so it might become more broadly available after Midjourney becomes the default image gen for Meta products.

            Midjourney wins on aesthetics for sure. Nothing else comes close. Midjourney images are just beautiful to behold.

            David's ambition is to beat Google to building a world model you can play games in. He views the image and video business as a temporary intermediate to that end game.

    • qingcharles a day ago

      It actually has impressive image generating ability, IMO. I think the two things go hand-in-hand. Its prompt adherence can be weaker than other models, though.

  • ihsw a day ago

    [dead]

j_m_b a day ago

If this can do character consistency, that's huge. Just make it do the same for video...

  • ACCount37 a day ago

    It's probably built on reused "secret sauce" from the video generation models.

esamust a day ago

Strange. I was excited to play around with the 2.5 flash image after testing the nano banana in LMarena, but the results are not at all the same? So I went back to LMarena to replicate my earlier tests but it's way worse than when it was nano banana? Did I miss something?

johnfn a day ago

I naively went onto Gemini to try the new model and had what I can only describe as the worst conversation I've had with an AI since GPT-3.5[1]. Is this really the model that's on top of the leaderboard right now? This feels about 500 Elo points worse than my typical conversation with GPT-5.

Edit: OK, OK, I actually got it to work, and yes, I admit the results are incredible[2]. I honestly have no idea what happened with Pro 2.5 the first time.

[1]: https://g.co/gemini/share/5767894ee3bc [2]: https://g.co/gemini/share/a48c00eb6089

  • GaggiX a day ago

    "Google AI Studio" and select the model

  • SpaceManNabs a day ago

    Sometimes these bots just go awry. I wish you could checkpoint spots in a conversation so you could replay from that point, maybe with a push in the latent space or a new seed.

beyonddream a day ago

“Internal server error

Sorry, there seems to be an error. Please try again soon.”

Never thought I would ever see this on a Google-owned website!

  • lionkor a day ago

    A cheap quip would be "it's vibe-coded", but that might actually very well be the case at this point!

  • reaperducer a day ago

    > Never thought I would ever see this on a Google-owned website!

    Really? Google used to be famous not only for its errors, but for its creative error pages. I used to have a google.com bookmark that would send an animated 418.

anotheryou a day ago

Super cheap generation but expensive image upload, do I read that right?

https://openrouter.ai/google/gemini-2.5-flash-image-preview

  • daviding a day ago

    Not sure. If the Flash image output is $30/M [1] then that's pretty similar to gpt-image-1 costs. So a faster and better model perhaps but not really cheaper?

    [1] https://developers.googleblog.com/en/introducing-gemini-2-5-...

    • daviding a day ago

      Since I can't edit: it seems like Flash image is about 23% of the cost (4 cents vs 17 cents) of OpenAI's gpt-image-1, if you're putting an image and a prompt in and getting back, say, a 1024x1024 generated image. Combined with the quicker generation time, that makes it interesting. I expect OpenAI to respond, at least on pricing, e.g. with a flat-rate output price cap or something comparable.

  • dangoodmanUT a day ago

    That’s like .12 cents per image uploaded

asadm a day ago

This is amazing. I just wish these models had more non-textual controls. I don't want to TYPE my instructions. We need a better UI for editing images with AI.

  • ojr 16 hours ago

    You want to use a mouse and keyboard and learn 20 buttons like it's 1990?

    • yborg 13 hours ago

      I don't want to converse with a 4-year-old that has the world's photo libraries at its disposal. I spent 10 minutes trying to convince the model to add a watch to a person's left arm instead of the right, and it would not do it; it apparently could not get the idea on this particular image. If I had a drawing tool I could circle where I wanted it and say 'THERE, stupid'. Next year when we have AGI all of this will be moot, of course, but for now Photoshop isn't going away.

  • mh- a day ago

    Can you expand on that? What would ideal look like to you?

vector3 10 hours ago

Is this truly "native" to Gemini 2.5 Flash, as they call it? Or is it a dedicated, separate text-to-image model hooked up to Gemini 2.5 Flash with function calling? Or do they somehow merge the weights of the two models while avoiding side effects like degraded performance?

therealmarv a day ago

What is the max input and output resolution of images?

This is why I'm sticking mostly to Adobe Photoshop's AI editing because there are no restrictions in that regard.

  • qingcharles a day ago

    In my testing it has been stuck at 1024x1024. Have to upscale with something...

  • abdusco a day ago

    Around 1 megapixel, AFAICT.

shashankpritam a day ago

Are men not attractive? Or perhaps, for Google, this blog is targeted content? But who is it targeting? I would like to see the reasoning behind using all women in the images (at least the top/first ones) to show off the model's capabilities. I have noticed this trend in the image-manipulation business a lot.

  • rd a day ago

    The average man finds the average woman more attractive than the average woman finds the average man. Replace attractive with (eye-catching/attention-grabbing/motivating/retention-boosting).

    • shashankpritam a day ago

      Oh, in that case it makes sense. Also, I think men and women consume different kinds of media, and this is one of those male-dominated corners of the internet. I also think that, due to training-data bias, there could be some difference in quality across subjects, so they might be showing off their best of the best.

  • zoeysmithe a day ago

    Because tech is largely male-dominated and has inherent sexism/patriarchy, and images of women, especially conventionally attractive ones, are perceived as aiding sales.

    Also, women are seen as more cooperative and submissive, hence so many home assistants and AIs having women's voices / being femme-coded.

    • shashankpritam a day ago

      Thank you for saying that. When I posted the GP comment, it immediately got downvoted and I couldn't even see it in the thread. I half expected it to get tagged 'meta/off-topic' and removed.

      The way I see it, corporations are happy to exploit prejudice for revenue. Of course, this is nothing new. But it is a societal issue, and the corporate world is playing a large role in it.

      For context this was the original link - https://deepmind.google/models/gemini/image/

modeless a day ago

This model is very impressive. Yesterday (as nano-banana) I gave it a photo of an indoor scene with a picture hanging on a wall, and asked it to replace the picture on the wall with a copy of the whole photo. It worked perfectly the first time.

It didn't succeed in doing the same recursively, but it's still clearly a huge advance in image models.

elorant a day ago

I have a certain use case for such image generators: feed them an entire news article fetched from the BBC and ask for an image to accompany it. Thus far only Midjourney managed to understand context. And now this, which is even more impressive. We live in interesting times.

  • vunderba a day ago

    I think most of the SOTA models could handle this, but you'd likely get better results using a pipeline (rough sketch after this list):

    1. Reduce article to a synopsis using an LLM

    2. Generate 4-5 varying description prompts from the synopsis

    3. Feed the prompts to an imagegen model

    Though I'd wager that gpt-image-1 (in ChatGPT), being multimodal, could probably manage it as well.
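
    A rough sketch of that pipeline with the google-genai SDK - model names as of this launch; the prompts, counts, and article source are arbitrary placeholders:

        # Article -> synopsis -> prompt variants -> candidate images.
        from google import genai

        client = genai.Client()
        article_text = open("article.txt").read()  # placeholder source

        synopsis = client.models.generate_content(
            model="gemini-2.5-flash",
            contents="Summarize this article in 3 sentences:\n" + article_text,
        ).text

        prompts = client.models.generate_content(
            model="gemini-2.5-flash",
            contents="Write 4 distinct one-line scene descriptions for a "
                     "news illustration of: " + synopsis,
        ).text.splitlines()

        for i, p in enumerate(prompts):
            resp = client.models.generate_content(
                model="gemini-2.5-flash-image-preview", contents=p)
            for part in resp.candidates[0].content.parts:
                if part.inline_data:
                    open(f"candidate_{i}.png", "wb").write(part.inline_data.data)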

  • oracleclyde a day ago

    I just tried it inside Gemini with a Medium article. Here's my prompt: "Read the article at this url and provide a hero image that incapsulates the message the author wants to convey: https://bioneers.org/supreme-oligarchy-billionaires-supreme-..."

    The response was a pretty good summary of the article, along with an image that - dagnabbit - read the assignment.

simedw a day ago

The model is only available in AI Studio when I set my VPN to the USA (I’m located in the UK).

  • ctippett a day ago

    Also in the UK and I could access the model just fine from AI studio, no VPN required.

stuckinhell a day ago

Is this the "nano banana" thing the AI art world was going crazy about recently?

jawns a day ago

I was able to upload my kids' back-to-school photos and ask nano-banana to turn them into a goth, an '80s workout girl, and a tracksuit mafioso. The results were incredibly believable, and I was able to prank my mom with them!

mNovak a day ago

These models still seem to struggle with getting repeated patterns right. Others have mentioned piano keys; I've noticed they also almost always fail to generate a valid Go board.

simianwords a day ago

I like it, but it is very restricted. I can't modify people's faces, etc.

mindprince a day ago

What is the difference between Gemini Flash Image models and the Imagen models?

  • og_kalu a day ago

    Imagen is a diffusion text to image model. You write some text that describes your image, you get an image out and that's it.

    Flash Image is an image- (and text-) predicting large language model. In a similar fashion to how trained LLMs can manipulate/morph text, this can do the same for images: things like style transfer, character consistency, etc.

    You can communicate with it in a way you can't for imagen, and it has a better overall world understanding.

  • raincole a day ago

    Imagen: Stable Diffusion, but by Google

    Gemini Flash Image: ChatGPT image, but by Google

patates a day ago

It seems that they still block access from Europe, or from Germany at least.

  • beklein a day ago

    It works fine in OpenRouter

  • elorant a day ago

    I can access it from Greece through AI Studio just fine.

  • punkpeye a day ago

    Use one of the router services

  • Narciss a day ago

    Use it on fal.ai

    • kumarm a day ago

      Since the API currently is not working (it seems rate limits are not yet set for image generation), I tried on fal.

      Definitely inferior to the results I see on AI Studio, and image generation takes 6s on AI Studio vs 30 seconds on fal.ai.

      • echelon a day ago

        > Definitely inferior to results

        Quality or latency?

        • kumarm 16 hours ago

          Latency. I started using the API; responses come back directly from the Gemini API in under 6 seconds, while fal.ai takes about 30 seconds.

  • kridsdale1 a day ago

    Get less contradictory regulations, then.

    • kneegerm a day ago

      [flagged]

      • patates a day ago

        It's not you vs us. Same spec of dust in the universe, y'know.

      • rvnx a day ago

        In the EU they forbid us newspapers from non-approved countries, impose cookie banners everywhere, and now block porn. Soon they will forbid AI models which haven't passed EU censorship ("safety") validation. Because we all know that governments (or even Google, with Android) are better at knowing what is safest for you.

        https://digital-strategy.ec.europa.eu/en/news/eu-rules-gener...

        • krige a day ago

          How do you do, fellow europeans?

kumarm a day ago

Seems to be failing at API Calls right now with "You exceeded your current quota, please check your plan and billing details. For more information on this error,"

Hope they get API issues resolved soon.

jjangkke 20 hours ago

I'm sure a lot of users are doing this right now, and I wonder what the implications are: anybody with a photo of you (even just a face) can now generate photos of you any way they desire.

It's only a matter of time before this can be used to generate coherent nudity on consumer hardware.

tpoacher 11 hours ago

They should have called it emacs-banana, just to piss more people off.

And then call the next model vim-banana and start the editor-banana wars.

In all seriousness though, I'm seeing a worrying trend where Google hijacks well-known project names from other domains more and more, with the "accidental"(?) side effect of diluting their searchability and discoverability online, to the point that I can no longer believe it is mere coincidence (whether it is malicious or not is another story altogether, of course, but even if not, it is still worrying).

I mean, what's next? Gopher 2.5 GIMP Video aka sublime-apple?

chadcmulligan a day ago

This is technically impressive though I really wish they'd choose other professions to automate than graphic design.

  • dangoodmanUT a day ago

    It’s what data is available; they’re not targeting graphic design specifically.

  • reaperducer a day ago

    AI is supposed to set us all free. Yet, so far all the tech companies have done is eliminate the jobs of the lowest-paid people (artists, writers, photographers, designers) and transfer that money to billionaires. Yay.

    • anthonypasq a day ago

      [Plows] are supposed to set us all free. Yet, so far all the tech companies have done is eliminate the jobs of the lowest-paid people ([field hands]) and transfer that money to landowners. Yay.

      • reaperducer a day ago

        If you can't understand the difference, perhaps consult one of your AI chat overlords.

        • throitallaway a day ago

          History repeats itself. Productivity gains the last ~half century have mostly made their way to the top.

TrousersHoisted a day ago

What is the "flash image"? I don't see anything downloadable there...

bsenftner a day ago

All these image models are time vampires and need to be looked at with very suspicious eyes. Try to make a room - that's easy. Now try to make multiple views of the same room - next to impossible. If you intend to use these image models for anything that requires consistency of imagery, forget it.

t_mahmood a day ago

After the rugpull of Android, are we really going to trust Google with anything?

mclau157 a day ago

I could see this destroying a lot of jobs in photography, editing, marketing, etc.

  • bityard a day ago

    These jobs won't go away. Power tools didn't destroy carpentry. Computers didn't destroy math. But workers who don't embrace these new tools will probably get left behind by those who do.

FergusArgyll 20 hours ago

Google really needed something viral, this may be it...

darrinm a day ago

Has anyone tested how generation speed compares to gpt-image-1?

  • cornedor a day ago

    It's consistently around 10 seconds, often faster.

sandreas a day ago

I wonder if this could be used for preprocessing documents before doing OCR...

oulipo2 a day ago

I tried it; it gave a poor-quality image that wasn't even what I asked for. I then asked for a correction, and it gave me another faulty image. It doesn't seem to be there yet.

keepamovin a day ago

Those examples are gorgeous and amazing. This is really cool.

Narciss a day ago

Nano banana is here!

cchance a day ago

Did they actually roll it out? I can't seem to find the option to use it.

Edit: Never mind, it's not in Gemini for everyone yet; it's in AI Studio, though.

captainregex a day ago

Anyone else get excited about "nano" and then sad when you realized it’s not actually a small model?

yuchana a day ago

The progress is insanely good, but imagine the competition between engineers, especially with so many people taking courses in AI and CS.

lyu07282 a day ago

Still fails at analog clocks, if anyone else was wondering.

bawana a day ago

Google is eating Adobe.

sam1234apter a day ago

This model is awesome - now anyone can build photo AI apps.

asdev a day ago

Looks like AI image generation is converging to a local maximum as well

ragazzina a day ago

"Can you make a version of this picture where I wear the best possible sunglasses for my face shape?"

made me realize that AI image modification is now technically flawless, utterly devoid of taste, and that I myself am a rather unattractive fellow.

artursapek 21 hours ago

There are so many uses for this it boggles my mind. Ecommerce, real estate, advertising...

BoorishBears a day ago

I'm really waiting for a Pro sized Gemini model with image output.

I experimented heavily with 2.0 for a site I work on, but it never left preview and it had some gaps that were clearly due to being a small model (like lacking world knowledge, struggling with repetition, missing nuance in instructions, etc.)

2.5 Flash/nano-banana is a major step up but still has small-model gaps peeking through. It still gets into "locked-in" states where it just repeats itself, a failure mode small models also show on creative writing tasks.

A 2.5 Pro would likely close those gaps and definitively beat gpt-image-1

runarberg a day ago

Still fails the “full glass of wine” test, and still shows many of the artifacts typical of AI-generated images, like nonsensical text, misplaced objects, etc.

To be honest I am kind of glad. As AI generated images proliferate, I am hoping it will be easier for humans to call them out as AI.

uejfiweun a day ago

This is pretty remarkable, I'm having a lot of fun playing around with this. Kudos to Google.

GaggiX a day ago

Looking at the AI Studio tab, an image seems to be 256 tokens, so you could generate 3,906 images per 1M tokens - that seems like a lot, if I'm not misreading something.

Edit: the blog post is now loading and reports "1290 output tokens per image", even though AI Studio said something different.
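
For what it's worth, 1290 squares with the $30/M output-token price mentioned elsewhere in the thread: 1290 × $30 / 1,000,000 ≈ $0.039 per image, which matches the per-image cost people are quoting. At 256 tokens an image would cost well under a cent, so the AI Studio counter is probably measuring something else.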

dboreham a day ago

Hmm...assumed this was a model shipped on a flash drive...

idiotsecant a day ago

This is going to be so helpful for all the poorly photoshopped Chinese junk eBay listings.

awestroke a day ago

Internal server error. lol

casey2 a day ago

4 out of 7 images show a woman; 1 out of 7 shows a man. I feel like this is trying to advertise power over women to men, which makes it evil.

jari_mustonen 14 hours ago

Is it capable of generating white males?

Given that this has been a serious problem with Google models, I would have guessed it would be a good idea to include at least one such example in the marketing material. If the marketing material is to be believed, the model really prefers black females.

  • BoorishBears 13 hours ago

    > Given that this has been a serious problem with Google models

    It hasn't been though, has it?

    At one point one of their earliest image gen models had a prompting problem: they tried to have the LLM doing prompt expansion avoid always generating white people, since they realized white people were significantly overrepresented in their training data compared to their proportion of the population.

    Unfortunately that prompt expansion would sometimes clash with cases where there was a specific race required for historical accuracy.

    AFAIK they fixed that ages ago and it stopped being an issue.