ndr 2 days ago

The image generation results are extremely poor, but it's exciting that it does _anything_ in the browser.

  • vunderba 2 days ago

    Even the full 7B model's results are relatively low-res (384x384), so it's hard for me to imagine the generative aspect of the 1B model would be usable.

    Comparisons with other SoTA (Flux, Imagen, etc):

    https://imgur.com/a/janus-flux-imagen3-dall-e-3-comparisons-...

    • nicce a day ago

      I am not sure the results are that comparable, to be honest. For example, DALL-E expands the prompt by default to be much more descriptive. We would need to somehow point out that it is close to impossible to produce the same results as DALL-E.

      I bet there has been a lot of testing of what looks more attractive "by default" to most people. It is also a selling point when low effort produces something visually amazing.

    • littlestymaar a day ago

      It's still very impressive that it gets the cube order right!

      Also, it looks like octopus arms suffer from the “six finger hand” syndrome in all of the models.

  • qingcharles a day ago

    I actually had some pretty impressive results (and a few duds). I think we've lost sight of how amazing something like this actually is. I can run this on my low-end GPU in a web browser and it doesn't even tax it, yet it's creating incredible images out of thin air based on a text description I wrote.

    Just three years ago this would have been world-changing.

  • jjice 2 days ago

    I don't know a lot about image generation models, but 1B sounds super low for this kind of model, so I'm pretty impressed, personally.

    • diggan 2 days ago

      If I remember correctly, SD had less than 1B parameters at launch (~2 years ago?), and you could generate pretty impressive images with the right settings and prompts.

thefirstname322 18 hours ago

Hi HN! We’re excited to launch JanusPro-AI, an open-source multimodal model from DeepSeek that unifies text-to-image generation, image understanding, and cross-modal reasoning in a single architecture. Unlike proprietary models, JanusPro is MIT-licensed and optimized for cost efficiency: our 7B-parameter variant was trained for ~$120k and outperforms DALL-E 3 and Stable Diffusion XL on benchmarks like GenEval (0.80 vs. 0.67).

Why JanusPro?

Decoupled Visual Encoding: Separates the image-generation and image-understanding pathways, eliminating role conflicts in visual processing while maintaining a unified backbone.

Hardware Agnostic: Runs efficiently on consumer GPUs (even AMD cards), with users reporting 30% faster inference vs. NVIDIA equivalents.

Ethical Safeguards: Open-source license restricts military/illegal use, aligning with responsible AI development.

Please check out the website: https://januspro-ai.com/
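
As a purely conceptual sketch (not DeepSeek's actual code), "decoupled visual encoding" means understanding and generation each get their own visual pathway while both share one sequence backbone. Every name and stub below is illustrative:

```python
# Conceptual sketch only; illustrative stubs, not the real Janus-Pro implementation.
from typing import Callable, List

class JanusLike:
    def __init__(self,
                 und_encoder: Callable,    # image -> features for understanding
                 gen_tokenizer: Callable,  # image -> discrete tokens (used as targets during training)
                 backbone: Callable,       # shared transformer over text + image tokens
                 gen_decoder: Callable):   # image tokens -> pixels
        self.und_encoder = und_encoder
        self.gen_tokenizer = gen_tokenizer
        self.backbone = backbone
        self.gen_decoder = gen_decoder

    def understand(self, image, question_tokens: List[int]):
        # Understanding path: a dedicated visual encoder feeds the shared backbone.
        return self.backbone(question_tokens + self.und_encoder(image))

    def generate(self, prompt_tokens: List[int]):
        # Generation path: the backbone emits image tokens; a separate decoder renders them.
        return self.gen_decoder(self.backbone(prompt_tokens))
```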

pentagrama 2 days ago

Happy to have these models running locally in the browser. However, the results are still quite poor for me. For example: https://imgur.com/a/Dn3lxsU

  • sdesol a day ago

    It's not too bad given that it runs in your browser. I took your prompt and asked GPT-4o mini to elaborate on it and got this https://imgur.com/a/qmQ7ZHl

    The burger looks good.
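
    A rough sketch of that elaboration step, assuming the OpenAI Python client (the exact prompt and settings used above aren't shown, so the details here are illustrative):

    ```python
    # Hypothetical prompt-expansion step: have a small chat model rewrite a terse
    # image prompt into a more descriptive one before it goes to the image model.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    terse_prompt = "a cheeseburger on a wooden table"  # stand-in for the original prompt

    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Rewrite the user's image prompt into one "
                                          "detailed, visually descriptive paragraph."},
            {"role": "user", "content": terse_prompt},
        ],
    )
    expanded_prompt = resp.choices[0].message.content
    print(expanded_prompt)  # feed this to the image model instead of the terse prompt
    ```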

n-gauge a day ago

I like that this runs locally, and I like learning about how it works.

Q: These models running in WebGPU all seem to need Node.js installed. Is that just for the local 'server side'? Can you not just use a Python HTTP server or Tomcat for this and wget the files?

  • andrewmackrodt a day ago

    Had a peek at the repo and it looks to be a React frontend, so a JavaScript runtime is needed to "bundle" the application in a way browsers can consume. If you had the dist folder, then I imagine you could use whatever web server you want to serve the static files.
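
    In other words (a sketch assuming the build output is a plain static `dist/` folder): Node.js is only needed for the one-time build, after which any static file server can host the result, e.g. Python's built-in one:

    ```python
    # Serve the prebuilt frontend (e.g. the output of `npm run build`) without Node.
    # Equivalent to running: python -m http.server 8000 --directory dist
    import functools
    import http.server

    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory="dist")
    http.server.HTTPServer(("127.0.0.1", 8000), handler).serve_forever()
    ```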

jedbrooke 2 days ago

Well, it was a long shot anyway, but it doesn’t seem to work on mobile (tried in iOS Safari on an iPhone 11 Pro).

A 1B model should be able to fit within the RAM constraints of a phone(?). If this is supported soon, that would actually be wild: local LLMs in the palm of your hand.
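
Rough back-of-the-envelope numbers for that (assumed parameter count and quantization levels, ignoring activations and runtime overhead):

```python
# Memory needed just for the weights of a 1B-parameter model at common precisions.
params = 1_000_000_000
for fmt, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    gib = params * bytes_per_param / 2**30
    print(f"{fmt}: ~{gib:.1f} GiB")
# fp16: ~1.9 GiB, int8: ~0.9 GiB, int4: ~0.5 GiB -- plausibly within reach of a
# phone with 6-8 GB of RAM, before counting activations and the rest of the runtime.
```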

  • nromiun 2 days ago

    I don't know about this model, but people have been running local models on Android phones for years now. You just need a large amount of RAM (8-12 GB), ggml, and Termux. I tried it once with a tiny model and it worked really well.
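
    For reference, a minimal sketch of that kind of on-device setup, assuming llama-cpp-python (a ggml/llama.cpp binding) installed inside Termux and a small GGUF model already downloaded; the model path and name are placeholders:

    ```python
    # Minimal local-inference sketch with llama-cpp-python.
    from llama_cpp import Llama

    # Placeholder path: any small quantized GGUF model you have on the device.
    llm = Llama(model_path="models/tinyllama-1.1b-q4_k_m.gguf", n_ctx=2048)
    out = llm("Q: What is the capital of France? A:", max_tokens=32)
    print(out["choices"][0]["text"])
    ```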

    • kittikitti a day ago

      This is from Reddit, what were you expecting?

  • bla3 2 days ago

    For me, in Chrome, this needed a 4 GB renderer process and about as much additional memory in the GPU process.