ndr 2 days ago

The image generation results are extremely poor, but it's exciting that it does _anything_ in the browser.

  • vunderba 2 days ago

    Even the full 7B model's results are relatively low-res (384x384), so it's hard for me to imagine the generative aspect of the 1B model would be usable.

    Comparisons with other SoTA (Flux, Imagen, etc):

    https://imgur.com/a/janus-flux-imagen3-dall-e-3-comparisons-...

    • nicce a day ago

      I am not sure the results are that comparable, to be honest. For example, DALL-E expands the prompt by default to be much more descriptive. We would need to somehow point out that it is close to impossible to produce the same results as DALL-E.

      I bet there has been a lot of testing of what looks more attractive "by default" to most people. It is also a selling point when low effort produces something visually amazing.

    • littlestymaar a day ago

      It's still very impressive that it gets the cube order right!

      Also, it looks like octopus arms suffer from the “six finger hand” syndrome in all of the models.

  • qingcharles a day ago

    I actually had some pretty impressive results (and a few duds). I think we've lost sight of how amazing something like this actually is. I can run this on my low-end GPU in a web browser and it doesn't even tax it, yet it's creating incredible images out of thin air based on a text description I wrote.

    Just three years ago this would have been world-changing.

  • jjice 2 days ago

    I don't know a lot about image generation models, but 1B sounds super low for this kind of model, so I'm pretty impressed, personally.

    • diggan 2 days ago

      If I remember correctly, SD had less than 1B parameters at launch (~2 years ago?), and you could generate pretty impressive images with the right settings and prompts.

thefirstname322 18 hours ago

Hi HN! We’re excited to launch JanusPro-AI, an open-source multimodal model from DeepSeek that unifies text-to-image generation, image understanding, and cross-modal reasoning in a single architecture. Unlike proprietary models, JanusPro is MIT-licensed and optimized for cost efficiency: our 7B-parameter variant was trained for ~$120k and outperforms DALL-E 3 and Stable Diffusion XL on benchmarks like GenEval (0.80 vs. 0.67).

Why JanusPro?

Decoupled Visual Encoding: Separates the image-generation and image-understanding pathways, eliminating role conflicts in visual processing while maintaining a unified backbone.

Hardware Agnostic: Runs efficiently on consumer GPUs (even AMD cards), with users reporting 30% faster inference vs. NVIDIA equivalents.

Ethical Safeguards: Open-source license restricts military/illegal use, aligning with responsible AI development.

Please check out the website: https://januspro-ai.com/
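
As a purely conceptual sketch (not DeepSeek's actual code), "decoupled visual encoding" means understanding and generation each get their own visual pathway while both share one sequence backbone. Every name and stub below is illustrative:

```python
# Conceptual sketch only; illustrative stubs, not the real Janus-Pro implementation.
from typing import Callable, List

class JanusLike:
    def __init__(self,
                 und_encoder: Callable,    # image -> features for understanding
                 gen_tokenizer: Callable,  # image -> discrete tokens (used as targets during training)
                 backbone: Callable,       # shared transformer over text + image tokens
                 gen_decoder: Callable):   # image tokens -> pixels
        self.und_encoder = und_encoder
        self.gen_tokenizer = gen_tokenizer
        self.backbone = backbone
        self.gen_decoder = gen_decoder

    def understand(self, image, question_tokens: List[int]):
        # Understanding path: a dedicated visual encoder feeds the shared backbone.
        return self.backbone(question_tokens + self.und_encoder(image))

    def generate(self, prompt_tokens: List[int]):
        # Generation path: the backbone emits image tokens; a separate decoder renders them.
        return self.gen_decoder(self.backbone(prompt_tokens))
```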

pentagrama 2 days ago

Happy to have these models running locally in the browser. However, the results are still quite poor for me. For example: https://imgur.com/a/Dn3lxsU

  • sdesol a day ago

    It's not too bad given that it runs in your browser. I took your prompt and asked GPT-4o mini to elaborate on it and got this https://imgur.com/a/qmQ7ZHl

    The burger looks good.
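
    A rough sketch of that elaboration step, assuming the OpenAI Python client (the exact prompt and settings used above aren't shown, so the details here are illustrative):

    ```python
    # Hypothetical prompt-expansion step: have a small chat model rewrite a terse
    # image prompt into a more descriptive one before it goes to the image model.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    terse_prompt = "a cheeseburger on a wooden table"  # stand-in for the original prompt

    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Rewrite the user's image prompt into one "
                                          "detailed, visually descriptive paragraph."},
            {"role": "user", "content": terse_prompt},
        ],
    )
    expanded_prompt = resp.choices[0].message.content
    print(expanded_prompt)  # feed this to the image model instead of the terse prompt
    ```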

n-gauge a day ago

I like that this runs locally, and I like learning about how it works.

Q: These models running in WebGPU all seem to need Node.js installed. Is that just for the local 'server side'? Can you not just use a Python HTTP server or Tomcat for this and wget the files?

  • andrewmackrodt a day ago

    Had a peek at the repo and it looks to be a React frontend, so a JavaScript runtime is needed to "bundle" the application in a way browsers can consume. If you had the dist folder, then I imagine you could use whatever web server you want to serve the static files.
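
    In other words (a sketch assuming the build output is a plain static `dist/` folder): Node.js is only needed for the one-time build, after which any static file server can host the result, e.g. Python's built-in one:

    ```python
    # Serve the prebuilt frontend (e.g. the output of `npm run build`) without Node.
    # Equivalent to running: python -m http.server 8000 --directory dist
    import functools
    import http.server

    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory="dist")
    http.server.HTTPServer(("127.0.0.1", 8000), handler).serve_forever()
    ```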

jedbrooke 2 days ago

Well, it was a long shot anyway, but it doesn’t seem to work on mobile (tried in iOS Safari on an iPhone 11 Pro).

A 1B model should be able to fit within the RAM constraints of a phone(?). If this is supported soon, that would actually be wild: local LLMs in the palm of your hand.
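
Rough back-of-the-envelope numbers for that (assumed parameter count and quantization levels, ignoring activations and runtime overhead):

```python
# Memory needed just for the weights of a 1B-parameter model at common precisions.
params = 1_000_000_000
for fmt, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    gib = params * bytes_per_param / 2**30
    print(f"{fmt}: ~{gib:.1f} GiB")
# fp16: ~1.9 GiB, int8: ~0.9 GiB, int4: ~0.5 GiB -- plausibly within reach of a
# phone with 6-8 GB of RAM, before counting activations and the rest of the runtime.
```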

  • nromiun 2 days ago

    I don't know about this model, but people have been running local models on Android phones for years now. You just need a large amount of RAM (8-12 GB), ggml, and Termux. I tried it once with a tiny model and it worked really well.
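
    For reference, a minimal sketch of that kind of on-device setup, assuming llama-cpp-python (a ggml/llama.cpp binding) installed inside Termux and a small GGUF model already downloaded; the model path and name are placeholders:

    ```python
    # Minimal local-inference sketch with llama-cpp-python.
    from llama_cpp import Llama

    # Placeholder path: any small quantized GGUF model you have on the device.
    llm = Llama(model_path="models/tinyllama-1.1b-q4_k_m.gguf", n_ctx=2048)
    out = llm("Q: What is the capital of France? A:", max_tokens=32)
    print(out["choices"][0]["text"])
    ```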

    • kittikitti a day ago

      This is from Reddit, what were you expecting?

  • bla3 2 days ago

    For me, in Chrome, this needed a 4 GB renderer process and about as much additional memory in the GPU process.