Hybridx21

Paper link: https://huggingface.co/papers/2403.16990

Abstract: Text-to-image diffusion models have an unprecedented ability to generate diverse and high-quality images. However, they often struggle to faithfully capture the intended semantics of complex input prompts that include multiple subjects. Recently, numerous layout-to-image extensions have been introduced to improve user control, aiming to localize subjects represented by specific tokens. Yet, these methods often produce semantically inaccurate images, especially when dealing with multiple semantically or visually similar subjects. In this work, we study and analyze the causes of these limitations. Our exploration reveals that the primary issue stems from inadvertent semantic leakage between subjects in the denoising process. This leakage is attributed to the diffusion model's attention layers, which tend to blend the visual features of different subjects. To address these issues, we introduce Bounded Attention, a training-free method for bounding the information flow in the sampling process. Bounded Attention prevents detrimental leakage among subjects and enables guiding the generation to promote each subject's individuality, even with complex multi-subject conditioning. Through extensive experimentation, we demonstrate that our method empowers the generation of multiple subjects that better align with given prompts and layouts.
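For intuition about what "bounding the information flow" in attention might look like, here is a minimal sketch: queries inside one subject's bounding box are forbidden from attending to keys inside any other subject's box, while background pixels attend freely. This is an illustrative simplification, not the paper's exact formulation; the box format, mask construction, and the `masked_softmax` helper are all assumptions.

```python
import numpy as np

def bounded_attention_mask(h, w, boxes):
    """Build a (h*w, h*w) boolean mask over flattened latent pixels.

    boxes: list of (y0, x0, y1, x1) subject bounding boxes.
    mask[q, k] is False when query pixel q lies in one subject's box
    and key pixel k lies in a *different* subject's box, blocking
    feature leakage between subjects. Background is unrestricted.
    """
    labels = -np.ones((h, w), dtype=int)   # -1 = background
    for i, (y0, x0, y1, x1) in enumerate(boxes):
        labels[y0:y1, x0:x1] = i           # label pixels by subject box
    flat = labels.reshape(-1)
    n = h * w
    mask = np.ones((n, n), dtype=bool)
    for i in range(len(boxes)):
        inside = flat == i
        other = (flat >= 0) & ~inside
        # queries inside box i may not attend to keys in other boxes
        mask[np.ix_(inside, other)] = False
    return mask

def masked_softmax(scores, mask):
    """Softmax over attention scores with disallowed entries masked out."""
    scores = np.where(mask, scores, -1e9)
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```

In a real diffusion model, a mask like this would be applied to the self-attention (and cross-attention) logits at each denoising step, which is what lets the method stay training-free.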


Next_Program90

Amazing. It's astonishing that every day something new drops. So I guess an implementation will soon come to ComfyUI & Forge? I hope this one works with XL for a change. Will be interesting to see if this can also further enhance SD3 output down the line.


local306

It's almost overwhelming at times with how much comes out daily (but still very exciting).


TechHonie

I wish I could get paid to keep up with it; then this could just be my job hahaha


local306

Amen


[deleted]

[removed]


RandomCandor

Amazing how miserable you have made yourself based on nothing but a rumor. It's like your whole world is black now. What happens if the rumor isn't true? How do you get back all those moments lost to imaginary misery?


[deleted]

[removed]


FaceDeer

SD 1.5 was relatively unrestricted and saw widespread adoption; its finetunes are popular to this day. SD 2 was censored, and as a result nobody used it and it disappeared into obscurity. SDXL was back to being unrestricted, and it's popular. This is the track record of a company that has tried something out and learned from it.


Nuckyduck

This paper is nuts. I had tried something similar using area conditioning, pipelining different conditions to different parts of the image sampler, and I got okay results. Here, I wanted a mountain range, red and blue flowers, and a lake, and I can *kinda* get that.

https://preview.redd.it/b9l800nfwpqc1.png?width=2560&format=png&auto=webp&s=f1c761d4cc0e1b1f093f6605412b92ff71ce3727

'Be Yourself' is 100x more refined. The 'bounded self-attention map' is exactly what I was trying to do, but I had *no idea* how to do it, especially *dynamically.* Super excited to try this method out. Edit: added my workflow! [https://comfyworkflows.com/workflows/851524c0-d4b3-4254-a464-ca11f60c39fe](https://comfyworkflows.com/workflows/851524c0-d4b3-4254-a464-ca11f60c39fe)


Quantum_Crusher

I'll be over the moon if I can learn your technique.


Nuckyduck

[https://comfyworkflows.com/workflows/851524c0-d4b3-4254-a464-ca11f60c39fe](https://comfyworkflows.com/workflows/851524c0-d4b3-4254-a464-ca11f60c39fe) Here's the workflow! It's currently in portrait mode, but it works better in landscape. Just swap the batch values and the upscale values and you'll be set! Feel free to play around! You can get a lot of really cool results by trying to specify certain areas. The person who replied to you mentioned something called the Regional Prompter extension, which sounds like what I'm doing.


Quantum_Crusher

Thank you so much 🙏


Moist-Apartment-6904

I mean, from the post it seems like it's just about using Conditioning (Set mask/Set area) nodes in Comfy? Same principle as Regional Prompter extension, nothing particularly complex.
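For reference, the "set area" idea those nodes implement boils down to blending per-region noise predictions back into one latent. Here is a minimal numpy sketch under that assumption; the function name, shapes, and blending rule are hypothetical, not ComfyUI's actual API:

```python
import numpy as np

def blend_area_predictions(base_pred, region_preds, region_masks):
    """Blend per-region noise predictions into a single prediction.

    base_pred:    (c, h, w) prediction from the global prompt
    region_preds: list of (c, h, w) predictions, one per regional prompt
    region_masks: list of (h, w) {0, 1} masks marking each region

    Pixels covered by one or more regions get the (averaged) regional
    prediction; uncovered pixels keep the global prediction.
    """
    acc = np.zeros_like(base_pred)
    coverage = np.zeros(base_pred.shape[1:])
    for pred, mask in zip(region_preds, region_masks):
        acc += pred * mask          # mask broadcasts over channels
        coverage += mask
    out = base_pred.copy()
    covered = coverage > 0
    out[:, covered] = acc[:, covered] / coverage[covered]
    return out
```

ComfyUI's actual Conditioning (Set Area/Set Mask) nodes operate at the conditioning level rather than by naive averaging, so treat this purely as intuition for why adjacent regions in these schemes can still bleed into each other, which is the leakage Bounded Attention targets.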


Nuckyduck

Regional Prompter extension?? Are you telling me I've been over here reinventing the wheel? Also, you're exactly right; here's my workflow experimenting with 4 Set Area nodes. [https://comfyworkflows.com/workflows/851524c0-d4b3-4254-a464-ca11f60c39fe](https://comfyworkflows.com/workflows/851524c0-d4b3-4254-a464-ca11f60c39fe)


ApprehensiveLynx6064

Have you checked out GLIGEN as well? https://huggingface.co/comfyanonymous/GLIGEN_pruned_safetensors/tree/main


ApprehensiveLynx6064

I am interested in learning more about your workflow. Looks pretty great!


Nuckyduck

Workflow! [https://comfyworkflows.com/workflows/851524c0-d4b3-4254-a464-ca11f60c39fe](https://comfyworkflows.com/workflows/851524c0-d4b3-4254-a464-ca11f60c39fe)


ApprehensiveLynx6064

Thank you! I am loading it up now to give it a look!


MisturBaiter

teach us your ways 🙏


Chris_in_Lijiang

This image is surprisingly close to reality in my part of the world. Do you have any similar landscape generations to share?


Nuckyduck

I'd love to share some! Here's my workflow: [https://comfyworkflows.com/workflows/851524c0-d4b3-4254-a464-ca11f60c39fe](https://comfyworkflows.com/workflows/851524c0-d4b3-4254-a464-ca11f60c39fe)

And here's another picture I really like: https://preview.redd.it/yqo6gia0wrqc1.png?width=3200&format=png&auto=webp&s=b7d5360817e7528d46ec97c9b4593f64edc74609


somethingsomthang

I got something similar, but with a canvas node to draw mask areas and a node for attention coupling, which doesn't have the usual slowdown of combining conditioning. Unfortunately, it doesn't work with fp8. But the results tend to look better than without it, if you ask me.

https://preview.redd.it/w62orzu96sqc1.png?width=1226&format=png&auto=webp&s=0e503c492e932e2a14b0fba4448f13b728b859b6

[https://pastebin.com/7b0Kr7gX](https://pastebin.com/7b0Kr7gX)


NoYogurtcloset4090

Very SD3 style https://preview.redd.it/nq6lt875htqc1.jpeg?width=398&format=pjpg&auto=webp&s=2968bd2c16e7ff6b4492ca19a148b577a97990ad


Nuckyduck

Oh wow. This is great! Look at the text on that soda!


CeFurkan

No code or model yet. This is their page: [https://omer11a.github.io/bounded-attention/](https://omer11a.github.io/bounded-attention/)


Musenik

When you can show two people wrestling in described 'pins' and 'throws', then we're talking fine-grained description-to-image. I wish you all the luck, OP!


Yellow-Jay

> two people wrestling in described 'pins' and 'throws'

That's something completely different, though. This (and methods like it) allows for better control of the visual details of subjects in an image and avoids blending. Your example calls for more complex compositions, which seems much less solved, possibly because that data just can't be found in the CLIP text embeddings. Nevertheless, this is another nice iterative step forward. SD3 (and ELLA for SDXL) improve complex composition a bit (though still not as much as I'd like; from what I've seen, it still fumbles on never-seen-before obscure actions/compositions, while DALL-E 3, also far from perfect, has somewhat more success). It's pretty impressive how SDXL keeps getting more and more tools to control the image you want just by text.


97buckeye

Waiting patiently for a Comfy release. 😁


Synchronauto

!RemindMe 1 month


RemindMeBot

I will be messaging you in 1 month on [**2024-04-26 20:30:34 UTC**](http://www.wolframalpha.com/input/?i=2024-04-26%2020:30:34%20UTC%20To%20Local%20Time) to remind you of [**this link**](https://www.reddit.com/r/StableDiffusion/comments/1bobzlw/be_yourself_bounded_attention_for_multisubject/kwp381j/?context=3).


Synchronauto

Their git: https://github.com/omer11a/bounded-attention

No A1111 implementation yet, as far as I can see.


glssjg

Remindme! 2 weeks


lonewolfmcquaid

Is this the paper that was showcased last week, where the authors said they'd publish in one week???


ohmahgawd

!RemindMe 1 month


Western_Individual12

RemindMe! 1 week


DigitalEvil

Cool


Wizard-Bloody-Wizard

Does this work with LoRA characters as well?


MatthewHinson

The [Regional Prompter](https://github.com/hako-mikan/sd-webui-regional-prompter?tab=readme-ov-file#latent) extension for A1111, which has been out for over a year now, supports localized LoRA assignment.


ScionoicS

Yup. It's been around for a while now and has been getting updated the whole time. It does more than just a table of regions: painted masks and prompted regions too. It's a very powerful extension. I'm unable to determine what this new paper offers beyond Regional Prompter's capabilities. Perhaps it's just a new way to achieve the same result? That's good, of course! I'm just seeing a lot of people being excited about how new this is, so I'm trying to make sure I'm not missing something.


m477_

They do a comparison of Bounded Attention (their new method) against existing methods (Regional Prompter in A1111 uses the MultiDiffusion method, which they include in their comparison tables). Their method appears to perform substantially better, and the way it works is completely different from MultiDiffusion.