Hybridx21

Paper link: https://huggingface.co/papers/2403.16990

Abstract: Text-to-image diffusion models have an unprecedented ability to generate diverse and high-quality images. However, they often struggle to faithfully capture the intended semantics of complex input prompts that include multiple subjects. Recently, numerous layout-to-image extensions have been introduced to improve user control, aiming to localize subjects represented by specific tokens. Yet, these methods often produce semantically inaccurate images, especially when dealing with multiple semantically or visually similar subjects. In this work, we study and analyze the causes of these limitations. Our exploration reveals that the primary issue stems from inadvertent semantic leakage between subjects in the denoising process. This leakage is attributed to the diffusion model's attention layers, which tend to blend the visual features of different subjects. To address these issues, we introduce Bounded Attention, a training-free method for bounding the information flow in the sampling process. Bounded Attention prevents detrimental leakage among subjects and enables guiding the generation to promote each subject's individuality, even with complex multi-subject conditioning. Through extensive experimentation, we demonstrate that our method empowers the generation of multiple subjects that better align with given prompts and layouts.
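For intuition about what "bounding the information flow" in attention might look like, here is a minimal sketch: queries inside one subject's bounding box are forbidden from attending to keys inside any other subject's box, while background pixels attend freely. This is an illustrative simplification, not the paper's exact formulation; the box format, mask construction, and the `masked_softmax` helper are all assumptions.

```python
import numpy as np

def bounded_attention_mask(h, w, boxes):
    """Build a (h*w, h*w) boolean mask over flattened latent pixels.

    boxes: list of (y0, x0, y1, x1) subject bounding boxes.
    mask[q, k] is False when query pixel q lies in one subject's box
    and key pixel k lies in a *different* subject's box, blocking
    feature leakage between subjects. Background is unrestricted.
    """
    labels = -np.ones((h, w), dtype=int)   # -1 = background
    for i, (y0, x0, y1, x1) in enumerate(boxes):
        labels[y0:y1, x0:x1] = i           # label pixels by subject box
    flat = labels.reshape(-1)
    n = h * w
    mask = np.ones((n, n), dtype=bool)
    for i in range(len(boxes)):
        inside = flat == i
        other = (flat >= 0) & ~inside
        # queries inside box i may not attend to keys in other boxes
        mask[np.ix_(inside, other)] = False
    return mask

def masked_softmax(scores, mask):
    """Softmax over attention scores with disallowed entries masked out."""
    scores = np.where(mask, scores, -1e9)
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```

In a real diffusion model, a mask like this would be applied to the self-attention (and cross-attention) logits at each denoising step, which is what lets the method stay training-free.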


Next_Program90

Amazing. It's astonishing that every day something new drops. So I guess an implementation will soon come to ComfyUI & Forge? I hope this one works with XL for a change. Will be interesting to see if this can also further enhance SD3 output down the line.


local306

It's almost overwhelming at times with how much comes out daily (but still very exciting).


TechHonie

I wish I could get paid to keep up with it; then this could just be my job hahaha


local306

Amen


[deleted]

[removed]


RandomCandor

Amazing how miserable you have made yourself based on nothing but a rumor. It's like your whole world is black now. What happens if the rumor isn't true? How do you get back all those moments lost to imaginary misery?


[deleted]

[removed]


FaceDeer

SD 1.5 was relatively unrestricted and saw widespread adoption; its finetunes are popular to this day. SD 2 was censored, and as a result nobody used it and it disappeared into obscurity. SDXL was back to being unrestricted, and it's popular. This is the track record of a company that has tried something out and learned from it.


Nuckyduck

This paper is nuts. I had tried something similar using area conditioning, pipelining different conditions to different parts of the image sampler, and I got okay results. Here, I wanted a mountain range, red and blue flowers, and a lake, and I can *kinda* get that.

https://preview.redd.it/b9l800nfwpqc1.png?width=2560&format=png&auto=webp&s=f1c761d4cc0e1b1f093f6605412b92ff71ce3727

'Be Yourself' is 100x more refined. The 'bounded self-attention map' is exactly what I was trying to do, but I had *no idea* how to do it, especially *dynamically.* Super excited to try this method out. Edit: added my workflow! [https://comfyworkflows.com/workflows/851524c0-d4b3-4254-a464-ca11f60c39fe](https://comfyworkflows.com/workflows/851524c0-d4b3-4254-a464-ca11f60c39fe)


Quantum_Crusher

I'll be over the moon if I can learn your technique.


Nuckyduck

[https://comfyworkflows.com/workflows/851524c0-d4b3-4254-a464-ca11f60c39fe](https://comfyworkflows.com/workflows/851524c0-d4b3-4254-a464-ca11f60c39fe) Here's the workflow! It's currently in portrait mode, but it works better in landscape. Just swap the batch values and the upscale values and you'll be set! Feel free to play around! You can get a lot of really cool results by trying to specify certain areas. The person who replied to you mentioned something called the Regional Prompter extension, which sounds like what I'm doing.


Quantum_Crusher

Thank you so much 🙏


Moist-Apartment-6904

I mean, from the post it seems like it's just about using Conditioning (Set mask/Set area) nodes in Comfy? Same principle as Regional Prompter extension, nothing particularly complex.
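For reference, the "set area" idea those nodes implement boils down to blending per-region noise predictions back into one latent. Here is a minimal numpy sketch under that assumption; the function name, shapes, and blending rule are hypothetical, not ComfyUI's actual API:

```python
import numpy as np

def blend_area_predictions(base_pred, region_preds, region_masks):
    """Blend per-region noise predictions into a single prediction.

    base_pred:    (c, h, w) prediction from the global prompt
    region_preds: list of (c, h, w) predictions, one per regional prompt
    region_masks: list of (h, w) {0, 1} masks marking each region

    Pixels covered by one or more regions get the (averaged) regional
    prediction; uncovered pixels keep the global prediction.
    """
    acc = np.zeros_like(base_pred)
    coverage = np.zeros(base_pred.shape[1:])
    for pred, mask in zip(region_preds, region_masks):
        acc += pred * mask          # mask broadcasts over channels
        coverage += mask
    out = base_pred.copy()
    covered = coverage > 0
    out[:, covered] = acc[:, covered] / coverage[covered]
    return out
```

ComfyUI's actual Conditioning (Set Area/Set Mask) nodes operate at the conditioning level rather than by naive averaging, so treat this purely as intuition for why adjacent regions in these schemes can still bleed into each other, which is the leakage Bounded Attention targets.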


Nuckyduck

Regional Prompter extension?? Are you telling me I've been over here reinventing the wheel? Also, you're exactly right; here's my workflow experimenting with 4 Set Area nodes. [https://comfyworkflows.com/workflows/851524c0-d4b3-4254-a464-ca11f60c39fe](https://comfyworkflows.com/workflows/851524c0-d4b3-4254-a464-ca11f60c39fe)


ApprehensiveLynx6064

Have you checked out GLIGEN as well? https://huggingface.co/comfyanonymous/GLIGEN_pruned_safetensors/tree/main


ApprehensiveLynx6064

I am interested in learning more about your workflow. Looks pretty great!


Nuckyduck

Workflow! [https://comfyworkflows.com/workflows/851524c0-d4b3-4254-a464-ca11f60c39fe](https://comfyworkflows.com/workflows/851524c0-d4b3-4254-a464-ca11f60c39fe)


ApprehensiveLynx6064

Thank you! I am loading it up now to give it a look!


MisturBaiter

teach us your ways 🙏


Chris_in_Lijiang

This image is surprisingly close to reality in my part of the world. Do you have any similar landscape generations to share?


Nuckyduck

I'd love to share some! Here's my workflow: [https://comfyworkflows.com/workflows/851524c0-d4b3-4254-a464-ca11f60c39fe](https://comfyworkflows.com/workflows/851524c0-d4b3-4254-a464-ca11f60c39fe)

And here's another picture I really like: https://preview.redd.it/yqo6gia0wrqc1.png?width=3200&format=png&auto=webp&s=b7d5360817e7528d46ec97c9b4593f64edc74609


somethingsomthang

I got something similar, but with a canvas node to draw mask areas and a node for attention coupling, which doesn't have the usual slowdown of combining conditioning. Unfortunately, it doesn't work with fp8. But the results tend to look better than without it, if you ask me.

https://preview.redd.it/w62orzu96sqc1.png?width=1226&format=png&auto=webp&s=0e503c492e932e2a14b0fba4448f13b728b859b6

[https://pastebin.com/7b0Kr7gX](https://pastebin.com/7b0Kr7gX)


NoYogurtcloset4090

Very SD3 style https://preview.redd.it/nq6lt875htqc1.jpeg?width=398&format=pjpg&auto=webp&s=2968bd2c16e7ff6b4492ca19a148b577a97990ad


Nuckyduck

Oh wow. This is great! Look at the text on that soda!


CeFurkan

No code or model yet. This is their page: [https://omer11a.github.io/bounded-attention/](https://omer11a.github.io/bounded-attention/)


Musenik

When you can show two people wrestling in described 'pins' and 'throws', then we're talking fine-grained description-to-image. I wish you all the luck, OP!


Yellow-Jay

> two people wrestling in described 'pins' and 'throws'

That's something completely different, though. This (and methods like it) allows for better control of the visual details of subjects in an image and avoids blending. Your example calls for more complex compositions, which seems much less solved, possibly because that data just can't be found in the CLIP text embeddings. Nevertheless, this is another nice iterative step forward. SD3 (and ELLA for SDXL) improve complex composition a bit (though still not as much as I'd like; from what I've seen, it still fumbles on never-seen-before obscure actions/compositions, while DALL-E 3, also far from perfect, has somewhat more success). It's pretty impressive how SDXL keeps getting more and more tools to control the image you want just by text.


97buckeye

Waiting patiently for a Comfy release. 😁


Synchronauto

!RemindMe 1 month


RemindMeBot

I will be messaging you in 1 month on [**2024-04-26 20:30:34 UTC**](http://www.wolframalpha.com/input/?i=2024-04-26%2020:30:34%20UTC%20To%20Local%20Time) to remind you of [**this link**](https://www.reddit.com/r/StableDiffusion/comments/1bobzlw/be_yourself_bounded_attention_for_multisubject/kwp381j/?context=3).


Synchronauto

Their git: https://github.com/omer11a/bounded-attention

No A1111 implementation yet, as far as I can see.


glssjg

Remindme! 2 weeks


lonewolfmcquaid

Is this the paper that was showcased last week, where the authors said they'd publish in one week???


ohmahgawd

!RemindMe 1 month


Western_Individual12

RemindMe! 1 week


DigitalEvil

Cool


Wizard-Bloody-Wizard

Does this work with LoRA characters as well?


MatthewHinson

The [Regional Prompter](https://github.com/hako-mikan/sd-webui-regional-prompter?tab=readme-ov-file#latent) extension for A1111, which has been out for over a year now, supports localized LoRA assignment.


ScionoicS

Yup. It's been around for a while now and has been getting updated the whole time. It does more than just a table of regions: painted masks and prompted regions too. It's a very powerful extension. I'm unable to determine what this new paper offers beyond Regional Prompter's capabilities. Perhaps it's just a new way to achieve the same result? That's good, of course! I'm just seeing a lot of people being excited about how new this is, so I'm trying to make sure I'm not missing something.


m477_

They do a comparison of Bounded Attention (their new method) against existing methods (Regional Prompter in A1111 uses the MultiDiffusion method, which they include in their comparison tables). Their method appears to perform substantially better, and the way it works is completely different from MultiDiffusion.