Learn how to do AI Image Generation..
aka. Stable Diffusion for Beginners..

If you are starting out with Stable Diffusion, or have only dabbled so far, and you want to get more serious, then this page, read carefully, will save you many hours.

1. Ask yourself: "How serious am I about all this?"

If you only want to create a few images for a school flyer or fix the face of the occasional photo, or whatever, then you can use one of the growing collection of online tools. Most offer a few free loss-leader credits, and for 80% of folk those basic tools will be just fine. Job done.

If you want to take things to the next level, you will need either some online GPU compute power, or else to set up a rig at home. Google Colab et al. offer decent amounts of compute power for free, but for many reasons a home rig is your best option. Start with privacy and slide down the slippery slope from there.

A consumer-level graphics card can handle this stuff. Yes, it can be done on a CPU, but not in a way conducive to learning or getting shit done. Serious work continues on getting Stable Diffusion running on lesser and lesser hardware, so perhaps in the not-so-distant future any shit laptop could create 2K wallpapers in under a minute. But for now you need an RTX 30-series or newer Nvidia graphics card. Sure, you can mess around with AMD and Apple silicon, but Nvidia is where the game is, so play along or waste whole days fucking with drivers and still not getting anything done. Ethically speaking, they are each as bad as each other, so I don't give a shit where the pendulum currently sits. Ditto Intel-AMD. My current rig is Ryzen, so...

A 12GB RTX 3060 can currently (end of 2023) be had 2nd hand for under £250. That's a crazy amount of power for the money. 12GB is the MINIMUM VRAM I would recommend for AI work. Ideally, you want 24GB or more. But 12GB will do nicely, and enables you to work with SDXL/SVD/Stable Cascade/etc., train your own models, and so on. Less than this will cause you grief down the line. Don't go there unless you are already there (and saving up, I hope).

If you already have an RTX 20 series card, fair enough, you can get stuff done in a fairly reasonable time. But if you are buying a card for this, don't go below RTX 3060 with 12GB. You will regret it. Save up, sell some shit, whatever it takes. Anyone considering this sort of malarkey definitely has unused possessions lying around; stuff they can punt on eBay. Read Marie Kondo, whatever it takes!

1b. Get more RAM.

Got 32GB or less RAM?

All sorts of issues will dissolve when you install more RAM. Even issues that look like they have nothing to do with RAM. RAM is cheap. Get more RAM. Do it now.

I consider 64GB RAM the minimum for this sort of work; maybe because I like to also have fifty tabs open in my main browser. 32GB might work fine if you are running only AI inference tasks with a 12-16GB GPU. All sorts of errors vanished when I upgraded to 64GB. Sure, 48GB might be fine, but why would you waste a slot with a 16GB chip when you could put in a 32GB chip? Unless you can't.

For me, RAM, being finite like storage, is something I like to have enough of that I don't have to think about it. When I do start thinking about it, I immediately plan an upgrade (so the cash and parts can start to find their way to me).

Modern Nvidia GPUs will switch to using regular RAM when they run out of VRAM. This sounds great in theory but I have found that it only works well in practice if you have gobs of regular RAM.

I have 12GB VRAM and so this rarely happens, but when it does, it now happens gracefully and very briefly, unlike the painful mess I got running with 32GB RAM, which technically should be "plenty".
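If you want to know how much headroom the driver actually has to spill into, on Linux you can read it straight out of /proc/meminfo. A minimal stdlib sketch; the parser and the "needed" figure are my own invention, not any official tool, and the sample text below is hand-rolled for demonstration:

```python
import re

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in kB."""
    info = {}
    for line in text.splitlines():
        m = re.match(r"(\w+):\s+(\d+)\s*kB", line)
        if m:
            info[m.group(1)] = int(m.group(2))
    return info

def enough_headroom(meminfo_text, needed_gb):
    """True if MemAvailable covers your estimated VRAM spill-over."""
    info = parse_meminfo(meminfo_text)
    available_gb = info["MemAvailable"] / (1024 * 1024)
    return available_gb >= needed_gb

# On a real box: open("/proc/meminfo").read(). Sample for demonstration:
sample = """MemTotal:       65536000 kB
MemFree:         8000000 kB
MemAvailable:   48000000 kB"""

print(enough_headroom(sample, 24))  # True: ~45GB available on this 64GB box
```

On a 32GB box with a browser and a model already loaded, that MemAvailable figure gets ugly fast, which is exactly where the "painful mess" above comes from.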

The moral of this story: GET MORE RAM: Calculate what you "need", then double it, then add another 50% just for the hell of it..

Now you have enough RAM.

2. Start with ComfyUI.

Unless your needs are very basic, at some point you will become frustrated with the limitations of your Stable Diffusion interface if it isn't ComfyUI. Then you will need to switch to ComfyUI and re-learn your AI tools. Even the weights of your prompts will be wonked. So just start with ComfyUI. As Comfy himself now works for Stability AI, this is a no-brainer; ComfyUI is where it will be at.

I'm not saying that other Stable Diffusion interfaces don't have something to offer (InvokeAI's canvas can blow your mind, for example). Or that ComfyUI is perfect (it has some very basic UI issues that most apps dealt with back in the 1990s, for real). What I'm saying is that for actual AI image work, you would be best served by ComfyUI. Anything you can do in some other interface you can do in ComfyUI, and then some. And if you can't, wait a few days; someone will have written the node by then, or asked for it. Or, you know, ask for it yourself!

And ComfyUI FTW if you want to understand what is actually going on (or any node-based UI, I guess). Yes, you can read white papers for this, and there are many, many these days, but I mean understanding your current task. From that comes a better understanding of the actual process of AI image generation; and from that, the ability to mess with the process for your own nefarious purposes.

The node-based nature of ComfyUI gives you endless possibilities; you think, "wait a minute! I could take the output from there and connect it to there. Boom!", or whatever. From the moment your task begins, you can follow its progress through the nodes; each lighting up green in succession. It's a beautiful thing.

ComfyUI has a steeper learning curve than other AI interfaces. But you can ignore the spaghetti workflows you see online and head straight to Comfy's own examples folder inside the installation for a set of straightforward, simple examples of how to get shit done.

Once you wrap your head around these, that spaghetti starts to make sense. And never forget, images generated by ComfyUI CONTAIN THE WORKFLOW. Well, usually. You can drop any ComfyUI-generated image directly into a ComfyUI tab in your browser and Boom! The entire workflow is recreated. Hit Queue and you're off again.
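ComfyUI stashes that workflow as JSON in the PNG's text chunks (a "workflow" key for the graph, and usually a "prompt" key too). If you ever want to pull it out without a browser, a stdlib-only sketch like this works; here demonstrated on a hand-rolled minimal PNG rather than a real render:

```python
import struct, zlib

PNG_SIG = b"\x89PNG\r\n\x1a\n"

def chunk(ctype, data):
    """Build one PNG chunk: length, type, data, CRC."""
    return (struct.pack(">I", len(data)) + ctype + data
            + struct.pack(">I", zlib.crc32(ctype + data)))

def extract_workflow(png_bytes, key=b"workflow"):
    """Walk the PNG chunks looking for the tEXt entry ComfyUI writes."""
    if png_bytes[:8] != PNG_SIG:
        raise ValueError("not a PNG")
    pos = 8
    while pos + 8 <= len(png_bytes):
        length, ctype = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        data = png_bytes[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            k, _, v = data.partition(b"\x00")
            if k == key:
                return v.decode("utf-8", "replace")
        pos += 12 + length  # header (8 bytes) + data + CRC (4 bytes)
    return None

# Fake a minimal ComfyUI-style PNG to demonstrate:
demo = (PNG_SIG
        + chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
        + chunk(b"tEXt", b'workflow\x00{"nodes": []}')
        + chunk(b"IEND", b""))
print(extract_workflow(demo))  # {"nodes": []}
```

Handy for cataloguing old renders, or for checking whether an image somebody sent you still has its workflow intact.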

You add features as you go; as your understanding increases, so does your repertoire of useful nodes. Or vice-versa; whichever suits your learning style. I have a zillion available nodes**, but only a few dozen I actually understand enough to use productively. As I learn new nodes, possibilities increase. It already seems limitless. And so it goes. With technology as fast-moving as this, there's definitely something warm and fuzzy about trying new things the very day they are released into the world; like I did with Stable Cascade. Anyone with fairly basic tech can do this now, almost any day.

This is better than hoping some interface and/or extension developer somewhere is going to write you into the loop. Maybe. Instead, install ComfyUI and get familiar with it. Spend enough time there and you may just come to like it, a lot. It's also leaner and way faster than A1111 and other Gradio interfaces, at least on the three rigs I've run comparisons on (i.e. my old rig, my old rig + upgraded GPU, and finally my completely upgraded rig).

Note: I have no affiliation with ComfyUI or Stability AI.

I got to know A1111 pretty well with my old CPU setup; I looked at ComfyUI early on but found it too daunting. At any rate, waiting two minutes for an image means it's a fun-only thing, so I wasn't much bothered at that time. After I upgraded (just under budget) I spent a week trying out Stable Diffusion interfaces with my newly-acquired God-like Powa, and my mind kept coming back to the node-based interfaces, specifically ComfyUI; perhaps recognising an analogue between this and the way Stable Diffusion itself works. Also a certain amateurish roughness. Those graphics you see in the academic white papers have no parallel in A1111 et al., but can be visualised in ComfyUI; at least for me, it makes sense.

But please, take a good look at A1111. That's some academic shit. It functions, yes, and many gifted developers have spent many hours making it function magnificently, slowly adding all the latest goodies into yet another tab or sub-tab, or sub-sub-tab. But look at the interface. Throw it on the INTERFACES pile and move on, all you beautiful developers. Please stop flogging this already-dead horse! Real humans shouldn't have to deal with this shit.

** NEVER add nodes with the context menu. Not only is the stupid click-click-click thing RSI-on-a-stick, but also slow, chaotic and confusing; it's a time-sink. Instead double-click (*sigh*) on some empty space and start typing what you need. Bingo! Fun-browsing your node arsenal at break-time is fine, of course. But not when you're working.

You think, "I'd like to see this same seed generated in all my models." In most interfaces, a thought like this would be dangerous. In ComfyUI you simply convert the checkpoint selector widget to an input, add a primitive node, attach it to your newly-created input, set the primitive (now a checkpoint chooser) to "increment", set the batch count under Extra Options to the number of models, and queue. Done.
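If clicking isn't your thing, the same trick can be scripted against ComfyUI's HTTP API. A hedged sketch: the /prompt endpoint is real, but the workflow dict below is a placeholder skeleton, not a working graph; export your own via "Save (API Format)" and use that instead:

```python
import copy, json, urllib.request

# A stripped-down API-format workflow. The node id ("1") and everything
# else in here is illustrative -- substitute your own exported workflow.
BASE = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "PLACEHOLDER"}},
    # ...KSampler (fixed seed!), CLIPTextEncode, VAEDecode, SaveImage...
}

def prompts_for_models(base, model_names):
    """One copy of the workflow per checkpoint; everything else untouched."""
    jobs = []
    for name in model_names:
        job = copy.deepcopy(base)
        job["1"]["inputs"]["ckpt_name"] = name
        jobs.append(job)
    return jobs

def queue_prompt(job, host="127.0.0.1:8188"):
    """POST one job to a running ComfyUI instance's /prompt endpoint."""
    req = urllib.request.Request(
        f"http://{host}/prompt",
        data=json.dumps({"prompt": job}).encode(),
        headers={"Content-Type": "application/json"})
    return urllib.request.urlopen(req)

jobs = prompts_for_models(BASE, ["modelA.safetensors", "modelB.safetensors"])
# for job in jobs: queue_prompt(job)   # with ComfyUI running, fire away
```

Same seed, every model, one script. The queue_prompt call is left commented because it obviously needs a live ComfyUI instance.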

This kind of modular approach enables you to make mistakes most beautifully, and to create images simply not possible in any other interface.

***

3. RTFM.

AI inference is Deep Mathematics. It's not something normal people will understand, or even need to. But you do need to understand the terminology involved: Checkpoint, VAE, Sampler, Scheduler, Prompt, Seed. Once you know what those are and how each is crucial to image generation, you can learn about CLIP, Upscalers, LoRA, SDXL Refiners, IP Adapters, ControlNet, and so on and so on. It all depends on what you want to achieve. A better understanding of the process and the tools involved enables you to do more, and do it better.
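To see how those terms slot together, here's a toy run-through of a generation loop. None of this is real maths; every function is a stand-in, there purely to show which part fires when:

```python
import random

def scheduler(steps):
    """Scheduler: decides how much noise remains at each step."""
    return [(steps - i) / (steps + 1) for i in range(steps)]

def sampler_step(latent, cond, sigma):
    """Sampler: one denoising step. The Checkpoint (the model's weights)
    would do the actual noise prediction here."""
    return [(1 - sigma) * x + sigma * cond * 0.01 for x in latent]

def vae_decode(latent):
    """VAE: turns the finished latent into pixel values."""
    return [max(0.0, min(1.0, 0.5 + x)) for x in latent]

def generate(seed, prompt, steps=20):
    rng = random.Random(seed)                     # Seed: reproducible noise
    latent = [rng.gauss(0, 1) for _ in range(4)]  # start from pure noise
    cond = float(len(prompt))                     # Prompt -> conditioning
                                                  # (CLIP does this for real)
    for sigma in scheduler(steps):
        latent = sampler_step(latent, cond, sigma)
    return vae_decode(latent)

# Same seed + same prompt -> same "image"; change either and it changes.
```

The point of the toy: Seed fixes the starting noise, Scheduler fixes the noise-removal plan, Sampler walks that plan using the Checkpoint's predictions, Prompt steers it, and the VAE turns the result into pixels. Swap any piece and the output changes, which is exactly what you see in the real thing.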

Documentation is scarce. Sadly, most developers care little for it, so keen are they to code the next feature. This means that, rather than one person who knows what he's talking about spending a few minutes writing down the basics of operation, THOUSANDS of people must EACH waste HOURS OF TIME trying to figure out how to use some piece of software effectively.

You may know how I feel about documentation. Mine is in triplicate. I hope the AI "community" gets its shit together on this front, and soon. It's a stain on the community and one of the greatest hurdles to entry for us mere mortals. I get that this technology got "out there" while still basically academic tinkering, but months become years and still basic documentation is missing.

It definitely isn't just ComfyUI node developers, but all AI coders.

4. Know your models.

If you want to know the sort of material a particular model was trained on, pick a seed and run a sequence of inferences with an empty prompt. I usually do 20 images starting at a number I like. Pick your own. Something that shows a wide range of subjects is ideal. Some models will have much more variation than others. Some are remarkably similar. The results can be eye-opening.

Knowing what a model will do with NO prompt gives you a better idea of what will happen once you start throwing prompts at it. Seeds 1-20 aren't a bad starting place. Set your seed to 1, set the "control_before_generate" widget to "increment", batch count 20, queue it.

NOTE: If your widget says "control_after_generate", I recommend you head to the preferences and fix that madness right away!
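The same probe can be built as API-format jobs and queued via ComfyUI's /prompt endpoint, if you fancy scripting it. The node ids below ("3" for the KSampler, "6" for the positive prompt) are placeholders from an imagined export, not gospel; use the ids from your own workflow:

```python
import copy

# Stripped-down API-format workflow, for demonstration only.
PROBE_BASE = {
    "3": {"class_type": "KSampler", "inputs": {"seed": 0}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "anything"}},
}

def empty_prompt_probe(base, seed_node="3", prompt_node="6",
                       start_seed=1, count=20):
    """count copies of the workflow: prompt blanked, seed incrementing."""
    jobs = []
    for i in range(count):
        job = copy.deepcopy(base)
        job[prompt_node]["inputs"]["text"] = ""       # NO prompt
        job[seed_node]["inputs"]["seed"] = start_seed + i
        jobs.append(job)
    return jobs

probe_jobs = empty_prompt_probe(PROBE_BASE)
# POST each to http://127.0.0.1:8188/prompt as {"prompt": job}
```

Twenty empty-prompt jobs, seeds 1 through 20, and the model's soul laid bare in your output folder.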
