Duncan Edwards wrote: Very nice work. All of this was rendered out of Stable Diffusion? I'd love to hear more about the details of how everyone handles their creation.
Oof, this is like giving away our secret blend of herbs and spices. I'll divulge what I can. The entire process can be a little convoluted, but this is the gist of it.
For starters, I use Mage.Space with its numerous fine-tuned Stable Diffusion models. I prefer "LifeLike" for the photorealistic touch on my characters, though many fine-tuned models like Realistic Vision work very well. I like Mage because it lets you quickly flip between different models, each of which may be better at certain things than the others. The default Stable Diffusion 1.5 and 2.1 models are too vague to generate the consistent content we want, though you can brute-force it.
For every project I do, I insert my negative prompts. These are generic tags to make sure it doesn't spit out something stupid.
lowres, text, error, cropped, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, out of frame, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, extra limbs, cloned face, disfigured, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck, artist signature, watermark
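Since I reuse that same base list for every project, it's easy to keep it in one place and bolt on per-project tags. Here's a small helper sketch (not part of my actual setup - most front ends, and the `negative_prompt` parameter in libraries like diffusers, just take the joined comma-separated string):

```python
# Base negative tags, kept in one reusable list.
BASE_NEGATIVE = [
    "lowres", "text", "error", "cropped", "worst quality", "low quality",
    "jpeg artifacts", "bad anatomy", "extra limbs", "mutated hands",
    "watermark",  # abbreviated here; the full list is above
]

def build_negative(extra=()):
    """Join base and per-project tags into one comma-separated string,
    dropping duplicate tags while keeping their original order."""
    seen, out = set(), []
    for tag in [*BASE_NEGATIVE, *extra]:
        t = tag.strip().lower()
        if t and t not in seen:
            seen.add(t)
            out.append(t)
    return ", ".join(out)
```

That way a project-specific tag like "nsfw" can be appended without ever duplicating the base ones.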
Depending on what I'm making, I load up on prompt tags that focus on high resolution, high definition, photography, lighting, style, shadows, etc. Only then do I specify what I want to create.
For the purpose of this experiment, I'll create a series of cheerleader pictures. Along with my style prompts, I'll use the following:
cheerleader, (cheerleader uniform:1.1), (worried face:1.1), (open mouth shouting:1.2), detailed hair, detailed eyes, (sinking in thick muddy mud ripple clay:1.2)
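If you haven't seen the (tag:1.1) notation before: it's the attention-weight convention from A1111-style UIs, where the number scales how strongly the model weights that phrase (1.0 is neutral). A minimal parser sketch, assuming only the flat (text:weight) form used above with no nesting:

```python
import re

# Matches a single "(text:weight)" tag, e.g. "(worried face:1.1)".
WEIGHTED = re.compile(r"\(([^:()]+):([0-9.]+)\)")

def parse_prompt(prompt):
    """Split a comma-separated prompt into (text, weight) pairs,
    defaulting unweighted tags to 1.0."""
    pairs = []
    for raw in prompt.split(","):
        tag = raw.strip()
        if not tag:
            continue
        m = WEIGHTED.fullmatch(tag)
        if m:
            pairs.append((m.group(1).strip(), float(m.group(2))))
        else:
            pairs.append((tag, 1.0))
    return pairs
```

So in the prompt above, "sinking in thick muddy mud ripple clay" at 1.2 gets pushed noticeably harder than plain tags like "detailed hair".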
I don't like generating images without a reference image, as it spits out a lot of random stuff.
cheer1.png
Does it look good? Hell yes. The quality of Stable Diffusion models and the right prompts really nail the photorealistic look. I dare say this is what sets my style of AI work apart at a glance at the moment, though I would love to see others push the envelope. However, it isn't exactly what I wanted - notably, the distinct lack of cheerleaders.
So I need a reference image to work with. Most often, I use another AI rendering app, NovelAI, which produces more anime-style outputs. It has a more distinct library of outfits, generates images faster and lets me cycle through dozens of possible outputs that match what I want. I then run _that_ image through Mage. While I could go straight to LifeLike or another photorealistic model, I prefer to put it through a transitional digital art model first, such as Lyriel, which does a better job of "translating" the cartoon-style output into something LifeLike handles well. Using the same prompts in the process:
cheer2.png
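For anyone who wants to script the same two-stage idea rather than click through a web UI, here's a sketch using Hugging Face's diffusers library (an assumption on my part - Mage does all this behind its UI, and the model repo ids below are hypothetical stand-ins for the site's fine-tunes):

```python
# Each stage is (model id, denoising strength). Lower strength keeps more
# of the incoming image, so the second pass mostly restyles rather than
# reinvents. The strengths are illustrative, not tuned values.
STAGES = [
    ("lyriel-style-model", 0.7),    # hypothetical id: cartoon -> digital art
    ("lifelike-style-model", 0.5),  # hypothetical id: digital art -> photoreal
]

def run_chain(load_pipe, image, prompt, negative_prompt, stages=STAGES):
    """Feed each stage's output into the next. `load_pipe(name)` must
    return a callable render(image, prompt, negative_prompt, strength)."""
    for name, strength in stages:
        render = load_pipe(name)
        image = render(image, prompt, negative_prompt, strength)
    return image

def diffusers_loader(name):
    # Heavy: downloads weights and wants a GPU, so it lives behind a loader.
    import torch
    from diffusers import StableDiffusionImg2ImgPipeline
    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        name, torch_dtype=torch.float16
    ).to("cuda")
    def render(image, prompt, negative_prompt, strength):
        return pipe(prompt=prompt, negative_prompt=negative_prompt,
                    image=image, strength=strength).images[0]
    return render
```

The loader indirection is just so the chain logic stays separate from the model plumbing - swap in whatever backend you actually use.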
Alternatively, I could start by using Photoshop to make a photomanip. You can be lazy with this. I grabbed a cheerleader photo, slapped on some brown paint and added a burn effect for ripples. It just needs to be enough for the AI to recognise my mud prompt.
cheer3.png
My other option is to create the reference image myself. Here, I put together a 1-minute painting that Acidtester would be envious of. Obviously, the more detailed the image, the less "work" you have to do to make the AI recognise what you want. It took a few goes to put it through the Lyriel > LifeLike renders, but considering the... uh, quality of the reference image, the final output is a thousand times better. I find this useful for getting exact poses and framing, which the AI randomises the most, though of course I'd try to make the output a _little_ bit clearer.
cheer4.png
This entire demo - producing all these images and compiling them - took around 15 minutes. The final outputs are unrefined: normally I would do more inpainting to tweak textures and backgrounds, and fix any rendering artifacts in Photoshop.