Commit f2554a3

Update README.md
1 parent 610d132 commit f2554a3

File tree

1 file changed

+26
-0
lines changed

1 file changed

+26
-0
lines changed

README.md

Lines changed: 26 additions & 0 deletions
@@ -21,6 +21,32 @@ Enter a prompt or pick up a picture and press "Generate" (You don't need to prep
![The concept](https://github.com/swiftuiux/coreml-stable-diffusion-swift-example/blob/main/img/img_03.png)

## How it works

### Super short

words → numbers → math → picture.

### In short

text → (TextEncoder) → numbers
numbers + noise → (U-Net) → hidden image
hidden image → (VAE Decoder) → real image
real image → (SafetyChecker) → safe output
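The four stages above can be sketched as plain function composition. This is a toy illustration only, not the actual Core ML models; every function name here (`text_encoder`, `unet`, and so on) is a made-up stand-in for the corresponding `.mlmodelc` file:

```python
# Toy sketch of the four-stage pipeline (illustration only).
def text_encoder(text):
    # words → numbers: fake embedding, each token mapped to its length
    return [float(len(tok)) for tok in text.split()]

def unet(embedding, noise, steps=3):
    # numbers + noise → hidden image: iteratively "denoise" toward the embedding
    latent = noise
    for _ in range(steps):
        latent = [0.5 * (l + e) for l, e in zip(latent, embedding)]
    return latent

def vae_decoder(latent):
    # hidden image → real image: pretend pixels are clamped, scaled latents
    return [round(255 * min(max(v, 0.0), 1.0)) for v in latent]

def safety_checker(image):
    # real image → safe output: this toy version passes everything through
    return image

emb = text_encoder("a red apple")
image = safety_checker(vae_decoder(unet(emb, noise=[0.0] * len(emb))))
```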

### Basically

1. You type "a red apple". vocab.json and merges.txt drive tokenization, breaking the prompt into units like [a] [red] [apple]. TextEncoder.mlmodelc then maps those tokens into numerical vectors (embeddings) that capture their meaning.
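A minimal sketch of that first step, assuming a tiny made-up vocabulary (the real pipeline reads vocab.json and merges.txt and runs TextEncoder.mlmodelc; nothing below is the actual model):

```python
# Hypothetical mini-tokenizer + embedder for illustration only.
vocab = {"a": 0, "red": 1, "apple": 2, "<unk>": 3}

def tokenize(text):
    # break the prompt into known units, unknown words map to <unk>
    return [vocab.get(tok, vocab["<unk>"]) for tok in text.lower().split()]

def embed(ids, dim=4):
    # fake embedding table: token id i → a dim-long vector of i's
    return [[float(i)] * dim for i in ids]

ids = tokenize("a red apple")   # three token ids
vectors = embed(ids)            # three vectors of dimension 4
```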

2. The model’s brain (U-Net). It starts with pure random noise (a messy canvas). Then, step by step, it removes noise and adds structure, following the instructions from your text (the numbers from the TextEncoder). After many steps, what was just noise gradually becomes the picture you asked for. At this stage the image is not made of pixels (red, green, blue dots); it lives in a latent space, a compressed mathematical version of the image.
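The step-by-step denoising can be imitated with a toy loop: start from random noise and, at each step, move a little toward a target suggested by the text embedding. This is only a caricature of what the real U-Net does:

```python
import random

# Toy denoising loop (not the real U-Net): each step trades some of the
# remaining noise for structure pulled from the target.
def denoise(target, steps=50, seed=0):
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in target]   # the messy canvas
    for _ in range(steps):
        latent = [l + 0.2 * (t - l) for l, t in zip(latent, target)]
    return latent

target = [1.0, -1.0, 0.5]
result = denoise(target)
# after many steps the remaining "noise" is tiny
error = max(abs(r - t) for r, t in zip(result, target))
```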

3. Hidden space (latent space). This is the hidden mathematical space where the U-Net operates: a compressed version of images where the model does its work. Instead of handling millions of pixels directly (which is heavy), the model uses a much smaller grid of numbers that still captures the essence of shapes, colors, and structure. Think of it as a sketch or blueprint: not the full detailed image, but enough information to reconstruct it later. That is why it is called latent (hidden): the image exists there, but only as math.

   • Latent space = where the work happens (the canvas the painter works on)
   • U-Net = how the work happens (the painter’s hand moving)
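The "smaller grid of numbers" is not hand-waving: in the standard Stable Diffusion configuration, a 512×512 RGB image is encoded into a 64×64 latent with 4 channels, so the U-Net works on roughly 48× fewer values than raw pixels:

```python
# Why latent space is cheaper, in concrete numbers (standard SD sizes).
pixels = 512 * 512 * 3     # values in the full RGB image
latents = 64 * 64 * 4      # values in the compressed latent "blueprint"
ratio = pixels / latents   # how much smaller the latent grid is
```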

4. VAE Decoder. Once the latent image is ready, VAEDecoder.mlmodelc converts it into an actual picture (pixels). Going the other way (picture → latent space) is the job of VAEEncoder.mlmodelc.
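The decode/encode pair can be pictured as a round trip between a small latent row and a larger pixel row. The toy functions below are hypothetical stand-ins for VAEDecoder.mlmodelc and VAEEncoder.mlmodelc, shrunk to one dimension:

```python
# Toy latent ↔ pixel round trip (illustration only).
def decode(latent_row):
    # each latent value expands into two identical "pixels"
    return [v for v in latent_row for _ in range(2)]

def encode(pixel_row):
    # average each pair of pixels back into one latent value
    return [(pixel_row[i] + pixel_row[i + 1]) / 2
            for i in range(0, len(pixel_row), 2)]

latent = [0.1, 0.9]
pixels = decode(latent)        # twice as many values as the latent
roundtrip = encode(pixels)     # back to the original latent
```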

5. Safety check. Finally, SafetyChecker.mlmodelc inspects the image and makes sure it follows safety rules; if not, it may block or adjust the output. It works by running the generated image through a separate classifier (basically another small neural net) that predicts whether the picture falls into any unsafe category. If it does, the checker can blur the image, replace it, or simply stop the output.
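A sketch of that check, assuming a made-up scoring scheme: score the image's features against a list of "unsafe concept" vectors and blank the output if any score crosses a threshold. The real SafetyChecker.mlmodelc is a neural classifier; this thresholding is illustrative only:

```python
# Toy safety check (illustration only, not SafetyChecker.mlmodelc).
def classify(image_features, concept_vectors):
    # dot-product score of the image against each unsafe concept
    return [sum(f * c for f, c in zip(image_features, concept))
            for concept in concept_vectors]

def safety_check(image, image_features, concepts, threshold=1.0):
    scores = classify(image_features, concepts)
    if any(s > threshold for s in scores):
        return [0] * len(image)   # "blur"/blank the output
    return image

passed = safety_check([10, 20], image_features=[0.1, 0.2],
                      concepts=[[0.5, 0.5]])
blocked = safety_check([10, 20], image_features=[2.0, 2.0],
                       concepts=[[1.0, 1.0]])
```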

### Typical set of files for a model and the purpose of each file

| File Name | Description |
