## How it works
### Super short

At first glance it looks like a jungle of files (TextEncoder, U-Net, VAE, SafetyChecker, vocab stuff, etc.), but if you zoom out, the whole pipeline is really just:

**words → numbers → math → picture → check**
Everything else is just supporting that flow.
### So in short:

- text → (TextEncoder) → numbers
- numbers + noise → (U-Net) → hidden image
- hidden image → (VAE Decoder) → real image
- real image → (SafetyChecker) → safe output

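The four stages above can be sketched as plain functions. This is a toy illustration of the data flow only (all function names and shapes here are hypothetical stand-ins, not the real Core ML interfaces):

```python
import random

# Toy stand-ins for the four pipeline stages (hypothetical names,
# not the real Core ML model interfaces).

def text_encoder(text):
    # words → numbers: one fake embedding value per token
    return [float(len(tok)) for tok in text.split()]

def unet_denoise(embedding, steps=4):
    # numbers + noise → hidden image: start from noise, then nudge
    # it toward the text embedding a little on every step
    latent = [random.random() for _ in embedding]
    for _ in range(steps):
        latent = [l + 0.5 * (e - l) for l, e in zip(latent, embedding)]
    return latent

def vae_decode(latent):
    # hidden image → real image: turn latent numbers into "pixels"
    return [min(255, round(x * 32)) for x in latent]

def safety_check(image):
    # real image → safe output: trivially passes everything here
    return image

image = safety_check(vae_decode(unet_denoise(text_encoder("a red apple"))))
```

Each stage only consumes the previous stage's output, which is why the pipeline reads as a single left-to-right flow.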
## Basically
1. **Text Encoding**
   You type `"a red apple"`.
   - `vocab.json` + `merges.txt` handle **tokenization** → break it into units like `[a] [red] [apple]`.
   - `TextEncoder.mlmodelc` maps those tokens into **numerical vectors** (embeddings).

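A rough sketch of what the tokenization step does. This is a toy BPE-style example with made-up merge rules and a three-word vocabulary; the real `merges.txt`/`vocab.json` hold tens of thousands of entries:

```python
# Toy BPE-style tokenizer: merges.txt-style rules fuse character
# pairs into larger units, then a vocab.json-style table maps each
# unit to an integer id. (Toy data, not the real vocabulary files.)

merges = [("a", "p"), ("ap", "p"), ("l", "e"), ("app", "le"),
          ("r", "e"), ("re", "d")]            # ordered merge rules
vocab = {"a": 0, "red": 1, "apple": 2}        # token → id

def bpe(word):
    symbols = list(word)                      # start from single characters
    for left, right in merges:                # apply each rule in order
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == left and symbols[i + 1] == right:
                symbols[i:i + 2] = [left + right]
            else:
                i += 1
    return symbols

def tokenize(text):
    tokens = []
    for word in text.lower().split():
        tokens.extend(bpe(word))
    return [vocab[t] for t in tokens]
```

So `tokenize("a red apple")` yields one integer id per unit, and those ids are what the TextEncoder turns into vectors.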
2. **The model’s brain (U-Net)**
   - Starts with **random noise** (a messy canvas).
   - Step by step, it **removes noise** and **adds structure**, following the instructions from your text (the vectors from the TextEncoder).
   - After many steps, what was just noise slowly looks like the picture you asked for.
   - At this stage, the image is **not yet pixels** (red/green/blue dots). Instead, it exists in **latent space** — a compressed mathematical version of the image.

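The denoising loop can be sketched like this. It is schematic only: `unet` here is a hypothetical callable that predicts the noise in the current latent, and real schedulers (DDIM, PNDM, etc.) use more careful update rules than this simple subtraction:

```python
import random

def denoise(text_vectors, unet, steps=50, seed=0):
    # Schematic diffusion sampling loop (not a real scheduler).
    rng = random.Random(seed)
    # 1. Start from pure noise (the "messy canvas").
    latent = [rng.gauss(0.0, 1.0) for _ in range(16)]
    # 2. Step by step, remove the noise the U-Net predicts,
    #    guided by the text vectors.
    for t in range(steps, 0, -1):
        predicted = unet(latent, text_vectors, t)
        latent = [x - n / steps for x, n in zip(latent, predicted)]
    # 3. What is left is the "hidden image" in latent space.
    return latent
```

The key point is the shape of the loop: many small corrections to the same latent, each conditioned on the text vectors.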
3. **Hidden space (Latent space)**
   - Latent space = the **hidden mathematical space** where the U-Net operates.
   - Instead of dealing with millions of pixels directly, the model works with a **smaller grid of numbers** that still captures the essence of shapes, colors, and structures.
   - Think of it like a **sketch or blueprint**: not the full detailed image, but enough to reconstruct it later.
   - That’s why it’s called *latent* (hidden): the image exists there only as math.
   - **Latent space = where** → (the canvas the painter is working on).
   - **U-Net = how** → (the painter’s hand shaping the canvas).

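To make "compressed" concrete: with the standard Stable Diffusion 1.x shapes, a 512×512 RGB image is represented in latent space as a 64×64 grid with 4 channels, roughly 48× fewer numbers:

```python
pixels = 512 * 512 * 3    # numbers in the full RGB image
latent = 64 * 64 * 4      # numbers in the corresponding latent
ratio = pixels // latent  # how much smaller the latent is
```

That 48× reduction is exactly why the U-Net can afford to run many denoising steps.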
4. **VAE Decoder**
   - Once the latent image is ready, `VAEDecoder.mlmodelc` converts it into a real picture (**pixels**).
   - The opposite direction (picture → latent space) is done by `VAEEncoder.mlmodelc`.

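A toy sketch of that round trip, using hypothetical 8× block averaging on a flat grayscale array. The real VAE is a learned neural network and the trip is lossy, but the shapes behave the same way:

```python
def vae_encode(image, factor=8):
    # picture → latent: average each block of `factor` values
    # (toy stand-in for VAEEncoder.mlmodelc)
    return [sum(image[i:i + factor]) / factor
            for i in range(0, len(image), factor)]

def vae_decode(latent, factor=8):
    # latent → picture: expand each latent value back into a block
    # (toy stand-in for VAEDecoder.mlmodelc)
    return [v for v in latent for _ in range(factor)]

image = [float(i) for i in range(64)]
latent = vae_encode(image)      # 64 values → 8 values
restored = vae_decode(latent)   # 8 values → 64 values again
```

The restored image has the right size but has lost fine detail, which mirrors why the decoder is only run once, at the very end.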
5. **Safety check**
   - Finally, `SafetyChecker.mlmodelc` looks at the generated image and checks if it follows **safety rules**.
   - It runs the image through a separate classifier (another neural net) to predict if the image belongs to restricted categories (e.g. nudity, gore, etc.).
   - If it does, the checker can:
     - blur the image,
     - block the image, or
     - replace it with a placeholder.

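The check step can be sketched as thresholding classifier scores. This is schematic: `classify` is a hypothetical stand-in for the real classifier, but the control flow (score, compare, replace) is the same idea:

```python
def safety_check(image, classify, threshold=0.5):
    # `classify` is a hypothetical stand-in for SafetyChecker.mlmodelc:
    # it returns a score per restricted category for the image.
    scores = classify(image)
    if any(score > threshold for score in scores.values()):
        # Flagged: replace the output with a black placeholder.
        return [0] * len(image), True
    return image, False

# A harmless image passes through unchanged:
ok_image, flagged = safety_check([10, 20, 30], lambda img: {"nsfw": 0.1})
# A flagged image is replaced:
blocked, was_flagged = safety_check([10, 20, 30], lambda img: {"nsfw": 0.9})
```
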
### Typical set of files for a model and the purpose of each file