This is a repo providing some Stable Diffusion experiments, covering a textual inversion task and a captioning task.
Clone the repo, then create a conda environment from environment.yml and install the dependencies.
conda env create --file=environment.yml
conda activate sd
pip install -r requirements.txt
The textual inversion experiment creates a 20-frame video from the generation of two images, each starting from a different concept provided by the user.
It is possible to load concepts by giving a valid Hugging Face 🤗 concept repo: https://huggingface.co/spaces/sd-concepts-library/stable-diffusion-conceptualizer
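As an illustration, a concept repo can be pulled into a diffusers pipeline with load_textual_inversion. This is a minimal sketch, assuming diffusers >= 0.14 and the runwayml/stable-diffusion-v1-5 checkpoint; textual_inversion.py may load concepts differently:

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# Downloads the learned embedding and registers its placeholder token (<gta5-artwork>).
pipe.load_textual_inversion("sd-concepts-library/gta5-artwork")

image = pipe("A man planting a seed in the <gta5-artwork> style",
             num_inference_steps=50).images[0]
image.save("seed_gta5.png")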
The script textual_inversion.py accepts the following arguments:

--model_id MODEL_ID   The Stable Diffusion model checkpoint you want to use
--from_file, --no-from_file
                      Load arguments from file
-p PROMPT_FILE_PATH, --prompt_file_path PROMPT_FILE_PATH
                      Path of the file from which to read the prompt
-s SEED, --seed SEED  Set the random seed
--from_concept_repo FROM_CONCEPT_REPO
                      The start concept you want to use (provide a Hugging Face concept repo)
--to_concept_repo TO_CONCEPT_REPO
                      The end concept you want to use (provide a Hugging Face concept repo)
--from_prompt FROM_PROMPT
                      Start prompt you want to use
--to_prompt TO_PROMPT
                      End prompt you want to use
--num_inference_steps NUM_INFERENCE_STEPS
                      Number of inference steps
--guidance_scale GUIDANCE_SCALE
                      The guidance scale value to set
--width WIDTH         Canvas width of the generated image
--height HEIGHT       Canvas height of the generated image
--use_negative_prompt, --no-use_negative_prompt
                      Flag to use the negative prompt stored in negative_prompt.txt
-b BATCH_SIZE, --batch_size BATCH_SIZE
                      Batch size to use
--mps, --no-mps       Set the device to 'mps' (Apple M1)
python textual_inversion.py --from_file -p "prompt_close_up.txt" --mps --num_inference_steps 50
python textual_inversion.py --from_concept_repo "sd-concepts-library/gta5-artwork" --to_concept_repo "sd-concepts-library/low-poly-hd-logos-icons" --from_prompt "A man planting a seed in the <concept> style" --to_prompt "A <concept> of a beautiful tree" --mps --num_inference_steps 60 -s 0
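For reference, below is a minimal sketch of how such a transition video can be produced by linearly interpolating between the CLIP text embeddings of the two prompts while keeping the initial noise fixed. It is not the repo's actual implementation; it assumes a recent diffusers version whose pipeline accepts prompt_embeds, and the prompts and model checkpoint are placeholders:

import torch
from diffusers import StableDiffusionPipeline

device = "mps"  # or "cuda" / "cpu"
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)

def encode(prompt):
    # Tokenize and run the CLIP text encoder to get embeddings usable as prompt_embeds.
    tokens = pipe.tokenizer(
        prompt,
        padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True,
        return_tensors="pt",
    ).input_ids.to(device)
    with torch.no_grad():
        return pipe.text_encoder(tokens)[0]

from_embeds = encode("A man planting a seed, oil painting")
to_embeds = encode("A beautiful tree, low poly logo")

num_frames = 20
frames = []
for i in range(num_frames):
    t = i / (num_frames - 1)
    embeds = torch.lerp(from_embeds, to_embeds, t)  # blend the two prompt embeddings
    # Re-seeding every frame keeps the initial latent noise identical across frames.
    generator = torch.Generator().manual_seed(0)
    image = pipe(prompt_embeds=embeds, num_inference_steps=50,
                 guidance_scale=7.5, generator=generator).images[0]
    frames.append(image)

# Assemble the frames into an animation, e.g. an animated GIF via PIL.
frames[0].save("interpolation.gif", save_all=True, append_images=frames[1:], duration=200)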
This is more of an evaluation across different image-to-text models, which provide a caption to use as a Stable Diffusion prompt to recreate the original image.
It has been designed as an investigation task, so the experiments are conducted in the notebook captioning_task.ipynb.
Three different image2caption models have been evaluated:

mscoco_finetuned_CoCa-ViT-L-14-laion2B-s13B-b90k
vit-gpt2-image-captioning
blip-image-captioning-base
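As an example of running one of these captioners, here is a minimal sketch using the transformers library with the Salesforce/blip-image-captioning-base checkpoint (the notebook's code and the input image path are not reproduced here):

from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("generated_image.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
caption = processor.decode(out[0], skip_special_tokens=True)
print(caption)  # the caption can then be fed back to Stable Diffusion as a prompt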
Then there is a comparison with an image2prompt model, the CLIP-Interrogator:
pharma/CLIP-Interrogator
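A minimal sketch of producing an image-to-prompt description, assuming the clip-interrogator pip package and its default ViT-L CLIP model (not necessarily how the notebook invokes it):

from PIL import Image
from clip_interrogator import Config, Interrogator

ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
image = Image.open("generated_image.png").convert("RGB")
prompt = ci.interrogate(image)
print(prompt)  # a full prompt-style description, richer than a plain caption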