375 questions
1 vote · 1 answer · 70 views
How can I correctly change the Llama 3 default max tokens (256) in a Blazor app on .NET 10.0?
I am creating a local LLM app, which works OK, but the response stops at about 170 words, which equals 256 tokens.
The models I have tried so far are Meta-Llama-3.1-8B-Instruct-Q8_0.gguf and ...
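For reference, the same generation cap exists in the Python bindings. A minimal sketch using llama-cpp-python (not the asker's .NET stack; the model path is a placeholder) showing the parameter that otherwise truncates the response:

```python
# Sketch: raising the per-response token cap in llama-cpp-python.
# The .NET bindings expose an equivalent setting; this only illustrates the knob.
from llama_cpp import Llama

llm = Llama(model_path="./Meta-Llama-3.1-8B-Instruct-Q8_0.gguf", n_ctx=4096)
out = llm(
    "Rewrite the following paragraph in formal English: ...",
    max_tokens=1024,  # default is small, so long answers get cut off mid-sentence
)
print(out["choices"][0]["text"])
```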
4 votes · 2 answers · 738 views
No module named 'llama_models.cli.model' error while downloading Llama 3.1 8B
I'm trying to install the LLaMA 3.1 8B model by following the instructions in the llama-models GitHub README. When I run the command:
llama-model download --source meta --model-id CHOSEN_MODEL_ID
(...
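If the Meta CLI keeps failing, a fallback sketch that pulls the same weights from the Hugging Face Hub instead (the repo id is an assumption; gated repos need an access token):

```python
# Alternative download path via huggingface_hub rather than the `llama` CLI.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="meta-llama/Llama-3.1-8B-Instruct",  # assumed repo id
    local_dir="./llama-3.1-8b-instruct",
    token="hf_...",  # placeholder: your Hugging Face access token
)
```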
0 votes · 0 answers · 112 views
pippy examples: torch._dynamo.exc.UserError: It looks like one of the outputs with type <class transformers.cache_utils.DynamicCache> is not supported
When the program starts to initialize the pipeline object, an unexpected error is thrown:
[rank0]: Traceback (most recent call last):
[rank0]: File "/root/anaconda3/envs/polar/lib/python3.12/site-...
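A common workaround sketch when tracing or exporting transformers models: disable the KV cache so forward() returns plain tensors instead of a DynamicCache object. Whether this fixes the pippy example depends on how the pipeline splits the model; the model id here is an assumption.

```python
# Disable the KV cache before tracing so no DynamicCache appears in the outputs.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")  # assumed
model.config.use_cache = False  # forward() now returns plain tensors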
0 votes · 0 answers · 55 views
Running Ollama on a local computer and prompting from a Jupyter notebook - does the model recall prior prompts as if it were the same chat?
I am doing some tests using Ollama on my local computer with Llama 3.2, which consist of prompting a task against a document.
I read that after having reached the maximum context, I should restart the ...
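Worth noting: the Ollama API is stateless per request, so "memory" only exists if the caller resends the conversation. A minimal sketch with the `ollama` Python client, assuming the `llama3.2` tag is pulled locally:

```python
# Each ollama.chat() call sees only the messages you pass it; recall of earlier
# prompts has to be simulated by appending turns to a history list yourself.
import ollama

history = [{"role": "user", "content": "Summarize this document: ..."}]
reply = ollama.chat(model="llama3.2", messages=history)
history.append(reply["message"])  # keep the assistant turn

# The follow-up "recalls" the first prompt only because the history is resent:
history.append({"role": "user", "content": "Now list the three key points."})
reply2 = ollama.chat(model="llama3.2", messages=history)
print(reply2["message"]["content"])
```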
0 votes · 0 answers · 51 views
The data type of the LLaVA model uncontrollably changes to float32
I am using the llama-8b-llava model. I have made some modifications to the model, which are non-structural and do not introduce any parameters. During the model loading process, I used the torch....
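A frequent cause: Hugging Face loads weights in float32 unless `torch_dtype` is given, and modules created after loading also default to float32. A sketch pinning and verifying the dtype (the model id is a stand-in for the asker's llama-8b-llava):

```python
# Pin the dtype at load time, then verify; cast any patched-in modules to match.
import torch
from transformers import LlavaForConditionalGeneration

model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf",       # assumed id standing in for llama-8b-llava
    torch_dtype=torch.float16,
)
print(next(model.parameters()).dtype)  # expect torch.float16
model = model.to(torch.float16)        # re-cast after any non-structural edits
```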
1 vote · 1 answer · 166 views
Import "llama_index.llms.ollama" could not be resolved
I have the following imports in a Python file that's meant to become a multi-LLM agent. I wanted to use llama_index, and I found a nice video from Tech with Tim which explains everything very well. I ...
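Since llama-index 0.10, the integrations live in separate pip packages, so the editor cannot resolve this import until the right package is installed. A minimal sketch:

```python
# Requires the integration package, not just the core library:
#   pip install llama-index llama-index-llms-ollama
from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama3.2", request_timeout=120.0)
print(llm.complete("Say hello in one word."))
```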
1 vote · 0 answers · 120 views
Fine-tuned LLaMA 2-7B with QLoRA, but reloading fails: missing 4-bit metadata. Likely saved after LoRA + resize. Need a proper 4-bit save method
I’ve been working on fine-tuning LLaMA 2–7B using QLoRA with bitsandbytes 4-bit quantization and ran into a weird issue. I did adaptive pretraining on Arabic data with a custom tokenizer (vocab size ~...
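A sketch of the usual PEFT round-trip, which avoids serializing the quantized base at all: save only the adapter (plus the custom tokenizer), then rebuild a fresh 4-bit base, repeat the vocab resize, and attach the adapter. Paths and the base id are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 1) After training, save only the adapter and tokenizer (not the 4-bit base):
#    trained_model.save_pretrained("out/adapter"); tokenizer.save_pretrained("out/tokenizer")

# 2) On reload, rebuild the 4-bit base, match the resized vocab, then attach:
tok = AutoTokenizer.from_pretrained("out/tokenizer")   # assumed path
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", quantization_config=bnb  # assumed base id
)
base.resize_token_embeddings(len(tok))                 # must match training-time resize
model = PeftModel.from_pretrained(base, "out/adapter") # assumed path
```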
1 vote · 0 answers · 221 views
llama-cpp-python installing for x86_64 instead of arm64
I am trying to set up local, high speed NLP but am failing to install the arm64 version of llama-cpp-python.
Even when I run
CMAKE_ARGS="-DLLAMA_METAL=on -DLLAMA_METAL_EMBED_LIBRARY=on" \
...
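One thing worth checking first: pip builds wheels for the interpreter's architecture, not the machine's, so an x86_64 (Rosetta) Python produces x86_64 builds of llama-cpp-python on an Apple Silicon Mac regardless of CMAKE_ARGS. A quick check sketch:

```python
# If this prints 'x86_64' on an Apple Silicon Mac, the Python itself is running
# under Rosetta and every compiled wheel will be x86_64 too.
import platform
print(platform.machine())  # want 'arm64' for a native build
```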
2 votes · 1 answer · 184 views
Llama_cookbook: why are labels not shifted for CausalLM?
I'm studying the llama_cookbook repo, in particular their fine-tuning example.
This example uses the LlamaForCausalLM model and the samsum_dataset (input: dialog, output: summary). Now, looking at how they ...
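The short answer, with a demonstrating sketch: Hugging Face causal LM heads shift the labels by one *inside* forward() when computing the loss, so datasets pass labels aligned with input_ids. Shown here with gpt2 as a small stand-in:

```python
# HF causal LM heads align labels with input_ids; the one-position shift
# (logits[:, :-1] vs labels[:, 1:]) happens inside the model's loss computation.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # small stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("hello world, hello", return_tensors="pt").input_ids
out = model(input_ids=ids, labels=ids)                # unshifted labels
print(out.loss)                                       # loss already uses the internal shift
```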
0 votes · 0 answers · 61 views
Using llama-index with a deployed LLM
I wanted to make a web app that uses llama-index to answer queries with RAG over specific documents. I have set up the Llama3.2-1B-instruct LLM locally and am using it to create indexes of the ...
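A sketch of wiring llama-index to a locally served model instead of the OpenAI defaults, assuming the model is served through Ollama and the two integration packages are installed (`pip install llama-index llama-index-llms-ollama llama-index-embeddings-huggingface`):

```python
# Point both the LLM and the embedding model at local backends before indexing.
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

Settings.llm = Ollama(model="llama3.2:1b")  # assumed Ollama tag for the local model
Settings.embed_model = HuggingFaceEmbedding("BAAI/bge-small-en-v1.5")

docs = SimpleDirectoryReader("./docs").load_data()   # hypothetical document folder
index = VectorStoreIndex.from_documents(docs)
print(index.as_query_engine().query("What does the document say about X?"))
```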
0 votes · 0 answers · 116 views
Why is `mul_mat` in ggml slower than in llama.cpp?
I use the following command to compile an executable file for Android:
cmake \
-DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
-DANDROID_ABI=arm64-v8a \
-...
1 vote · 0 answers · 173 views
How to implement timeout and retry for long-running Hugging Face model inference in Python?
I'm working with a locally hosted Hugging Face transformers model (mistral-7b, llama2-13b, etc.), using the pipeline interface on a GPU server (A100).
Sometimes inference takes much longer than ...
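A generic sketch of the pattern the title asks for: run the blocking pipeline call on a worker thread and bound the wait. Caveat, also reflected in the comments: the timeout abandons the waiting thread; it does not kill GPU work already in flight.

```python
# Timeout-and-retry wrapper around a blocking call such as a transformers pipeline.
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def generate_with_timeout(pipe, prompt, timeout_s=60, retries=2):
    for attempt in range(retries + 1):
        ex = ThreadPoolExecutor(max_workers=1)
        future = ex.submit(pipe, prompt, max_new_tokens=256)
        try:
            return future.result(timeout=timeout_s)
        except TimeoutError:
            print(f"attempt {attempt + 1} timed out after {timeout_s}s")
        finally:
            ex.shutdown(wait=False)  # don't block on a stuck worker thread
    raise RuntimeError("inference timed out on every retry")
```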
2 votes · 1 answer · 91 views
How to re-use attention in Hugging Face transformers
I have a long chunk of text that I need to process using a transformer. I would then like to have users ask different questions about it (all questions are independent; they don't relate to each other) ...
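A sketch of the standard answer: encode the long text once, keep its KV cache, and reuse it for every question. This is valid because the shared text is an identical prefix of each prompt; gpt2 stands in for the real model:

```python
# Compute the context's past_key_values once, then reuse a copy per question.
import copy, torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # small stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")

ctx_ids = tok("<long document text> ", return_tensors="pt").input_ids
with torch.no_grad():
    cache = model(ctx_ids, use_cache=True).past_key_values  # paid for once

for q in ["Question 1?", "Question 2?"]:
    q_ids = tok(q, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(q_ids, past_key_values=copy.deepcopy(cache), use_cache=True)
    # out.logits continues from the cached context without re-encoding it
```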
1 vote · 1 answer · 244 views
No stopping token generated by Llama-3.2-1B-Instruct
I am experimenting with Llama-3.2-1B-Instruct for learning purposes. When I try to implement a simple re-write task with Hugging Face transformers, I get a weird result where the model does not ...
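The usual fix for Llama 3.x instruct models, sketched below: build the prompt with the chat template and include `<|eot_id|>` (the end-of-turn token) among the stop ids, since generation otherwise runs past the assistant's answer:

```python
# Use the chat template and stop on <|eot_id|> as well as the plain eos token.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-3.2-1B-Instruct"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

msgs = [{"role": "user", "content": "Rewrite this sentence more formally: hey what's up"}]
ids = tok.apply_chat_template(msgs, add_generation_prompt=True, return_tensors="pt")
stop_ids = [tok.eos_token_id, tok.convert_tokens_to_ids("<|eot_id|>")]
out = model.generate(ids, max_new_tokens=100, eos_token_id=stop_ids)
print(tok.decode(out[0][ids.shape[-1]:], skip_special_tokens=True))
```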
1 vote · 1 answer · 88 views
How to incorporate additional data when fine-tuning an LLM
My goal is to create a chatbot specialized in answering questions related to diabetes.
I am new to fine-tuning and have a couple of questions before I begin. My question is about the dataset format and ...
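For the dataset-format half of the question, a sketch of one common layout: chat-style JSONL records in the messages format that many supervised fine-tuning trainers accept. The field names should be checked against whichever trainer and chat template is actually used:

```python
# One training example per JSONL line, in the widely used messages format.
import json

record = {
    "messages": [
        {"role": "system", "content": "You are a diabetes-care assistant."},
        {"role": "user", "content": "What is a normal fasting blood glucose level?"},
        {"role": "assistant", "content": "Typically about 70-99 mg/dL for adults without diabetes."},
    ]
}
with open("diabetes_sft.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```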