312 questions
1 vote, 1 answer, 65 views
Conditional iteration with snakemake and checkpoint
Issue
For my workflow I need to repeat a certain number of rules multiple times; the idea is that the output improves with each iteration.
I had an issue with adding new jobs to ...
0 votes, 0 answers, 43 views
Can a process checkpoint itself with CUDA?
NVIDIA has recently introduced a process-level GPU state checkpoint mechanism into its CUDA driver.
The API calls take a process ID, which is something we don't really see in other CUDA driver API ...
0 votes, 1 answer, 71 views
Why can't a (CUDA) process be unlocked from the 'checkpointed' state?
NVIDIA has recently introduced a process-level GPU state checkpoint mechanism into its CUDA driver.
Basically, it seems one "locks" a process, so that no more CUDA API calls are accepted; ...
-1 votes, 0 answers, 13 views
Snakemake WorkflowError: "Target rules may not contain wildcards" when using checkpoint with dynamic wildcards [duplicate]
I’m trying to build a Snakemake pipeline that uses a checkpoint to dynamically determine {sequence} wildcards after processing patient data. My workflow structure is roughly:
A checkpoint ...
0 votes, 0 answers, 107 views
Flink pipeline latency increases over time despite low processing time per task
I’m currently testing a Flink pipeline with the following architecture:
3 TaskManagers
Each TaskManager has 4 slots
The pipeline structure is:
Source → Map → KeyBy → KeyedProcessFunction → Sink
...
0 votes, 1 answer, 83 views
Group Window Aggregate function is causing my checkpoint size to increase dramatically
I'm experiencing an issue with state management in my Flink job:
I have two Kafka sources that I'm unioning together and the second source uses a tumbling window based on processing time.
I'm using ...
0 votes, 0 answers, 238 views
Writing Flink checkpoints to S3
I have a Flink job for which I need to write checkpoints to S3.
This is my configuration:
config = Configuration()
config.set_string("fs.s3a.aws.credentials.provider", "org.apache....
1 vote, 0 answers, 74 views
Why does my trained t5-small model generate a mess after I save and load the checkpoint?
I was distilling a student model (base model t5-small) from a fine-tuned T5-xxl. Here is the config:
student_model = AutoModelForSeq2SeqLM.from_pretrained(
args.student_model_name_or_path,
...
0 votes, 1 answer, 66 views
Snakemake rules that are dependent on checkpoint outputs skipped
I currently have a Snakefile with the following checkpoint and rules:
checkpoint a:
    input: "some_bam.bam"
    output:
        info="some_info.info"
        zip="...
0 votes, 0 answers, 387 views
How can I continue training from the checkpoints saved in the previous training?
I'm working in a Kaggle notebook. I trained a PyTorch OCR model and saved the checkpoints, and now I am trying to load the saved checkpoints to resume the previous training ...
-1 votes, 1 answer, 62 views
Implement iterative precopy migration for Containers (Docker, Podman)
I am implementing iterative precopy migration for containers (Docker, Podman) using the following commands:
sudo podman container checkpoint -R -e="/home/rohan/Desktop/Precopy migration/...
2 votes, 0 answers, 309 views
CRIU Restoring FAILED (criu/cr-restore.c:1480) killed by signal 11: Segmentation fault
The command criu restore crashes with a segmentation fault when running the documentation example.
I've installed CRIU using sudo apt-get install criu on my Ubuntu 22.04.4 LTS Jammy computer.
In ...
1 vote, 0 answers, 927 views
How to fix this error: KeyError: 'model.embed_tokens.weight'
This is the detailed error:
Traceback (most recent call last):
File "/home/cyq/zxc/SmartEdit/train/DS_MLLMSD11_train.py", line 769, in <module>
train()
File "/home/cyq/zxc/...
0 votes, 0 answers, 99 views
Save and Load - Checkpoint Godot 4
I’m new to game development and recently started developing my first game, a simple 2D game to better understand the platform’s functionalities.
I’m having trouble adjusting the checkpoint saving ...
0 votes, 1 answer, 125 views
Playbook for checkpoint
I want to write some Ansible playbooks for Check Point. My question is: is there a specific connection string for Check Point in Ansible?
`Procedure to generate database backup in Security Management Server:
$...