Stack Overflow
0 votes
1 answer
47 views

I'm trying to connect my Meteor app running on localhost:4000 to the server of another Meteor app running on localhost:3000; reading the docs, I'm trying to do import { DDP } from 'meteor/ddp-client' ...
asked by Cereal Killer
1 vote
0 answers
70 views

I am training a classifier model but since it is taking far too long I want to use multi-GPU training. The current code is rank = int(os.environ["RANK"]) world_size = int(os.environ["...
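A minimal sketch of what that setup usually looks like when the rank and world size come from the environment variables that torchrun sets; the model here is a placeholder, not the asker's classifier:

    # Sketch: initialise DDP from torchrun's environment variables.
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def setup_ddp():
        rank = int(os.environ["RANK"])
        world_size = int(os.environ["WORLD_SIZE"])
        local_rank = int(os.environ["LOCAL_RANK"])
        dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
        torch.cuda.set_device(local_rank)
        return local_rank

    if __name__ == "__main__":
        local_rank = setup_ddp()
        model = torch.nn.Linear(10, 2).cuda(local_rank)     # placeholder classifier
        model = DDP(model, device_ids=[local_rank])
        # ... training loop ...
        dist.destroy_process_group()

Launched as, for example, torchrun --nproc_per_node=gpu train.py on the single machine (the script name is a placeholder).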
0 votes
0 answers
82 views

I would like to use a learning rate scheduler to (potentially) adapt the learning rate after each epoch, depending on a metric gathered from the validation dataset. However, I am not sure how to use ...
asked by B612
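One common pattern for this (a sketch assuming a plain PyTorch loop and a validation loss; model, optimizer and val_loss below are placeholders) is ReduceLROnPlateau, which is stepped with the metric itself after each epoch:

    import torch

    model = torch.nn.Linear(10, 2)                          # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.5, patience=2)

    for epoch in range(20):
        # ... train for one epoch, then evaluate on the validation set ...
        val_loss = 1.0 / (epoch + 1)                        # placeholder validation metric
        scheduler.step(val_loss)                            # reduces the LR when val_loss plateaus
        print(epoch, optimizer.param_groups[0]["lr"])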
0 votes
0 answers
463 views

I am completely new to distributed programming and I have been trying to port the original code that ran on a multi-node cluster to a single-node cluster with multiple GPUs. My goal is to simulate a ...
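A sketch of one way to run DDP code on a single node by spawning one process per local GPU with torch.multiprocessing; the worker body is a placeholder, not the original cluster code:

    import os
    import torch
    import torch.distributed as dist
    import torch.multiprocessing as mp

    def worker(rank, world_size):
        os.environ["MASTER_ADDR"] = "127.0.0.1"   # everything stays on the one machine
        os.environ["MASTER_PORT"] = "29500"
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        torch.cuda.set_device(rank)
        # ... per-process training code goes here ...
        dist.destroy_process_group()

    if __name__ == "__main__":
        world_size = torch.cuda.device_count()    # one process per local GPU
        mp.spawn(worker, args=(world_size,), nprocs=world_size)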
0 votes
0 answers
419 views

I tried training a YOLO model on Kaggle with 2 Tesla T4 GPUs (15 GB each) as follows: model = YOLO('yolo11l.pt') model.train( data='/kaggle/working/tooth-detect-2/data.yaml', epochs=100, batch=...
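For Ultralytics YOLO, multi-GPU DDP training is normally requested through the device argument of model.train(); a minimal sketch reusing the question's paths, with the other hyper-parameters omitted:

    from ultralytics import YOLO

    model = YOLO('yolo11l.pt')
    model.train(
        data='/kaggle/working/tooth-detect-2/data.yaml',
        epochs=100,
        device=[0, 1],   # both T4s; Ultralytics launches DDP internally
    )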
0 votes
0 answers
322 views

With the following toy script, I am trying to tune the learning rate hyper-parameter of a perceptron with Optuna and 5-fold cross-validation. My cluster has multiple GPUs on ...
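A sketch of the Optuna side of this, assuming scikit-learn's Perceptron and cross_val_score stand in for the actual toy script (all names and values here are illustrative):

    import optuna
    from sklearn.datasets import make_classification
    from sklearn.linear_model import Perceptron
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, random_state=0)     # toy data

    def objective(trial):
        lr = trial.suggest_float("eta0", 1e-4, 1.0, log=True)      # learning rate to tune
        clf = Perceptron(eta0=lr, max_iter=1000)
        return cross_val_score(clf, X, y, cv=5).mean()             # 5-fold CV accuracy

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=50)
    print(study.best_params)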
1 vote
1 answer
713 views

During deep-learning training, the update of the declared values with this variable is done under with torch.no_grad(). When training with DDP, it's the process of obtaining the average for each batch ...
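A sketch of the two pieces the excerpt mentions, an in-place update under torch.no_grad() and averaging a per-batch value across DDP processes (the tensor name is a placeholder, and an initialised process group is assumed):

    import torch
    import torch.distributed as dist

    running_stat = torch.zeros(1, device="cuda")

    with torch.no_grad():
        running_stat += 0.5                       # in-place update, excluded from autograd

    # Average the per-process value across all DDP ranks.
    dist.all_reduce(running_stat, op=dist.ReduceOp.SUM)
    running_stat /= dist.get_world_size()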
2 votes
1 answer
4k views

I'm practicing PyTorch for multi-node DDP in a Docker container, and my program runs properly when I run torchrun \ --nnodes=1 \ --node_rank=0 \ --nproc_per_node=gpu \ ...
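For the multi-node case, the usual pattern is to run the same torchrun command on every node, changing only --node_rank and pointing all nodes at one shared rendezvous endpoint; a minimal smoke test (the hostname, port and script name below are placeholders):

    # On each node (placeholder endpoint and script name):
    #   torchrun --nnodes=2 --node_rank=<0 or 1> --nproc_per_node=gpu \
    #            --rdzv_backend=c10d --rdzv_endpoint=node0.example:29400 smoke_test.py
    import torch.distributed as dist

    dist.init_process_group(backend="nccl")   # rank/world size come from torchrun's env vars
    print(f"rank {dist.get_rank()} of {dist.get_world_size()} is up")
    dist.destroy_process_group()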
2 votes
1 answer
778 views

I have access to a couple dozen Dask servers without GPUs but with complete control of the software (I can wipe them and install something different) and want to accelerate pytorch-lightning model ...
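PyTorch Lightning can drive CPU-only DDP across machines through Trainer arguments alone (Dask itself is not involved here; the boxes just act as workers); a sketch under the assumption that the servers can reach each other and that a cluster environment or MASTER_ADDR/MASTER_PORT is configured, with MyModel and dm as placeholders:

    import pytorch_lightning as pl

    trainer = pl.Trainer(
        accelerator="cpu",
        devices=8,        # processes per machine (placeholder count)
        num_nodes=4,      # number of repurposed servers (placeholder count)
        strategy="ddp",
    )
    # trainer.fit(MyModel(), datamodule=dm)     # MyModel / dm are placeholders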
3 votes
3 answers
7k views

Suppose I use 2 GPUs in a DDP setting. If I intend to use a batch size of 16 when running the experiment on a single GPU, should I pass 8 or 16 as the batch size when using 2 ...
asked by Seungwoo Ryu
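The usual convention, shown as a sketch rather than an authoritative answer: the batch size given to each process's DataLoader is per GPU, and DistributedSampler splits the dataset, so with 2 GPUs a per-process batch of 8 yields an effective global batch of 16:

    import torch
    from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

    dataset = TensorDataset(torch.randn(256, 10))   # toy dataset
    sampler = DistributedSampler(dataset)           # assumes the DDP process group is initialised
    loader = DataLoader(dataset, batch_size=8, sampler=sampler)
    # 2 processes x batch_size 8 -> effective global batch of 16 per optimisation step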
1 vote
1 answer
298 views

Continuing the discussion from DDP: how do I wait for a connection?: Based on the thread above, we can leverage Tracker.autorun to wait for and confirm a successful connection between a client and ...
asked by blueren
1 vote
0 answers
323 views

I would like to know what numbers I should select for nodes and gpus. I use Tesla V100-SXM2 (8 boards). I tried: nodes=1, gpus=1 (only the first GPU works); nodes=1, gpus=8 (it took very long ...
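If this is PyTorch Lightning's Trainer (an assumption; the excerpt only mentions nodes and gpus), a single machine with 8 V100s would normally be one node with one DDP process per GPU, roughly:

    import pytorch_lightning as pl

    trainer = pl.Trainer(
        accelerator="gpu",
        devices=8,        # one process per V100 on the machine
        num_nodes=1,      # a single physical node
        strategy="ddp",
    )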
0 votes
1 answer
98 views

I already have an app in Meteor/ReactJS with GraphQL. I am trying to split it so that the Meteor server and the ReactJS frontend run separately, and I want to know how to link them to each other. Any example or ...
3 votes
1 answer
215 views

Essentially, this is what I had in mind: Client const Messages = new Mongo.Collection("messages"); Meteor.call("message", { room: "foo", message: "Hello world"...
12 votes
5 answers
5k views

When I launch my main script on the cluster with DDP mode (2 GPUs), PyTorch Lightning duplicates whatever is executed in the main script, e.g. prints or other logic. I need some extended training ...
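Under DDP, Lightning does start one full copy of the script per GPU; a common workaround (a sketch, with the extended logic as a placeholder) is to guard that extra work so it only runs on global rank zero:

    import pytorch_lightning as pl
    from pytorch_lightning.utilities import rank_zero_only

    @rank_zero_only
    def extended_setup():
        print("runs once, on global rank 0 only")   # placeholder for the extra logic

    extended_setup()

    # Alternatively, after building the Trainer:
    # trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="ddp")
    # if trainer.is_global_zero:
    #     ...                                       # extended training logic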

