I don't understand how clusters work with ParameterServerStrategy in TensorFlow, and I need some clarification.

I have read this tutorial, but it does not clearly explain how to run ParameterServerStrategy across multiple machines. I have a working version, but it runs on a single machine, and the workers and the parameter servers (ps) don't seem to do anything; it is the chief that runs everything. I have tried to run it on multiple machines, using their global IPs and unused ports for the workers and ps, but the chief does not seem to find them.

  • Configure the TF_CONFIG environment variable on each machine with the correct cluster addresses and roles to use ParameterServerStrategy across multiple machines. Incorrect TF_CONFIG settings or firewalls are the usual culprits if the chief node can't find other nodes. Commented Aug 25, 2025 at 7:55
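To make the comment above concrete, here is a minimal sketch of building TF_CONFIG for each role, plus a small TCP check for the firewall case. It uses only the standard library. The addresses in `CLUSTER` are hypothetical placeholders, and `make_tf_config` and `is_reachable` are illustrative helper names, not part of TensorFlow. The `cluster`/`task` JSON layout is the one TF_CONFIG expects; every machine should get the same `cluster` dictionary and differ only in `task`.

```python
import json
import os
import socket

# Hypothetical addresses: replace with the real host:port of each machine.
# The same cluster dictionary must be used on every machine.
CLUSTER = {
    "chief": ["10.0.0.1:2222"],
    "worker": ["10.0.0.2:2222", "10.0.0.3:2222"],
    "ps": ["10.0.0.4:2222"],
}


def make_tf_config(cluster, task_type, task_index):
    """Return the TF_CONFIG JSON string for one task in the cluster."""
    if task_type not in cluster:
        raise ValueError(f"unknown task type: {task_type!r}")
    if not 0 <= task_index < len(cluster[task_type]):
        raise ValueError(f"task index {task_index} out of range for {task_type!r}")
    return json.dumps(
        {"cluster": cluster, "task": {"type": task_type, "index": task_index}}
    )


def is_reachable(address, timeout=2.0):
    """Return True if a TCP connection to 'host:port' succeeds within timeout."""
    host, port = address.rsplit(":", 1)
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unresolvable: treat all as unreachable.
        return False


# Example: on the second worker machine, before TensorFlow reads TF_CONFIG.
os.environ["TF_CONFIG"] = make_tf_config(CLUSTER, "worker", 1)
```

Once the worker and ps processes are running, calling `is_reachable` from the chief machine on each worker and ps address is a quick way to tell a firewall or wrong-IP problem apart from a TF_CONFIG problem. Note that setting TF_CONFIG alone does not start anything: as far as I understand the TensorFlow 2 setup, each worker and ps process also has to start a `tf.distribute.Server` for its own task and block on `server.join()`, while only the chief creates the strategy.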
