🎉 Ray 1.0.1 is now officially released!
- `serve_client` is now serializable. (#11181)
- `serve_client.get_handle("endpoint")` will now get a handle to the nearest node, increasing scalability in distributed mode. (#11477)
- Added support for `num_steps` to continue training. (#11142)
- Added `tune.with_parameters()`, a wrapper function to pass arbitrary objects through the object store to trainables (see the sketch below). (#11504)
- Added `DockerSyncer`. (#11035)
- Added `tune.is_session_enabled()` in the Function API to toggle between Tune and non-Tune code. (#10840)
- The Function API now supports `yield` and `return` statements. (#10857)
- `tune.run(callbacks=...)` (#11001)
- `reuse_actors` for the Function API, which can greatly accelerate tuning jobs.
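For illustration, here is a minimal sketch of `tune.with_parameters`; the training function, dataset stand-in, and search space are hypothetical:

```python
from ray import tune

def train_fn(config, data=None):
    # `data` is shipped once through the Ray object store instead of being
    # pickled into every trial's spec.
    for step in range(3):
        tune.report(score=config["lr"] * len(data))

large_data = list(range(10_000))  # stands in for an expensive-to-copy dataset

tune.run(
    tune.with_parameters(train_fn, data=large_data),
    config={"lr": tune.grid_search([0.01, 0.1])},
)
```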
🎉 We thank all the contributors for their contribution to this release!

@acxz, @Gekho457, @allenyin55, @AnesBenmerzoug, @michaelzhiluo, @SongGuyang, @maximsmol, @WangTaoTheTonic, @Basasuya, @sumanthratna, @juliusfrost, @maxco2, @Xuxue1, @jparkerholder, @AmeerHajAli, @raulchen, @justinkterry, @herve-alanaai, @richardliaw, @raoul-khour-ts, @C-K-Loan, @mattearllongshot, @robertnishihara, @internetcoffeephone, @Servon-Lee, @clay4444, @fangyeqing, @krfricke, @ffbin, @akotlar, @rkooo567, @chaokunyang, @PidgeyBE, @kfstorm, @barakmich, @amogkam, @edoakes, @ashione, @jseppanen, @ttumiel, @desktable, @pcmoritz, @ingambe, @ConeyLiu, @wuisawesome, @fyrestone, @oliverhu, @ericl, @weepingwillowben, @rkube, @alanwguo, @architkulkarni, @lasagnaphil, @rohitrawat, @ThomasLecat, @stephanie-wang, @suquark, @ijrsvt, @VishDev12, @Leemoonsoo, @scottwedge, @sven1977, @yiranwang52, @carlos-aguayo, @mvindiola1, @zhongchun, @mfitton, @simon-mo
🎉 We're happy to announce the release of Ray 1.0, an important step towards the goal of providing a universal API for distributed computing.
To learn more about Ray 1.0, check out our blog post and whitepaper.
- `ray start` commands have been cleaned up to remove deprecated arguments.
- `@ray.remote`

🔥 Breaking changes:
- The following `tune.run` arguments have been removed: `ray_auto_init`, `run_errored_only`, `global_checkpoint_period`, `with_server`. (#10518)
- The `tune.run` arguments `upload_dir`, `sync_to_cloud`, `sync_to_driver`, and `sync_on_checkpoint` have been moved to `tune.SyncConfig` [docs]. (#10518)

🎉 New APIs:
- Added `mode`, `metric`, and `time_budget` parameters for `tune.run` (see the sketch below). (#10627, #10642)
- `create_scheduler`/`create_searcher` shim layer to create search algorithms/schedulers via string, reducing boilerplate code. (#10456)
- `tune.run(resume="run_errored_only")` (#10060)
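Here is a hedged sketch of how these pieces compose. The training function and values are illustrative, and the time-budget argument is spelled `time_budget_s` here, as in recent Tune versions:

```python
from ray import tune

def train_fn(config):
    for step in range(10):
        tune.report(loss=config["lr"] * step)

# Build a scheduler from a string instead of importing its class directly.
scheduler = tune.create_scheduler("async_hyperband")

tune.run(
    train_fn,
    config={"lr": tune.uniform(0.001, 0.1)},
    metric="loss",      # top-level metric/mode instead of per-component settings
    mode="min",
    scheduler=scheduler,
    num_samples=4,
    time_budget_s=60,   # stop the whole experiment after 60 seconds
)
```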
Other Changes:

- `tune.run(log_to_file=...)` (#9817)
- The `serve.client` API makes it easy to appropriately manage the lifetime of multiple Serve clusters (see the sketch below). (#10460)
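A minimal sketch of the client-based workflow, assuming the 1.0-era `create_backend`/`create_endpoint` Serve API; names are illustrative:

```python
import ray
from ray import serve

ray.init()

# serve.start() returns a client object scoping this Serve instance,
# so several instances can be managed independently.
client = serve.start()

def hello(request):
    return "hello world"

client.create_backend("hello_backend", hello)
client.create_endpoint("hello", backend="hello_backend", route="/hello")
```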
🎉 We thank all the contributors for their contribution to this release!

@MissiontoMars, @ijrsvt, @desktable, @kfstorm, @lixin-wei, @Yard1, @chaokunyang, @justinkterry, @pxc, @ericl, @WangTaoTheTonic, @carlos-aguayo, @sven1977, @gabrieleoliaro, @alanwguo, @aryairani, @kishansagathiya, @barakmich, @rkube, @SongGuyang, @qicosmos, @ffbin, @PidgeyBE, @sumanthratna, @yushan111, @juliusfrost, @edoakes, @mehrdadn, @Basasuya, @icaropires, @michaelzhiluo, @fyrestone, @robertnishihara, @yncxcw, @oliverhu, @yiranwang52, @ChuaCheowHuan, @raphaelavalos, @suquark, @krfricke, @pcmoritz, @stephanie-wang, @hekaisheng, @zhijunfu, @Vysybyl, @wuisawesome, @sanderland, @richardliaw, @simon-mo, @janblumenkamp, @zhuohan123, @AmeerHajAli, @iamhatesz, @mfitton, @noahshpak, @maximsmol, @weepingwillowben, @raulchen, @09wakharet, @ashione, @henktillman, @architkulkarni, @rkooo567, @zhe-thoughts, @amogkam, @kisuke95, @clarkzinzow, @holli, @raoul-khour-ts
- `ObjectID`s are now called `ObjectRef`s because they are not just IDs.
- A new output style for `ray up` and `ray down` is available with the `--log-new-style` flag. It will be enabled by default (with opt-out) in a later release. Full output style coverage for Cluster Launcher commands will also be available in a later release. (#9322, #9943, #9960, #9690)
- Added the `ray status` debug tool and `ray --version`. (#9091, #8886)
- `ray memory` now also supports `redis_password`. (#9492)
- `replay_sequence_length`: we now allow a) storing sequences (over time) in replay buffers and b) retrieving "lock-stepped" multi-agent samples.
- `DistributedTrainableCreator`, a simple wrapper for distributed parameter tuning with multi-node `DistributedDataParallel` models. (#9550, #9739)
- `serve.shadow_traffic(endpoint, backend, fraction)` duplicates and sends a fraction of the incoming traffic to a specific backend (see the sketch below). (#9106)
- `serve.shutdown()` cleans up the current Serve instance in the Ray cluster. (#8766)
- A warning is now raised when `num_replicas` exceeds the maximum resources in the cluster. (#9005)
- Added `--dashboard-port` and the argument `dashboard_port` to `ray.init`.
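A hedged sketch of shadow traffic against the 0.8-era module-level Serve API; the backend and endpoint names are made up, and exact signatures varied across 0.8.x releases:

```python
from ray import serve

serve.init()

def stable_backend(request):
    return "stable"

def candidate_backend(request):
    return "candidate"

serve.create_backend("stable", stable_backend)
serve.create_backend("candidate", candidate_backend)
serve.create_endpoint("api", backend="stable", route="/api")

# Mirror 10% of the traffic hitting "api" to the candidate backend.
# Shadow responses are discarded, so clients only ever see "stable" results.
serve.shadow_traffic("api", "candidate", 0.1)
```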
🎉 We thank the following contributors for their work on this release:

@jsuarez5341, @amitsadaphule, @krfricke, @williamFalcon, @richardliaw, @heyitsmui, @mehrdadn, @robertnishihara, @gabrieleoliaro, @amogkam, @fyrestone, @mimoralea, @edoakes, @andrijazz, @ElektroChan89, @kisuke95, @justinkterry, @SongGuyang, @barakmich, @bloodymeli, @simon-mo, @TomVeniat, @lixin-wei, @alanwguo, @zhuohan123, @michaelzhiluo, @ijrsvt, @pcmoritz, @LecJackS, @sven1977, @ashione, @JerryLeeCS, @raphaelavalos, @stephanie-wang, @ruifangChen, @vnlitvinov, @yncxcw, @weepingwillowben, @goulou, @acmore, @wuisawesome, @gramhagen, @anabranch, @internetcoffeephone, @Alisahhh, @henktillman, @deanwampler, @p-christ, @Nicolaus93, @WangTaoTheTonic, @allenyin55, @kfstorm, @rkooo567, @ConeyLiu, @09wakharet, @piojanu, @mfitton, @KristianHolsheimer, @AmeerHajAli, @pdames, @ericl, @VishDev12, @suquark, @stefanbschneider, @raulchen, @dcfidalgo, @chappers, @aaarne, @chaokunyang, @sumanthratna, @clarkzinzow, @BalaBalaYi, @maximsmol, @zhongchun, @wumuzi520, @ffbin
- Task retries (similar to `max_restarts` in `@ray.remote`): try it out with `max_task_retries=-1`, where -1 indicates that the system can retry the task until it succeeds.
- Actors now take `max_restarts` in the `@ray.remote` decorator instead of `max_reconstructions`. You can use -1 to indicate infinity, i.e., the system should always restart the actor if it fails unexpectedly.
- Actors can be named with `name=<str>` in the remote constructor (`Actor.options(name='<str>').remote()`); to delete the actor, you can use `ray.kill` (see the sketch after this list).
- `rllib/examples` scripts now work for either TensorFlow or PyTorch (`--torch` command line option).
- Deprecated the `use_pytorch` and `eager` flags in configs; use `framework=[tf|tfe|torch]` instead.
- `ExperimentAnalysis` tool (#8445).
- `tune.report` is now the right way to use the Tune function API; `tune.track` is deprecated. (#8388)
- `serve.create_endpoint` now requires specifying the backend directly. You can remove `serve.set_traffic` if there's only one backend per endpoint. (#8764)
- `serve.init` API cleanup: several options were removed.
- `serve.init` now supports namespacing with `name`. You can run multiple Serve clusters with different names on the same Ray cluster. (#8449)
- Added the `X-SERVE-SHARD-KEY` HTTP header. (#8449)
- `ray up` accepts remote URLs that point to the desired cluster YAML. (#8279)
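A minimal sketch of named actors with restart-on-failure and explicit termination; the class and names are illustrative:

```python
import ray

ray.init()

@ray.remote(max_restarts=-1)  # always restart this actor if it dies unexpectedly
class Counter:
    def __init__(self):
        self.n = 0

    def increment(self):
        self.n += 1
        return self.n

# Create a named actor handle.
counter = Counter.options(name="global_counter").remote()
print(ray.get(counter.increment.remote()))  # 1

# Terminate the actor once it is no longer needed.
ray.kill(counter)
```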
🎉 We thank the following contributors for their work on this release:

@pcmoritz, @akharitonov, @devanderhoff, @ffbin, @anabranch, @jasonjmcghee, @kfstorm, @mfitton, @alecbrick, @simon-mo, @konichuvak, @aniryou, @wuisawesome, @robertnishihara, @ramanNarasimhan77, @09wakharet, @richardliaw, @istoica, @ThomasLecat, @sven1977, @ceteri, @acxz, @iamhatesz, @JarnoRFB, @rkooo567, @mehrdadn, @thomasdesr, @janblumenkamp, @ujvl, @edoakes, @maximsmol, @krfricke, @amogkam, @gehring, @ijrsvt, @internetcoffeephone, @LucaCappelletti94, @chaokunyang, @WangTaoTheTonic, @fyrestone, @raulchen, @ConeyLiu, @stephanie-wang, @suquark, @ashione, @Coac, @JosephTLucas, @ericl, @AmeerHajAli, @pdames
- Added the `ray.cancel` API (see the sketch below).
- Lineage pinning can be enabled with `lineage_pinning_enabled: 1` in the internal config. (#7733)
- `max_concurrent` argument. (#7037, #8258, #8285)
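A short sketch of task cancellation. The long-running task is contrived, and since the exact exception raised by `ray.get` for a cancelled task has varied across versions, the broad `RayError` base class is caught:

```python
import time
import ray

ray.init()

@ray.remote
def long_running():
    time.sleep(3600)

ref = long_running.remote()
ray.cancel(ref)  # request cancellation of the pending task

try:
    ray.get(ref)
except ray.exceptions.RayError:
    print("task was cancelled")
```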
🎉 We thank the following contributors for their work on this release:

@simon-mo, @robertnishihara, @BalaBalaYi, @ericl, @kfstorm, @tirkarthi, @nflu, @ffbin, @chaokunyang, @ijrsvt, @pcmoritz, @mehrdadn, @sven1977, @iamhatesz, @nmatthews-asapp, @mitchellstern, @edoakes, @anabranch, @billowkiller, @eisber, @ujvl, @allenyin55, @yncxcw, @deanwampler, @DavidMChan, @ConeyLiu, @micafan, @rkooo567, @datayjz, @wizardfishball, @sumanthratna, @ashione, @marload, @stephanie-wang, @richardliaw, @jovany-wang, @MissiontoMars, @aannadi, @fyrestone, @JarnoRFB, @wumuzi520, @roireshef, @acxz, @gramhagen, @Servon-Lee, @ClarkZinzow, @mfitton, @maximsmol, @janblumenkamp, @istoica
- `ray memory` will collect statistics from all nodes. (#7721)
- `Policy.export_model()`. (#7759)
- `fail_fast` enables experiments to fail quickly (see the sketch below). (#7528)
- HTTP methods can be specified via `serve.create_backend(..., methods=["GET", "POST"])`.
- The backend method to call can be specified via the `X-SERVE-CALL-METHOD` header or in `RayServeHandle` through `handle.options("method").remote(...)`.
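An illustrative sketch of `fail_fast`; the trainable is made up and uses the function-API `tune.report` for brevity:

```python
from ray import tune

def train_fn(config):
    if config["x"] > 0.5:
        raise ValueError("boom")  # simulate one buggy trial
    tune.report(score=config["x"])

# With fail_fast=True, the whole experiment stops as soon as any trial errors,
# rather than letting the remaining trials run to completion.
tune.run(
    train_fn,
    config={"x": tune.grid_search([0.1, 0.9])},
    fail_fast=True,
    raise_on_failed_trial=False,  # report the failure without re-raising here
)
```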
🎉 We thank the following contributors for their work on this release:

@carlbalmer, @BalaBalaYi, @saurabh3949, @maximsmol, @SongGuyang, @istoica, @pcmoritz, @aannadi, @kfstorm, @ijrsvt, @richardliaw, @mehrdadn, @wumuzi520, @cloudhan, @edoakes, @mitchellstern, @robertnishihara, @hhoke, @simon-mo, @ConeyLiu, @stephanie-wang, @rkooo567, @ffbin, @ericl, @hubcity, @sven1977
- Distributed reference counting can be turned off with `ray.init(_internal_config=json.dumps({"distributed_ref_counting_enabled": 0}))`.
- Object store LRU eviction can be enabled with `ray.init(lru_evict=True)`.
- A new command, `ray memory`, is added to help debug memory usage (sample output below, followed by a sketch of a script that could produce it): (#7589)
```
$ ray memory

 Object ID                                 Reference Type        Object Size  Reference Creation Site
=====================================================================================================
; worker pid=51230
 ffffffffffffffffffffffff0100008801000000  PINNED_IN_MEMORY             8231  (deserialize task arg) __main__..sum_task
; driver pid=51174
 45b95b1c8bd3a9c4ffffffff010000c801000000  USED_BY_PENDING_TASK            ?  (task call) memory_demo.py:<module>:13
 ffffffffffffffffffffffff0100008801000000  USED_BY_PENDING_TASK         8231  (put object) memory_demo.py:<module>:6
 ef0a6c221819881cffffffff010000c801000000  LOCAL_REFERENCE                 ?  (task call) memory_demo.py:<module>:14
```
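For context, a hypothetical `memory_demo.py` along these lines would produce entries like those above (line numbers and object sizes will not match exactly):

```python
# memory_demo.py -- illustrative only; written to exercise the reference
# types shown in the sample `ray memory` output.
import time
import numpy as np
import ray

ray.init()

@ray.remote
def sum_task(arr):
    time.sleep(60)  # keep the task running so its references stay live
    return float(np.sum(arr))

obj = ray.put(np.ones(1000))    # a "(put object)" reference
pending = sum_task.remote(obj)  # "(task call)" references; arg pinned in worker
time.sleep(5)
```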
- Renamed `actor.__ray_kill__()` to `ray.kill(actor)`. (#7360)
- Removed the `use_pickle` flag for serialization. (#7474)
- Removed `experimental.NoReturn`. (#7475)
- Removed the `experimental.signal` API. (#7477)
- Use `prctl(PR_SET_PDEATHSIG)` on Linux instead of reaper. (#7150)
- `get_global_worker()`, `RuntimeContext`. (#7638)
- Added a `Repeater` class for high-variance trials. (#7366)
- `@serve.route` returns a handle; added `handle.scale` and `handle.set_max_batch_size`. (#7569)
- The `data_creator` fed to `TorchTrainer` now must return a dataloader rather than datasets.
- `data_loader_config` and `batch_size` are no longer parameters for `TorchTrainer`.
- `num_workers`.
- The `@RayRemote` annotation is removed.
- Instead of `Ray.call(ActorClass::method, actor)`, the new API is `actor.call(ActorClass::method)`.
🎉 We thank the following contributors for their work on this release:

@rkooo567, @maximsmol, @suquark, @mitchellstern, @micafan, @ClarkZinzow, @Jimpachnet, @mwbrulhardt, @ujvl, @chaokunyang, @robertnishihara, @jovany-wang, @hyeonjames, @zhijunfu, @datayjz, @fyrestone, @eisber, @stephanie-wang, @allenyin55, @BalaBalaYi, @simon-mo, @thedrow, @ffbin, @amogkam, @TisonKun, @richardliaw, @ijrsvt, @wumuzi520, @mehrdadn, @raulchen, @landcold7, @ericl, @edoakes, @sven1977, @ashione, @jorenretel, @gramhagen, @kfstorm, @anthonyhsyu, @pcmoritz
- Added `ray.show_in_webui` to display custom messages for actors. Please try it out and send us feedback! (#6705, #6820, #6822, #6911, #6932, #6955, #7028, #7034)
- Distributed reference counting can be turned on with `ray.init(_internal_config=json.dumps({"distributed_ref_counting_enabled": 1}))`. It is designed to help manage memory using precise distributed garbage collection. (#6945, #6946, #7029, #7075, #7218, #7220, #7222, #7235, #7249)
- Several experimental modules have moved (see the import sketch below):
  - `ray.experimental.multiprocessing` => `ray.util.multiprocessing`
  - `ray.experimental.joblib` => `ray.util.joblib`
  - `ray.experimental.iter` => `ray.util.iter`
  - `ray.experimental.serve` => `ray.serve`
  - `ray.experimental.sgd` => `ray.util.sgd`
- The `OMP_NUM_THREADS` environment variable defaults to 1 if unset. This improves training performance and reduces resource contention. (#6998)
- Ray now bundles `psutil` and `setproctitle` to support turning the dashboard on by default. Running `import psutil` after `import ray` will use the version of `psutil` that ships with Ray. (#7031)
- `delete()` will not delete objects in the in-memory store. (#7117)
- Added the `--all-nodes` option to `rsync-up`. (#7065)
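A small sketch of the resulting import changes; the chosen imports are illustrative:

```python
# Before this release (deprecated paths):
# from ray.experimental.multiprocessing import Pool
# from ray.experimental import serve

# After the renames:
from ray.util.multiprocessing import Pool  # Ray-backed multiprocessing.Pool
from ray import serve                      # Serve graduated out of experimental
```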
🎉 We thank the following contributors for their work on this release:

@mitchellstern, @hugwi, @deanwampler, @alindkhare, @ericl, @ashione, @fyrestone, @robertnishihara, @pcmoritz, @richardliaw, @yutaizhou, @istoica, @edoakes, @ls-daniel, @BalaBalaYi, @raulchen, @justinkterry, @roireshef, @elpollouk, @kfstorm, @Bassstring, @hhbyyh, @Qstar, @mehrdadn, @chaokunyang, @flying-mojo, @ujvl, @AnanthHari, @rkooo567, @simon-mo, @jovany-wang, @ijrsvt, @ffbin, @AmeerHajAli, @gaocegege, @suquark, @MissiontoMars, @zzyunzhi, @sven1977, @stephanie-wang, @amogkam, @wuisawesome, @aannadi, @maximsmol
- `ObjectID`s corresponding to `ray.put()` objects and task returns are now reference counted locally in Python and when passed into a remote task as an argument. `ObjectID`s that have a nonzero reference count will not be evicted from the object store. Note that references for `ObjectID`s passed into remote tasks inside of other objects (e.g., `f.remote((ObjectID,))` or `f.remote([ObjectID])`) are not currently accounted for. (#6554)
- `asyncio` actor support: actors can now define `async def` methods, and Ray will run multiple method invocations in the same event loop. The maximum concurrency level can be adjusted with `ActorClass.options(max_concurrency=2000).remote()` (see the sketch after this list).
- `asyncio` `ObjectID` support: Ray `ObjectID`s can now be directly awaited using the Python API. `await my_object_id` is similar to `ray.get(my_object_id)`, but allows context switching to make the operation non-blocking. You can also convert an `ObjectID` to an `asyncio.Future` using `ObjectID.as_future()`.
- `ParallelIterator`s can be used to more conveniently load and process data into Ray actors. See the [documentation](https://ray.readthedocs.io/en/latest/iter.html) for details.
- Ray now supports the `multiprocessing.Pool` API out of the box, so you can scale existing programs up from a single node to a cluster by only changing the import statement. See the [documentation](https://ray.readthedocs.io/en/latest/multiprocessing.html) for details.
- Added `actor.__ray_kill__()` to terminate actors immediately. (#6523)
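A brief sketch of the asyncio integration; the actor and values are illustrative:

```python
import asyncio
import ray

ray.init()

@ray.remote
class AsyncActor:
    async def fetch(self, i):
        await asyncio.sleep(0.1)  # other invocations can run during this await
        return i

# Allow up to 4 concurrent method invocations in the actor's event loop.
actor = AsyncActor.options(max_concurrency=4).remote()
print(ray.get([actor.fetch.remote(i) for i in range(4)]))  # [0, 1, 2, 3]

async def main():
    # Object IDs/refs can be awaited directly instead of calling ray.get.
    print(await actor.fetch.remote(42))

asyncio.run(main())
```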
🎉 We thank the following contributors for their work on this release:

@chaokunyang, @Qstar, @simon-mo, @wlx65003, @stephanie-wang, @alindkhare, @ashione, @harrisonfeng, @JingGe, @pcmoritz, @zhijunfu, @BalaBalaYi, @kfstorm, @richardliaw, @mitchellstern, @michaelzhiluo, @ziyadedher, @istoica, @EyalSel, @ffbin, @raulchen, @edoakes, @chenk008, @frthjf, @mslapek, @gehring, @hhbyyh, @zzyunzhi, @zhu-eric, @MissiontoMars, @sven1977, @walterddr, @micafan, @inventormc, @robertnishihara, @ericl, @ZhongxiaYan, @mehrdadn, @jovany-wang, @ujvl, @bharatpn