@@ -4,16 +4,20 @@ excerpt: "WebSockets with the Tornado web framework is a simple, robust way to
 handle streaming data. I walk through a minimal example and discuss why these
 tools are good for the job."
 tags:
-  - python
   - streaming
   - tornado
   - websocket
 header:
   overlay_image: /assets/images/cool-backgrounds/cool-background8.png
   caption: 'Photo credit: [coolbackgrounds.io](https://coolbackgrounds.io/)'
-last_modified_at: 2021-09-27
+last_modified_at: 2021-06-13
+search: false
 ---

+{% if page.noindex == true %}
+<meta name="robots" content="noindex">
+{% endif %}
+
 A lot of data science and machine learning practice assumes a static dataset,
 maybe with some MLOps tooling for rerunning a model pipeline with the freshest
 version of the dataset.
@@ -31,20 +35,20 @@ requests with REST endpoints). Of course, Tornado has pretty good support for
 WebSockets as well.

 In this blog post I'll give a minimal example of using Tornado and WebSockets
-to handle streaming data. The toy example I have is one app (`server.py`)
-writing samples of a Bernoulli to a WebSocket, and another app (`client.py`)
+to handle streaming data. The toy example I have is one app (`transmitter.py`)
+writing samples of a Bernoulli to a WebSocket, and another app (`receiver.py`)
 listening to the WebSocket and keeping track of the posterior distribution for
 a [Beta-Binomial conjugate model](https://eigenfoo.xyz/bayesian-bandits/).
 After walking through the code, I'll discuss these tools, and why they're good
 choices for working with streaming data.

-For another tutorial on this same topic, you can check out [`proft`'s blog
+For another good tutorial on this same topic, you can check out [`proft`'s blog
 post](https://en.proft.me/2014/05/16/realtime-web-application-tornado-and-websocket/).

-## Server
+## Transmitter

-- When `WebSocketServer` is registered to a REST endpoint (in `main`), it keeps
-  track of any processes who are listening to that endpoint, and pushes
+- When `WebSocketHandler` is registered to a REST endpoint (on line 44), it
+  keeps track of any processes who are listening to that endpoint, and pushes
   messages to them when `send_message` is called.
   * Note that `clients` is a class variable, so `send_message` is a class
     method.
@@ -56,20 +60,12 @@ post](https://en.proft.me/2014/05/16/realtime-web-application-tornado-and-websoc
   case. For example, you could watch a file for any modifications using
   [`watchdog`](https://pythonhosted.org/watchdog/), and dump the changes into
   the WebSocket.
-- The [`websocket_ping_interval` and `websocket_ping_timeout` arguments to
-  `tornado.Application`](https://www.tornadoweb.org/en/stable/web.html?highlight=websocket_ping#tornado.web.Application.settings)
-  configure periodic pings of WebSocket connections, keeping connections alive
-  and allowing dropped connections to be detected and closed.
-- It's also worth noting that there's a
-  [`tornado.websocket.WebSocketHandler.websocket_max_message_size`](https://www.tornadoweb.org/en/stable/websocket.html?highlight=websocket_max_message_size#tornado.websocket.WebSocketHandler)
-  attribute. While this is set to a generous 10 MiB, it's important that the
-  WebSocket messages don't exceed this limit!

-<script src="https://gist.github.com/eigenfoo/22f46166fa6924d684d68ca06e08b055.js"></script>
+<script src="https://gist.github.com/eigenfoo/cb07fe6f026d544b013b29143e125a38.js"></script>

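As a rough illustration of the Transmitter bullets, here is a framework-free sketch of the bookkeeping they describe: a class-level `clients` registry plus a `send_message` class method that broadcasts to every registered connection. `FakeWebSocketHandler` and `sample_bernoulli` are hypothetical names that do not come from the gist. The method names `open`, `on_close` and `write_message` mirror the hooks on `tornado.websocket.WebSocketHandler`, but nothing here touches Tornado or the network, so treat it as a model of the pattern rather than the actual `transmitter.py`.

```python
import random


class FakeWebSocketHandler:
    """Hypothetical stand-in for a tornado.websocket.WebSocketHandler subclass."""

    # Class variable: one registry shared by every connection object.
    clients = set()

    def __init__(self):
        # Messages this fake connection has been sent (assumption: stored as str).
        self.received = []

    def open(self):
        # In Tornado, open() runs when a client connects.
        FakeWebSocketHandler.clients.add(self)

    def on_close(self):
        # In Tornado, on_close() runs when the connection goes away.
        FakeWebSocketHandler.clients.discard(self)

    def write_message(self, message):
        # The real method writes to the socket; here we only record the message.
        self.received.append(message)

    @classmethod
    def send_message(cls, message):
        # A class method, because the registry belongs to the class itself.
        for client in list(cls.clients):
            client.write_message(message)


def sample_bernoulli(p, rng):
    """Draw one Bernoulli(p) sample as 0 or 1."""
    return int(rng.random() < p)


rng = random.Random(0)
first, second = FakeWebSocketHandler(), FakeWebSocketHandler()
first.open()
second.open()
FakeWebSocketHandler.send_message(str(sample_bernoulli(0.3, rng)))  # both receive it
second.on_close()
FakeWebSocketHandler.send_message(str(sample_bernoulli(0.3, rng)))  # only `first` does
```

Because the registry lives on the class, any code holding a reference to the class can broadcast without knowing which connections are currently open.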
-## Client
+## Receiver

-- `WebSocketClient` is a class that:
+- `WebSocketReceiver` is a class that:
   1. Can be `start`ed and `stop`ped to connect/disconnect to the WebSocket and
      start/stop listening to it in a separate thread
   2. Can process every message (`on_message`) it hears from the WebSocket: in
@@ -78,39 +74,17 @@ post](https://en.proft.me/2014/05/16/realtime-web-application-tornado-and-websoc
      but this processing could theoretically be anything. For example, you
      could do some further processing of the message and then dump that into a
      separate WebSocket for other apps (or even users!) to subscribe to.
-- To connect to the WebSocket, we need to use a WebSocket library: thankfully
-  Tornado has a built-in WebSocket functionality (`tornado.websocket`), but
-  we're also free to use other libraries such as the creatively named
-  [`websockets`](https://github.com/aaugustin/websockets) or
+- To connect to the WebSocket, we need to use a WebSocket client, such as the
+  creatively named
   [`websocket-client`](https://github.com/websocket-client/websocket-client).
-- Note that we run `on_message` on the same thread as we run
-  `connect_and_read`. This isn't a problem so long as `on_message` is fast
-  enough, but a potentially wiser choice would be to offload `connect_and_read`
-  to a separate thread by instantiating a
-  [`concurrent.futures.ThreadPoolExecutor`](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor)
-  and calling
-  [`tornado.ioloop.IOLoop.run_in_executor`](https://www.tornadoweb.org/en/stable/ioloop.html#tornado.ioloop.IOLoop.run_in_executor),
-  so as not to block the thread where the `on_message` processing happens.
-- The `io_loop` instantiated in `main` (as well as in `server.py`) is
-  important: it's how Tornado schedules tasks (a.k.a. _callbacks_) for delayed
+- Note that we run `read` in a separate thread, so as not to block the main
+  thread (where the `on_message` processing happens).
+- The `io_loop` instantiated on line 50 (as well as in `transmitter.py`) is
+  important - it's how Tornado schedules tasks (a.k.a. _callbacks_) for delayed
   (a.k.a. _asynchronous_) execution. To add a callback, we simply call
   `io_loop.add_callback()`.
-- The [`ping_interval` and `ping_timeout` arguments to
-  `websocket_connect`](https://www.tornadoweb.org/en/stable/websocket.html?highlight=ping_#tornado.websocket.websocket_connect)
-  configure periodic pings of the WebSocket connection, keeping connections
-  alive and allowing dropped connections to be detected and closed.
-- The `callback=self.maybe_retry_connection` is [run on a future
-  `WebSocketClientConnection`](https://github.com/tornadoweb/tornado/blob/1db5b45918da8303d2c6958ee03dbbd5dc2709e9/tornado/websocket.py#L1654-L1655).
-  Here, we simply get the `future.result()` (i.e. the WebSocket client
-  connection itself) - I don't actually do anything with the `self.connection`,
-  but you could if you wanted. In the event of an exception while doing that,
-  we assume there's a problem with the WebSocket connection and retry
-  `connect_and_read` after 3 seconds. This all has the effect of recovering
-  gracefully if the WebSocket is dropped or `server.py` experiences a brief
-  outage for whatever reason (both of which are probably inevitable for
-  long-running apps using WebSockets).
-
-<script src="https://gist.github.com/eigenfoo/341f6c6c578d34120bccc4229e434377.js"></script>
+
+<script src="https://gist.github.com/eigenfoo/a693b67167c775f7fe67329f3797595d.js"></script>

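The Receiver bullets mention tracking the posterior of a Beta-Binomial conjugate model inside `on_message`. A minimal sketch of that update follows, under two assumptions that are not confirmed by the gist: each message is the string `"0"` or `"1"`, and the prior is a uniform Beta(1, 1). The class name `BetaPosterior` is hypothetical.

```python
class BetaPosterior:
    """Running Beta(alpha, beta) posterior over a Bernoulli success probability."""

    def __init__(self, alpha=1.0, beta=1.0):
        # Assumption: a uniform Beta(1, 1) prior unless the caller says otherwise.
        self.alpha = alpha
        self.beta = beta

    def on_message(self, message):
        # Assumption: each message is the text "0" or "1".
        outcome = int(message)
        if outcome not in (0, 1):
            raise ValueError(f"expected 0 or 1, got {message!r}")
        # Conjugate update: a success increments alpha, a failure increments beta.
        self.alpha += outcome
        self.beta += 1 - outcome

    @property
    def mean(self):
        # Posterior mean of the success probability.
        return self.alpha / (self.alpha + self.beta)


posterior = BetaPosterior()
for message in ["1", "0", "1", "1"]:
    posterior.on_message(message)

print(posterior.alpha, posterior.beta)  # 4.0 2.0
```

Each update is constant-time and needs only two numbers of state, which is why this kind of conjugate model suits message-by-message processing.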
 ## Why Tornado?

@@ -153,21 +127,6 @@ SSE)](https://www.smashingmagazine.com/2018/02/sse-websockets-data-flow-http2/):
 it seems to be a cleaner protocol for unidirectional data flow, which is really
 all that we need.

-Additionally, [Armin
-Ronacher](https://lucumr.pocoo.org/2012/9/24/websockets-101/) has a much
-starker view of WebSockets, seeing no value in using WebSockets over TCP/IP
-sockets for this application:
-
-> Websockets make you sad. [...] Websockets are complex, way more complex than I
-> anticipated. I can understand that they work that way but I definitely don't
-> see a value in using websockets instead of regular TCP connections if all you
-> want is to exchange data between different endpoints and neither is a browser.
-
-My thought after reading these criticisms is that perhaps WebSockets aren't the
-ideal technology for handling streaming data (from a maintainability or
-architectural point of view), but that doesn't mean that they aren't good
-scalable technologies when they do work.
-
 ---

 [^1]: There is [technically a difference](https://sqlstream.com/real-time-vs-streaming-a-short-explanation/) between "real-time" and "streaming": "real-time" refers to data that comes in as it is created, whereas "streaming" refers to a system that processes data continuously. You stream your TV show from Netflix, but since the show was created long before you watched it, you aren't viewing it in real-time.