PyWorker VS module PyWorker & MPWorker #2052
-
We currently have 3 different ways to create a "PyWorker":
- via the `@pyscript/core` module we export both `PyWorker` and `MPWorker`, to disambiguate between the two interpreters, read the right config, and bring in the correct hooks out of the box
- within Python code, though, we have `PyWorker`, which does nothing like the module exports: it requires mandatory options to specify the interpreter `type`, the `config` to use, and so on ... nothing is inferred and, most notably, there are no defaults and there is no `MPWorker` counterpart ... any PyWorker created via Python code needs references like `PyWorker(url, { "type": "pyodide" })` or `PyWorker(url, { "type": "micropython" })`, and I am not sure this is the best we can do
On top of that, the JS module exports need to be awaited, because they need to bootstrap the whole thing so that `sync` can work, plugins are resolved, and the default `sync` is attached to the `xworker` reference, which is itself resolved later on, once the interpreter is ready.
On the other hand, the polyscript `XWorker` generic constructor accepts a `type` to specify the interpreter, but in PyScript the `type` is usually either `py` or `mpy`, whereas here we ask our users to specify a `type` that is the interpreter name instead ... I find all this a bit confusing and, even if properly documented, I wonder what we could do to improve the Python side of affairs.
We have at least 2 options:
- we make it smart enough to automatically provide all the correct fields, like the JS export already does ... meaning that the `type` is inferred, so that `<script type="py">from pyscript import PyWorker; PyWorker(url)</script>` would, by default, create a pyodide `XWorker` and, as we recently introduced `pyscript.config`, it's also easy to pass that along too, if the same config is desired
  - pros: it's less surprise prone
  - cons: it's actually not what one might desire in the MicroPython case, where `mpy` on main would likely want to bootstrap a pyodide worker, when needed, not a micropython one ... so it's practically counter-intuitive. On top of this, it doesn't reflect the JS exports, which have explicit names to disambiguate the used interpreter and its config
- we actually bring `MPWorker` into the `pyscript` namespace, so that `PyWorker` by default uses pyodide, unless explicitly told otherwise, and `MPWorker` will always be micropython ... we bring the `config` in per each environment, if not specified otherwise, so we'll have a more 1:1 behavior with the exported JS counterpart, still without needing to `await` those references like we do in JS
I don't know what would be the most "Pythonic" way forward. All I know is that I don't currently like either the JS or the Python behavior around workers: we need to `await` on the JS side, I really don't like `MPWorker` as a name, and I also don't like the fact that in Python we always need to specify a `type` as the interpreter name ... I think we could be smarter in both worlds than what we have now, but I also don't want to break the JS side of affairs, as that's been requested and successfully used already by various developers, and breaking it would be, and feel, both sad/bad and unnecessary.
What do you folks think? @ntoll @fpliger @JeffersGlass
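For the second option, here is a minimal sketch (hypothetical, not the actual pyscript implementation) of how `PyWorker` and `MPWorker` could share one implementation that differs only in its default interpreter `type`:

```python
# Hypothetical sketch: two factories sharing one implementation, each with a
# different default interpreter type, so nothing is mandatory for the common case.
def _make_worker_factory(default_type):
    def factory(url, type=None, config=None):
        # fall back to the factory's default interpreter when not specified
        return {"url": url, "type": type or default_type, "config": config}
    return factory

PyWorker = _make_worker_factory("pyodide")
MPWorker = _make_worker_factory("micropython")
```

With this shape, `PyWorker(url)` would default to pyodide while `MPWorker(url)` would always be micropython, and an explicit `type` would still win.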
-
@ntoll @mchilvers I have actually found an issue with the `config` that got solved in polyscript, but we already had everything Martin asked for yesterday ... example:
```html
<script type="mpy" async>
    from pyscript import document, PyWorker

    worker = PyWorker("./test.py", type="pyodide")
    await worker.ready
    document.body.append(await worker.sync.greetings())
</script>
```

in `test.py`:

```python
from pyscript import sync

def greetings():
    return "Hello from Pyodide"

sync.greetings = greetings
```
As a result, whenever the worker is ready, that promise gets resolved and you can interact out of the box with whatever utility pyodide exported via `sync` ... I think there's nothing else to do from our side except:
- be sure PyScript uses the latest polyscript version
- be sure we write some example in the documentation (we document workers already, but we don't expose that `worker.ready` feature)
-
If `worker.ready` is just a reference to a promise that resolves when stuff is, er, "ready", we should update the docs.
I've been thinking about @mchilvers's use of workers via the script tag:

```html
<script type="mpy" src="main.py" async></script>
<script type="py" src="heavy_stuff.py" worker></script>
```
I imagine this "feels" quite nice from the perspective that there's only one place (in the HTML) where all the Python code is started.
I wonder: if we give a worker tag an id:

```html
<script type="py" src="heavy_stuff.py" worker id="fred"></script>
```
Then perhaps we should be able to easily grab a reference to the resulting `PyWorker` instance from somewhere? Then I'd be able to do something like this in the main thread:

```python
from pyscript import active_workers

fred = active_workers["fred"]
await fred.ready
# ... do stuff ...
```
The name `active_workers` is just a placeholder (better naming suggestions welcome), and only worker tags with a valid id get referenced in there. It's just a Python dict.
Thoughts..?
@mchilvers, does this help address your "beginner's eye view of workers" we talked about? @WebReflection can you see any gotchas or blind spots?
To be clear, this isn't a new feature, but a convenience for folks who just want to kick off workers via a script tag.
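A minimal sketch of the `active_workers` idea (all names hypothetical): a plain dict that the bootstrap code fills in only for worker tags carrying a valid `id`:

```python
# Hypothetical sketch: a registry that PyScript's bootstrap could populate
# whenever a <script ... worker id="..."> tag finishes bootstrapping.
active_workers = {}

def register_worker(script_id, worker):
    # only worker tags with a valid (non-empty) id are referenced
    if script_id:
        active_workers[script_id] = worker

register_worker("fred", object())   # stands in for the real PyWorker instance
register_worker("", object())       # no id: silently skipped
```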
-
An `id` can be used already, because of the `xworker` exposed by the script ... the current solution within Python code is better and more elegant; the rest feels like a slippery slope to me:
- you can't know when the MutationObserver has reached and bootstrapped that script, so it's surprise prone if no `xworker` is attached yet (which is a polyscript feature, not a PyScript one)
- runtime scripts won't be on the page or easily reachable via ID
- a script can exist with or without a worker attribute ... they are all independent
- the HTML is not the source of truth, because runtime scripts can be created anyway
- with explicit `PyWorker` creation within your code you have full ownership of that worker ... with a shared script on the page you end up with possible `sync` name-clashing conflicts and who knows what else if two scripts deal with the same shared xworker ...
I mean, it's already possible to shoot oneself in the foot via hidden polyscript features, and I am not super happy promoting such practice in the wild; it's just more problematic for very little benefit.
Also, a worker has no guarantee that other scripts have bootstrapped and provided `sync` features (in case it needs features from the main thread), so this uncorrelated pattern really screams for trouble.
If you bootstrap your PyWorker you have full control and nothing else can mess with it.
If you do want stuff to be shared though, you add a promise to `window` that resolves with the `xworker` once your script has landed and been bootstrapped on the page, so that any other script can reach it ... this is also known as a SharedWorker, but it's a way cleaner API and it doesn't need HTML at all, as it can work even within workers (a pyodide worker using a shared pyodide worker, or an mpy worker using a shared pyodide worker, and so on), so that if this is what we really want or need (I still need to see a use case for it) I'd rather provide a `SharedPyWorker` API, still coming from the stdlib, not augmented without control on the HTML page.
HTML can also stream, so there's zero guarantee the ID is reachable when an mpy script has started ... I think we are better off explaining how to bootstrap a Pyodide worker in our docs ... everything else seems to me like looking for trouble.
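The "promise attached to `window`" idea can be sketched with plain `asyncio` (names hypothetical; in a real page this would be a JS promise, not an asyncio future):

```python
# Hypothetical sketch: the providing script registers a future, resolves it
# once its worker is bootstrapped, and any other script awaits that future
# instead of polling the DOM for a node with an id.
import asyncio

shared = {}  # stands in for a promise attached to `window`

async def provider():
    shared["utility"] = asyncio.get_running_loop().create_future()
    await asyncio.sleep(0)                   # pretend bootstrap work happens here
    shared["utility"].set_result("xworker")  # worker is ready: resolve

async def consumer():
    # whenever this runs, it simply awaits the shared future
    return await shared["utility"]

async def main():
    task = asyncio.create_task(provider())
    await asyncio.sleep(0)  # let the provider register its future first
    result = await consumer()
    await task
    return result
```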
-
P.S. that being said ... because `xworker.ready` is already there and it's the solution, one can use an `id`, grab that script, hope for the best that `xworker` exists and has been bootstrapped already, await `ready`, and use its exposed methods ... nothing to do from our side; still, we need to explain that `xworker` is attached when the MutationObserver finds that node and bootstraps it, never before, just like Custom Elements only work after their definition lands in the registry.
99% of the time this won't be an issue, but in bigger projects/pages it might be, so at the very least that shared worker should come before anything else, to be sure it's the first thing our MutationObserver bootstraps ... this is all about documentation though, nothing really to change in our code ... the `id` to reach a node is a well-known HTML feature, nothing there for us to change either.
-
OK... Thanks for all the clarifications. I think the TL;DR here is we should be VERY CLEAR in the docs about how to start workers. Furthermore, @mchilvers's approach of using a script to start a worker should be discouraged because of all the reasons you state. This is very much a docs/communication/idioms problem because I don't think Martin was unreasonable in his assumption that "it should just work that way". I.e. we need to be explicit that it's not a good idea.
Looks like the worker pages in the docs will need some love to address this.
-
> I don't think Martin was unreasonable in his assumption that "it should just work that way"

Neither do I ... it's just a lack of documentation on how to do that ... I missed it during the docs week. To clarify: Martin wasn't initializing a worker (any worker-as-script initializes itself); Martin was exploiting a polyscript property needed for JS-only purposes. The fact he found that exploit means he also read the docs (and I don't even know where), but that's not what PyScript should offer, imho.
I tried to say that already during the meeting, but after re-watching it I think I wasn't clear; I was also slow in reading, understanding, and providing proper explanations without correcting myself, as I really couldn't understand why anyone would end up doing an `addEventListener` dance to bootstrap a Pyodide worker: to date nobody has ever tried that, not even via JS, as PyWorker and MPWorker are exposed via the JS module and the `xworker` property is really an escape hatch for hacking around, not a field to deal with directly and at runtime on a live HTML page ... so kudos to Martin for finding either way a solution that wasn't supposed to be offered (or to work) but does (due to indirections, not intentionally) ... but definitely we want people to understand that if you want control over a worker you should initiate it within your Python code, and from there offer, or call, any `sync` utility to and from that worker.
-
I think @ntoll hit the nail on the head regarding my thoughts! I've put together an imaginary/fantasy app as a first idea of what I would like to do: a simple app that has a main thread in mpy, and a worker using pyodide to do some long-running task...
https://pyscript.com/@mchilvers/workers-fantasy/latest
Of course, I have no idea whether it would be technically feasible or not(!).
-
It is feasible; my point is that it's also very footgun-prone done that way, as you don't have any guarantee that the worker is handled by just one script and not more, plus that worker would bootstrap regardless of your mpy driver ... this is by design of how any worker works, and has worked, to date. So what is wrong with the Python approach, where you write less, you have more control over pretty much everything, and no unexpected issues can happen? 🤔
-
I had a closer look ... basically you are moving the Python code bootstrap into HTML, then you want workers to automatically be given to you by id ... my point is that any ID you point at is not necessarily already bootstrapped, from a MicroPython code point of view. It's true that MutationObserver is fast, but if you expect that ID to always find a node down the HTML you might at some point be disappointed with bigger projects. This creates a precedent for a non-scalable pattern in bigger projects, where a MicroPython thing is bootstrapped ASAP while the pyodide worker is at the end of the page, not yet parsed by the browser. Nothing like this can ever happen if you bootstrap the worker within your code, as you own that worker at that point; it's not a "fantasy DOM element that might or might not be on the page at a certain point in time" ... I don't know if this is clear or well explained, but one pattern, the `PyWorker`, always works; the other one has tons of caveats behind it (an ID element overridden by accident or duplicated, an element not yet bootstrapped by pyscript core, and so on).
Do we really want to pursue a potentially more problematic approach there? I also find the non-relationship between a worker and its consumer hard to reason about ... a worker can live on its own, it doesn't need to be a zombie; it could have an advanced state (internally, in terms of data or operations) and your micropython driver (or js, or anything else) could arrive too late. None of this happens if your code bootstraps the worker.
Moreover ... we have the worker `terminal` there, and by no means should your other script decide what happens in there, as it can be awaiting inputs or be fully unresponsive, all stuff that again won't happen if you bootstrap a worker for your own utilities/usage.
If this is about shared workers, then we should come up with a better solution, imho; otherwise everything looks ambiguous and error prone like this ... we globally leak any worker out there just because it has an ID attached? I'd say nope, that's not really cool.
-
To sum up my thoughts: this proposal is not bad per se, but it's extremely ambiguous in terms of intent. If we want workers to be reachable through our namespace magic, we should think about a way that disambiguates the intent. Just adding an ID to a script element is not, in my opinion, a way to disambiguate such intent, and it will break if a worker `terminal` has an ID that's used for other purposes ... we gotta find a better way, a better attribute, something not ambiguous, to enable this scenario. I get its simplicity, usefulness, and the expectation behind it; I don't like its ambiguous nature and increased complexity over something that was initially planned to be the default for PyScript: everything should run off a worker. We're not there yet because mpy bootstraps faster, but using workers is still a must for any pyodide logic, not just as a utility. I hope my thoughts are clear ... can we think of any better way to provide this functionality? One that's not ambiguous or in conflict with `terminal` too, just because an ID is there? We could throw errors if that's the case, sure, but you see it's screaming for disambiguation when the scenario is a "zombie worker" serving other scripts around (this, assuming I understood your use case ... if you want to interfere with a terminal too, that's a whole new world of "what ifs" to solve, imho).
-
Counter-proposal: we add a `worker-target="id"` to the consumer of that worker, and we provide the `workers["id"]` out of our namespace ... that makes it clear, from the consumer of the worker, who wants to use what; we can await, one way or another, until such a target exists and is bootstrapped, and we can pass along the worker once it's ready ...

```python
from pyscript import workers

worker = await workers["target-id"]
```

This forces users to think about who is consuming the worker, it forces them to add an `id` to the worker node, and it makes the orchestration less magic and more explicit from our code's perspective ... would this be good enough for you?
P.S. this would be a one-to-many relation for the worker element, but each consumer can handle only one worker at a time ... we could have a `worker-target="id1,id2"` logic in place too, though, if a many-to-many relationship is desired.
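On the markup side, this counter-proposal might look something like the following (the `worker-target` attribute is the idea under discussion, not a shipped feature):

```html
<!-- the worker that owns the shared utilities, reachable by its id -->
<script type="py" src="utils.py" worker id="target-id"></script>
<!-- the consumer explicitly declares which worker it wants to use -->
<script type="mpy" src="main.py" worker-target="target-id"></script>
```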
-
Here's a summary of what we discussed.
You can kick off a named worker like this:

```html
<script src="./main.py" type="mpy" async></script>
<script src="./worker.py" type="py" worker="fred"></script>
```

In my `main.py` thread:

```python
from pyscript import workers

# Do a bunch of UI boilerplate here...
...

# Wait until the individual worker is running. Returns the `sync` object.
fred = await workers["fred"]

# We're all up and running, so go do stuff with the worker.
meaning_of_life = await fred.deep_thought()  # returns 42 ;-)
```

ALL THE BELOW IS FOR FURTHER LATER DISCUSSION:
Here's how we register the function...

```python
from pyscript import sync

def deep_thought():
    time.sleep(millions_of_years)
    return 42

# Currently we do this... BUT...
sync.deep_thought = deep_thought

# This is the idiomatic Python way to do the above.
__ALL__ = [deep_thought, ...]
```

Also, can we make the signature of instantiating the `PyWorker` class the same as the script tag? E.g.

```html
<script src="./worker.py" type="py" worker="fred"></script>
```

Is the same as..?

```python
fred = PyWorker(src="./worker.py", type="py")
```
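The `await workers["fred"]` behavior could be sketched like this (a hypothetical implementation, not polyscript's actual one): indexing returns a future that resolves once a worker with that name has registered:

```python
# Hypothetical sketch of an awaitable `workers` mapping.
import asyncio

class Workers:
    def __init__(self):
        self._futures = {}

    def _future(self, name):
        # create the future lazily, whichever side (consumer/worker) comes first
        if name not in self._futures:
            self._futures[name] = asyncio.get_running_loop().create_future()
        return self._futures[name]

    def __getitem__(self, name):
        return self._future(name)

    def register(self, name, sync_obj):
        # called once the named worker has bootstrapped and exposed its sync
        self._future(name).set_result(sync_obj)

async def main():
    workers = Workers()
    # simulate the worker finishing its bootstrap a tick later
    asyncio.get_running_loop().call_soon(
        workers.register, "fred", {"deep_thought": lambda: 42}
    )
    fred = await workers["fred"]
    return fred["deep_thought"]()
```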
-
Nope: the first one gets what the Python worker code exports directly; the latter returns a worker where you can expose `sync` methods to the code, reach its `sync` after awaiting the ready state, and terminate the worker when desired/needed. These are requirements already used in production, so we'd better think about the former suggestion instead if we want a 1:1 API; we can't remove what's out there and proven to be useful already.
-
My initial thoughts were that, as a writer of PyScript applications, my mental model is as follows:
- My app consists of a main thread and zero or more workers.
- That bag/collection/set of workers could be exposed by pyscript with something like `from pyscript import workers`.
- I could choose to add a worker to that bag/collection/set in 2 ways (using a "name" attribute here just to highlight the symmetry of the API):

```html
<script name="my-worker" src="./worker.py" type="py" config="./worker.json" worker></script>
```

or

```python
worker = PyWorker(name="my-worker", src="./worker.py", type="py", config="./worker.json")
```

- Regardless of how the worker was created, I would interact with it in the exact same way:

```python
# In the <script> case.
worker = workers["my-worker"]
await worker.ready
meaning_of_life = await worker.deep_thought()

# In the PyWorker case.
worker = PyWorker(name="my-worker", src="./worker.py", type="py", config="./worker.json")
await worker.ready
meaning_of_life = await worker.deep_thought()

# Aside: either way, the worker would be accessible via the collection.
worker = workers["my-worker"]
```

The above has the benefit of offering (I think!) a consistent way for PyScript app developers to use workers. @WebReflection, you mentioned that there are reasons why a worker created via a `<script>` element is not and should not be the same as a worker created through the PyWorker API, but I don't quite understand that yet. Is it a technical reason (as in possible/not possible), or are there certain use cases where one or the other could provide different functionality?
Regardless of that discussion, using the `workers` collection obviously raises some questions:
- What happens if I don't name my workers?
- Do they get kicked off but are just not available in the `workers` collection?
- Do they get given a default name?
-
Having re-read the discussion (with a coffee!) it seems that the difference between the `<script>` worker and PyWorker is in what gets exposed to the main thread? I.e. in the `<script>` element any module-scope attribute is available, whereas in PyWorker only those explicitly added onto `sync` are available?
-
I'll answer properly on Monday, as it's clear "we" lack understanding of the current offering and how people are already using PyWorker.
We shouldn't lose features that have already been asked for and are powerful:
- providing utilities to the worker from its owner
- being able to terminate a worker and bootstrap a new one if needed (memory constraints)

Instead of breaking current features, which cover 100% of use cases, we should think about how to offer a simplification for the 80% of simple use cases ... heck, I'm starting to think we shouldn't even use the worker attribute in the HTML, and should think about a shared attribute instead.
Every main can ask the same worker for things, but workers can be fully paused while serving other consumers, so there are technical limitations to consider too. As an example, a worker in a while loop that doesn't block main can't possibly ever serve another main thread script, so we need to think about these details too ... I will provide concrete examples on Monday.
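The "busy worker can't serve a second consumer" point can be illustrated with plain threads and a queue (an analogy, not PyScript code):

```python
# Analogy: while a worker is stuck in its own long-running loop, requests
# from another consumer sit in the queue until that loop finishes.
import queue
import threading

inbox = queue.Queue()
served = []

def worker():
    for _ in range(1000):     # stand-in for a loop the worker owns
        pass
    while not inbox.empty():  # only afterwards can other requests be served
        served.append(inbox.get())

inbox.put("request from a second main script")  # queued before the worker runs
t = threading.Thread(target=worker)
t.start()
t.join()
```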
-
This is mostly for @ntoll and @mchilvers, but I think most of this post should go directly into the documentation section related to workers.
PyWorker right now
It is possible to bootstrap either a micropython or a pyodide worker, from either micropython or pyodide, within Python code.
Due to their different bootstrap times, the most common use case tackled here is MicroPython bootstrapping Pyodide out of Python code, outlining each step within the process.
Structure
For syntax-highlighting goodness and simplicity's sake, we are going to use an `mpy` script on the main page, which points at a `main.py` file that bootstraps a `worker.py` file.
index.html

```html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width,initial-scale=1">
    <title>PyWorker - mpy bootstrapping pyodide example</title>
    <!-- using the current latest from npm due to a recently landed fix around config -->
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/@pyscript/core@0.4.40/dist/core.css">
    <script type="module" src="https://cdn.jsdelivr.net/npm/@pyscript/core@0.4.40/dist/core.js"></script>
    <!-- the async attribute is useful but not mandatory -->
    <script type="mpy" src="main.py" async></script>
</head>
</html>
```
main.py

```python
from pyscript import PyWorker, document

# Bootstrap the pyodide worker, with an optional config too.
# The worker here:
#  * is owned by this script; no JS or Pyodide code in the same page can access it
#  * allows exposing methods before the worker is ready (pre-sync)
#  * exposes a ready promise to await Pyodide on the worker side
#  * then allows using utilities exposed by Pyodide (post-sync)
worker = PyWorker("worker.py", type="pyodide")

# expose a utility that can be invoked *out of the box* in worker.py
worker.sync.greetings = lambda: print("Pyodide bootstrapped")

print("before ready")

# wait for Pyodide to complete its bootstrap
await worker.ready

print("after ready")

# await any utility exposed via Pyodide code
result = await worker.sync.heavy_computation()
print(result)

# show the result at the end of the body
document.body.append(result)

# free memory and get rid of everything
worker.terminate()
```
worker.py

```python
from pyscript import sync

# use any utility already exposed from main.py
sync.greetings()

# expose any method meant to be used from main
sync.heavy_computation = lambda: 6 * 7
```
Save these files in a tmp folder, then use `npx mini-coi ./tmp` to reach that `index.html` and see the following outcome in devtools:

```
before ready
Pyodide bootstrapped
after ready
42
```
PyWorker features:
- it is possible to bootstrap either pyodide or micropython as the worker type
- the bootstrap is owned by the code; the worker cannot be messed with or polluted by unexpected requests via Atomics, as only its owner is able to use its utilities and expose main utilities to it
- the owner of the worker can indeed add any feature from the main thread, including `fetch` calls that won't have the missing-credentials issues; it can directly deal with faster DOM manipulation, or drive 3rd party libraries meant to be used only on main that don't play well with `js_modules` (i.e. there is no ESM version of these libraries), and so on ... the worker can use, instantly or later on, any of these exposed methods
- the worker exposes a `ready` promise that is fulfilled once the code in the worker has executed; from that time on, main can await and execute any utility exposed through the worker code/environment
- the worker can be killed at any time, because it is by every mean a Worker
- because of the previous point, the worker can `postMessage` too, and the worker code can listen to messages posted ... once again, the worker is a Worker by all means, and it enables all possible use cases
Current proposal
Right now we're talking about removing 70% of the features and having workers on the HTML side of affairs that:
- can be reached by any MicroPython, JS, or Pyodide script around them
- cannot have any utility exposed through any of the consumers ... they can only offer utilities, but can't use main-provided utilities
- cannot terminate, use `postMessage`, ...
- ... indeed, these are just Shared Workers that can be consumed by a suggested `from pyscript import workers; utils = await workers["utility"]; await utils.heavy_computation()`

This already breaks the `terminate()` use case being used by other teams at Anaconda, and it makes it harder for the worker to drive non-ESM modules that can only work on main.
I am being asked to also break the imperative API provided by PyWorker, to make it less powerful than it is, just to mimic a desired simplification of a worker written in the HTML instead of being bootstrapped with all its features, which is what landed already months ago and what devs have asked for to date ... to this I am saying: "be careful what you ask for", followed by "then you explain to other teams why they can't do what they have been doing already for months".
Counter proposal
Instead of limiting the current offering for an expectation that lacked full understanding of it (mostly because I forgot to document this specific use case, so users are right here ... but ...) and that just wanted to declare workers on the page, we could think about any of these alternatives:
- `<script type="py" shared="utility"></script>` where `shared` is reachable through `pyscript.shared`, so that we move away from the worker disambiguation, explaining that a shared script always runs within a worker
- `<script type="py" worker="utility"></script>` where, because it's named, that worker can be reached through `pyscript.workers` by name and is made available to any mpy or py script on main ... this is effectively a different kind of worker, because it exposes its exports directly, as opposed to using the `sync` convention ... so I'd rather avoid the name `worker` at all and think about the previous idea
- something else, as long as it's clear that:
  - it's not a fully capable `PyWorker` primitive with all the features we currently have
  - it doesn't need `sync` at all; it's rather handled as a module
  - it always runs in a worker (though this might not necessarily need to be the case; I struggle to see use cases for not running in workers, but maybe it's more elegant than trashing cross-interpreter utilities onto the `window` / `globalThis` context)
- because of the previous point, maybe we can combine all worlds via `<script type="py" shared="utility" worker></script>` so that:
  - it's clear that it's not just a worker, it's a new "primitive", and we don't need to reflect it imperatively or to break the current `PyWorker`, which already works and serves its users out there well
  - maybe instead of `shared` we could use a different attribute ... but then the boilerplate would be awkward, like `<script type="py" id="utility" shared worker></script>` ... I'd be OK with it, but I also don't think that `worker` attribute should be there
As a summary
If we are OK to have shared workers in a way that doesn't break the current state of affairs, doesn't mislead users about what's a worker and what's not really a worker but rather a module from a worker, and so on, I would be more than 100% happy to implement that as much as I can (I still need to investigate the ability to automatically expose all things).
Also, through this new way of bootstrapping worker utilities the code will export just what any Python module exports, so any discussion around `__ALL__` VS `sync` VS ... makes little sense to me ... it's going to be a Python module, and it will export, without surprising anyone, what any Python module exports according to that code.
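The "it exports what any Python module exports" idea could be sketched with plain introspection (a hypothetical helper, not the planned implementation): honor `__all__` when present, otherwise fall back to public names:

```python
# Hypothetical sketch: collect a module's exports the way `from mod import *`
# would: prefer an explicit __all__, else every non-underscore name.
import types

def exported_names(module):
    explicit = getattr(module, "__all__", None)
    if explicit is not None:
        return list(explicit)
    return [name for name in vars(module) if not name.startswith("_")]

mod = types.ModuleType("worker_utils")
mod.heavy_computation = lambda: 6 * 7
mod._private = "hidden"
```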
I hope this thread helps in better understanding where we are now and why I have been pushing back on killing all the good things used to date around PyWorker, but I also hope we can reach consensus around alternatives that don't need to break, once again, everything around PyScript workers.
Last, but not least, I agree that `sync` is an ugly name, but it makes perfect sense from a main-exposing-to-workers perspective, because any main utility, even if async, is going to be consumed synchronously from the worker, as that's how everything already works in there, including DOM manipulation and whatnot.
P.S. we have a meetup talk on Wednesday and I don't think I can work much on this topic until then, because we have a release coming out and I need to improve documentation; this post will help me improve that Worker page too. I hope that's OK with you.
-
To whom it may concern: we're moving forward after the latest discussions and ideas; this is the related MR, with a comment on how it works, what it does and how, and what still needs to be refined/discussed: #2104 (comment)