PyWorker VS module PyWorker & MPWorker #2052

WebReflection started this conversation in Proposals

We currently have 3 different ways to create a "PyWorker":

  • via the @pyscript/core module we export both PyWorker and MPWorker to disambiguate between the two interpreters, read the right config, and bring in the correct hooks out of the box
  • within Python code, though, PyWorker doesn't behave like the module exports: it requires mandatory options to specify the interpreter type, the config to use, and so on ... nothing is inferred and, most notably, there are no defaults and there is no MPWorker counterpart ... any PyWorker created via Python code needs to pass the interpreter explicitly, as in PyWorker(url, { "type": "pyodide" }) or PyWorker(url, { "type": "micropython" }), and I am not sure this is the best we can do (see the sketch after this list)
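
A minimal sketch of the current Python-side call shapes described above; the file name ./tasks.py is just a placeholder:

from pyscript import PyWorker

# nothing is inferred: the interpreter type must always be spelled out
py_worker = PyWorker("./tasks.py", type="pyodide")        # Pyodide-backed worker
mpy_worker = PyWorker("./tasks.py", type="micropython")   # MicroPython-backed worker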

On top of that, the JS module exports need to be awaited because they have to bootstrap the whole thing before sync can work in there, plugins need to be resolved, and the default sync needs to be attached to the xworker reference, which is also resolved later on, once the interpreter is ready.

On the other hand, the generic polyscript XWorker constructor accepts a type to specify the interpreter, but in PyScript the type is usually either py or mpy, yet we ask our users to specify a type that is the interpreter name instead ... I find all this a bit confusing and, even if properly documented, I wonder what we could do to improve the Python side of affairs.

We have at least 2 options:

  • we make it smart enough to automatically provide all the correct fields, like the JS export already does ... meaning the type is inferred, so that <script type="py">from pyscript import PyWorker; PyWorker(url)</script> would, by default, create a pyodide XWorker and, as we recently introduced pyscript.config, it's also easy to pass that along too, if the same config is desired
    • pros: it's less surprise prone
    • cons: it's actually not what one might desire in the MicroPython case, where mpy on main would likely bootstrap a pyodide worker, when needed, not a MicroPython one ... so it's practically counter-intuitive. On top of this, it doesn't reflect the JS exports with explicit names to disambiguate the used interpreter and its config
  • we actually bring MPWorker to the pyscript namespace, so that PyWorker uses pyodide by default, unless explicitly told otherwise, and MPWorker will always be micropython ... we bring the config in per environment, if not specified otherwise, so we'll have a more 1:1 behavior with the exported JS counterpart, still without needing to await those references like we do in JS (a sketch of this option follows the list)
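
For comparison, a hypothetical sketch of the second option, assuming MPWorker were added to the pyscript namespace (it does not exist today) and each constructor defaulted to its own interpreter and config:

from pyscript import PyWorker, MPWorker  # MPWorker is hypothetical here

pyodide_worker = PyWorker("./heavy.py")  # would default to Pyodide and the current config
mpy_worker = MPWorker("./light.py")      # would always use MicroPython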

I don't know what the most "Pythonic" way forward would be. All I know is that I currently like neither the JS nor the Python behavior around workers: we need to await on the JS side, I really don't like MPWorker as a name, and I also don't like the fact that in Python we always need to specify a type that is an interpreter name ... I think we could be smarter in both worlds than we are now, but I also don't want to break the JS side of affairs, as that's been requested and successfully used already by various developers, and breaking it would be, and feel, both sad/bad and unnecessary.

What do you folks think? @ntoll @fpliger @JeffersGlass


Comment options

@ntoll @mchilvers I have actually found an issue with the config that got solved in polyscript but we already had everything Martin asked for yesterday ... example:

<script type="mpy" async>
 from pyscript import document, PyWorker
 worker = PyWorker("./test.py", type="pyodide")
 await worker.ready
 document.body.append(await worker.sync.greetings())
</script>

in test.py:

from pyscript import sync

def greetings():
    return "Hello from Pyodide"

sync.greetings = greetings

As a result, whenever the worker is ready that promise gets resolved and you can interact out of the box with whatever utility the Pyodide side exposed on sync ... I think there's nothing else to do from our side except:

  • be sure PyScript uses latest polyscript version
  • be sure we write some example in the documentation (which we do already, but we don't document the worker.ready feature)
Comment options

If worker.ready is just a reference to a promise that resolves when stuff is, er, "ready", we should update the docs.

I've been thinking about @mchilvers' use of workers via the script tag:

<script type="mpy" src="main.py" async></script>
<script type="py" src="heavy_stuff.py" worker></script>

I imagine this "feels" quite nice from the perspective that there's only one place (in the HTML) where all the Python code is started.

I wonder if we give a worker tag an id:

<script type="py" src="heavy_stuff.py" worker id="fred"></script>

Then perhaps we should be able to easily grab a reference to the resulting PyWorker instance from somewhere? Then I'd be able to do something like this in the main thread:

from pyscript import active_workers
fred = active_workers["fred"]
await fred.ready
... do stuff ...

The name active_workers is just a placeholder (better naming suggestions welcome), and only worker tags with a valid id get referenced in there. It's just a Python dict.

Thoughts..?

@mchilvers, does this help address your "beginner's eye view of workers" we talked about? @WebReflection can you see any gotchas or blind spots?

To be clear, this isn't a new feature, but a convenience for folks who just want to kick off workers via a script tag.


WebReflection May 30, 2024
Maintainer Author

an id can be used already because of the xworker exposed by the script ... the current solution within Python code is better and more elegant, the rest feels like a slippery slope to me ...

  • you can't know when the MutationObserver reached and bootstrapped that script so it's surprise prone if no xworker is attached (which is a polyscript feature, not a PyScript one)
  • runtime scripts won't be on the page or easily reachable via ID
  • a script can exist with or without a worker attribute ... they are all independent
  • the HTML is not the source of truth because runtime scripts can be created anyway
  • with explicit PyWorker creation within your code you have full ownership of that worker ... with a shared script on the page you end up with possible sync name clashing conflicts and who knows what else if two scripts deal with the same shared xworker ...

I mean, it's already possible to shoot oneself in the foot via hidden polyscript features, and I am not super happy promoting such a practice in the wild; it's just more problematic for very little benefit.

Also, a worker has no guarantee that other scripts bootstrapped and provided sync features (in case it needs features from the main thread), so this pattern, where the worker and its consumers have no explicit relationship, really screams for trouble.

If you bootstrap your PyWorker you have full control and nothing else can mess with it.

If you do want other code to reach it though, you add a promise to the window that resolves with the xworker once your script has landed and bootstrapped on the page, so that any other script can reach it ... this is also known as SharedWorker, but ours is a way cleaner API and it doesn't need HTML at all, as it can work even within workers (a pyodide worker using a shared pyodide worker, or an mpy worker using a shared pyodide worker, and so on), so if this is what we really want or need (I still need to see a use case for that) I'd rather provide a SharedPyWorker API, still coming from the stdlib, not something bolted on without control over the HTML page.

HTML can also stream, so there's zero guarantee the ID is reachable when an mpy script starts ... I think we are better off explaining how to bootstrap a Pyodide worker in our docs ... everything else seems to me like looking for trouble.


WebReflection May 30, 2024
Maintainer Author

P.S. that being said ... because xworker.ready is already there and it's the solution, one can use an id, grab that script, hope for the best that xworker exists and has been bootstrapped already, await ready, and use its exposed methods ... nothing to do from our side, still we need to explain that xworker is attached when the MutationObserver finds that node and bootstraps it, never before, like it is for Custom Elements that work only after their definition lands on the registry.
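
Concretely, a hedged sketch of that workaround (not a recommended pattern): it assumes a script such as <script type="py" src="heavy_stuff.py" worker id="fred"> from the earlier reply, that polyscript has already attached xworker to that node, and that the worker code exposes a sync.greetings utility as in the earlier test.py example.

from pyscript import document

node = document.getElementById("fred")
worker = node.xworker                 # polyscript escape hatch, only attached once the node is bootstrapped
await worker.ready                    # wait for the interpreter inside the worker
print(await worker.sync.greetings())  # use whatever the worker exposed on sync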

99% of the time this won't be an issue, but in bigger projects/pages it might be, so at the very least that shared worker script should come before anything else, to be sure it's the first one our MutationObserver bootstraps ... this is all about documentation though, nothing really to change in our code ... the id to reach a node is a well known HTML feature, nothing there for us to change either.

Comment options

OK... Thanks for all the clarifications. I think the TL;DR here is that we should be VERY CLEAR in the docs about how to start workers. Furthermore, @mchilvers's approach of using a script to start a worker should be discouraged for all the reasons you state. This is very much a docs/communication/idioms problem because I don't think Martin was unreasonable in his assumption that "it should just work that way". I.e. we need to be explicit that it's not a good idea.

Looks like the worker pages in the docs will need some love to address this.


WebReflection May 30, 2024
Maintainer Author

I don't think Martin was unreasonable in his assumption that "it should just work that way"

neither do I ... it's just a lack of documentation on how to do that ... I missed it during the docs week but, to clarify, Martin wasn't initializing a worker (any worker script initializes itself), Martin was exploiting a polyscript property needed for JS-only purposes. The fact he found that exploit means he also read the docs (and I don't even know where), but that's not something PyScript should offer, imho.

I tried to say that already during the meeting but, after re-watching it, I don't think I was clear; I was also slow in reading, understanding, and providing proper explanations without correcting myself, because I really couldn't understand why anyone would end up doing an addEventListener dance to bootstrap a Pyodide worker: to date nobody ever tried that, not even via JS, as PyWorker and MPWorker are exposed via the JS module and the xworker property is really an escape hatch for hacking around, not a field to deal with directly and at runtime on a live HTML page ... so kudos to Martin for finding a solution either way that wasn't supposed to be offered (or to work) but does (due to indirection, not because it was meant to be used that way), but we definitely want people to understand that if you want control over a worker you should initiate it within your Python code and, from there, offer or call any sync utility to and from that worker.

Comment options

I think @ntoll hit the nail on the head of my thoughts! I've put together an imaginary/fantasy app of my first idea of what I would like to do to have a simple app that has a main thread in MicroPython, and a worker using Pyodide to do some long-running task...

https://pyscript.com/@mchilvers/workers-fantasy/latest

Of course, I have no idea whether it would be technically feasible or not(!).


WebReflection May 30, 2024
Maintainer Author

it is feasible; my point is that it's also very footgun prone done that way, as you don't have any guarantee that the worker is handled by just one script and not more, plus that worker would bootstrap regardless of your mpy driver ... this is by design of how any worker works, and has worked, to date, so what is wrong with the Python approach, where you write less, you have more control over pretty much everything, and no unexpected issues can happen? 🤔


WebReflection May 30, 2024
Maintainer Author

I had a closer look ... basically you are moving the Python code bootstrap into HTML and then you want workers to automatically give you those by id ... my point is that whatever ID you point at is not necessarily already bootstrapped from a MicroPython code point of view. It's true that MutationObserver is fast, but if you expect that ID to always find a node further down the HTML you might, at some point, be disappointed in bigger projects. This creates a precedent for a non-scalable pattern in bigger projects, where a MicroPython script is bootstrapped ASAP while the pyodide worker sits at the end of the page and hasn't been parsed yet by the browser. Nothing like this can ever happen if you bootstrap the worker within your code, as you own that worker at that point; it's not a "fantasy DOM element that might or might not be on the page at a certain point in time" ... I don't know if this is clear or well explained, but one pattern, the PyWorker, always works, while the other one has tons of caveats behind it (ID overridden by accident or duplicated, element not bootstrapped by pyscript core yet, and so on).

Do we really want to pursue a potentially more problematic approach there? I also find the non-relationship between a worker and its consumer hard to reason about ... a worker can live on its own, it doesn't need to be a zombie, it could have an advanced state (internally, in terms of data or operations) and your MicroPython driver (or JS, or anything else) could arrive too late. None of this happens if your code bootstraps the worker.

Moreover ... we have the worker terminal there and by no means should your other script decide what happens in there, as it can be awaiting inputs or be fully unresponsive; all of this, again, won't happen if you bootstrap a worker for your own utilities / usage.

If this is about shared workers then we should come up with a better solution, imho, otherwise everything looks as ambiguous and error prone as this ... do we globally leak any worker out there just because it has an ID attached? I'd say nope, that's not really cool.


WebReflection May 30, 2024
Maintainer Author

to sum up my thoughts: this proposal is not bad per se, but it's extremely ambiguous in terms of intent. If we want workers to be reachable through our namespace magic, we should think about a way that disambiguates that intent. Just adding an ID to a script element is not, in my opinion, a way to disambiguate such intent, and it will break if a worker terminal has an ID that's used for other purposes ... we've got to find a better way, a better attribute, something not ambiguous, to enable this scenario. I get its simplicity, usefulness, and the expectation behind it; I don't like its ambiguous nature and the increased complexity over something that was initially planned to be the default for PyScript: everything should run off a worker. We're not there yet because mpy bootstraps faster, but using workers is still a must for any pyodide logic, not just as a utility. I hope my thoughts are clear ... can we think of any better way to provide this functionality? One that's not ambiguous or in conflict with a terminal just because an ID is there? We could throw errors if that's the case, sure, but you see it's screaming for disambiguation when the scenario is a "zombie worker" serving other scripts around it (this, assuming I understood your use case ... if you want to interfere with a terminal too, that's a whole new world of "what ifs" to solve, imho).


WebReflection May 30, 2024
Maintainer Author

counter-proposal: we add a worker-target="id" to the consumer of that worker and we provide the workers["id"] out of our namespace ... that makes it clear, from the consumer of the worker, who wants to use what; we can await, one way or another, for that target to exist and be bootstrapped, and we can pass along the worker once it's ready ...

from pyscript import workers
worker = await workers["target-id"]

This forces users to think about who is consuming the worker, it forces them to add an id to the worker node, and it makes the orchestration less magic and more explicit from our code's perspective ... would this be good enough for you?

P.S. this would be a one-to-many relation for the worker element, but each consumer can handle only one worker at a time ... we could also have a worker-target="id1,id2" logic in place though, if a many-to-many relationship is desired.
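
A hypothetical sketch of that many-to-many variant, assuming a consumer declares worker-target="id1,id2" and each name resolves through the same proposed workers namespace:

from pyscript import workers

# each name would resolve once the corresponding worker script is bootstrapped
first = await workers["id1"]
second = await workers["id2"]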

Comment options

Here's a summary of what we discussed.

You can kick off a named worker like this:

<script src="./main.py" type="mpy" async></script>
<script src="./worker.py" type="py" worker="fred"></script>

In my main.py thread:

from pyscript import workers
# Do a bunch of UI boilerplate here...
...
# Check until an individual worker is running. Returns the `sync` object.
fred = await workers["fred"]
# We're all up and running, so go do stuff with the worker.
meaning_of_life = await fred.deep_thought() # returns 42 ;-)

ALL THE BELOW IS FOR LATER DISCUSSION:

Here's how we register the function...

from pyscript import sync
import time

def deep_thought():
    time.sleep(millions_of_years)  # illustrative: a very long-running computation
    return 42

# Currently we do this... BUT...
sync.deep_thought = deep_thought
# This is the idiomatic Python way to do the above.
__all__ = ["deep_thought"]  # ...plus any other names to expose

Also, can we make the signature for instantiating the PyWorker class the same as the script tag? E.g.

<script src="./worker.py" type="py" worker="fred"></script>

Is the same as..?

fred = PyWorker(src="./worker.py", type="py")

WebReflection May 31, 2024
Maintainer Author

Nope: the first one gets what the Python worker code exports directly, while the latter returns a worker where you can expose sync methods to its code, reach its sync after awaiting the ready state, and terminate the worker when desired / needed. These are requirements already used in production, so we'd better think about the former suggestion instead if we want a 1:1 API; we can't remove what's out there and already proven useful.
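
To make the contrast concrete, a hedged sketch of the two shapes, reusing the fred / deep_thought names from this thread and assuming the proposed workers namespace lands as described above:

from pyscript import workers, PyWorker

# named <script ... worker="fred">: awaiting the name yields the worker's exposed utilities directly
fred = await workers["fred"]
answer = await fred.deep_thought()

# imperative PyWorker: you get the worker itself, so you can expose main-thread
# utilities to it, await its readiness, call back into it, and terminate it
worker = PyWorker("./worker.py", type="pyodide")
worker.sync.log = lambda msg: print(msg)
await worker.ready
answer = await worker.sync.deep_thought()
worker.terminate()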

Comment options

My initial thoughts were that as a writer of PyScript applications, my mental model is as follows:

  1. My app consists of a main thread and of zero or more workers.
  2. That bag/collection/set of workers could be exposed by pyscript with something like `from pyscript import workers`.
  3. I could choose to add a worker to that bag/collection/set in 2 ways (using a "name" attribute here just to highlight the symmetry of the API):
    • <script name="my-worker" src="./worker.py" type="py" config="./worker.json" worker></script> or
    • worker = PyWorker(name="my-worker", src="./worker.py", type="py", config="./worker.json")
  4. Regardless of how the worker was created, I would interact with it in the exact same way:
# In the <script> case.
worker = workers["my-worker"] 
await worker.ready
meaning_of_life = await worker.deep_thought()
# In the PyWorker case.
worker = PyWorker(name="my-worker", src="./worker.py", type="py", config="./worker.json")
await worker.ready
meaning_of_life = await worker.deep_thought()
# Aside: Either way, the worker would be accessible via the collection.
worker = workers["my-worker"] 

The above has the benefit of offering (I think!) a consistent way for PyScript app developers to use workers. @WebReflection, you mentioned that there are reasons why a worker created via a <script> element is not, and should not be, the same as a worker created through the PyWorker API, but I don't quite understand that yet. Is it a technical reason (as in possible/not possible), or are there certain use cases where one or the other could provide different functionality?

Regardless of that discussion, using the 'workers' collection obviously raises some questions:

  • What happens if I don't name my workers?
    • Do they get kicked off but are just not available in the "workers" collection?
    • Do they get given a default name?
Comment options

Having re-read the discussion (with a coffee!) it seems that the difference between the <script> worker and PyWorker is in what gets exposed to the main thread? I.e. in the <script> element case any module-scope attribute is available, whereas with PyWorker only those explicitly added onto sync are available?


WebReflection Jun 1, 2024
Maintainer Author

I’ll answer properly on Monday as it’s clear "we" lack understanding of the current offer and how people are already using PyWorker.

we shouldn't lose features that have already been asked for and are powerful:

  • provide utilities to the worker from its owner
  • being able to terminate a worker and bootstrap a new one if needed (memory constraints)

instead of breaking current features, which cover 100% of use cases, we should think about how to offer a simplification for the 80% of simple use cases ... heck, I'm starting to think we shouldn't even use the worker attribute in the HTML and should think about a shared attribute instead.

every main thread can also ask the same worker for things, but a worker can be fully paused while serving other consumers, so there are technical limitations to consider too. As an example, a worker stuck in a while loop that doesn't block main can't possibly ever serve another main-thread script, so we need to think about these details too ... I will provide concrete examples on Monday.
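
In the meantime, a hedged illustration of that while-loop limitation; the file name worker_busy.py and the ping utility are hypothetical:

# worker_busy.py
from pyscript import sync

def ping():
    return "pong"

sync.ping = ping  # exposed, but ...

while True:       # ... this loop keeps the worker busy forever, so awaiting
    pass          # sync.ping() from another main-thread script would never resolve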

Comment options

This is mostly for @ntoll and @mchilvers but I think most of this post should go directly into our documentation section related to the Worker.

PyWorker right now

It is possible to bootstrap either a MicroPython or a Pyodide worker, from either MicroPython or Pyodide, within Python code.

Due to their different bootstrap times, the most common use case tackled here is MicroPython bootstrapping Pyodide out of Python code, underlining each step of the process.

Structure

For the sake of highlighting and simplicity, we are going to use an mpy script on the main page, which points at a main.py file that bootstraps a worker.py file.

index.html

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width,initial-scale=1">
    <title>PyWorker - mpy bootstrapping pyodide example</title>
    <!-- using the current latest from npm due to a recently landed fix around config -->
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/@pyscript/core@0.4.40/dist/core.css">
    <script type="module" src="https://cdn.jsdelivr.net/npm/@pyscript/core@0.4.40/dist/core.js"></script>
    <!-- the async attribute is useful but not mandatory -->
    <script type="mpy" src="main.py" async></script>
  </head>
</html>

main.py

from pyscript import PyWorker, document
# bootstrap the pyodide worker with optional config too
# the worker here is:
# * owned by this script, no JS or Pyodide code in the same page can access it
# * it allows pre-sync methods exposure
# * it exposes a ready Promise to await pyodide on the worker side
# * it then allows using post-sync (utilities exposed by pyodide)
worker = PyWorker("worker.py", type="pyodide")
# expose a utility that can be invoked *out of the box* in worker.py
worker.sync.greetings = lambda: print("Pyodide bootstrapped")
print("before ready")
# wait for Pyodide to complete its bootstrap
await worker.ready
print("after ready")
# await any utility exposed via Pyodide code
result = await worker.sync.heavy_computation()
print(result)
# show the result at the end of the body
document.body.append(result)
# here we free memory and get rid of everything
worker.terminate()

worker.py

from pyscript import sync
# use any already exposed utility from main.py
sync.greetings()
# expose any method meant to be used from main
sync.heavy_computation = lambda: 6 * 7

Save these files in a tmp folder and use npx mini-coi ./tmp to serve that index.html, then see the following outcome in devtools:

before ready
Pyodide bootstrapped
after ready
42

PyWorker Features:

  • it is possible to bootstrap either pyodide or micropython as the worker type
  • the bootstrap is owned by the code: the worker cannot be messed with or polluted by unexpected requests via Atomics, as only its owner is able to use its utilities and expose main-thread utilities to it
  • the owner of the worker can indeed add any feature from the main thread, including fetch calls that won't have missing-credentials issues, can deal directly with faster DOM manipulation, or can drive 3rd party libraries meant to be used only on main that don't play well with js_modules (i.e. there is no ESM version of these libraries), and so on ... the worker can use any of these exposed methods instantly, or later on
  • the worker exposes a ready promise that will be fulfilled once the code in the worker is executed
  • from that time on, the main thread can await and execute any utility exposed through the worker code / environment
  • the worker can be killed at any time, because it is by every means a Worker
  • because of the previous point, the worker reference can postMessage too and the worker code can listen to posted messages ... once again, the worker is a Worker by all means and it enables all possible use cases (see the sketch after this list)
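
A hedged sketch of those last two points, assuming the PyWorker reference forwards postMessage and terminate like the Worker it wraps:

from pyscript import PyWorker

worker = PyWorker("worker.py", type="pyodide")
await worker.ready
worker.postMessage("hello from main")  # plain worker messaging, as with any Worker
worker.terminate()                     # kill the worker and free its memory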

Current proposal

Right now we're talking about removing 70% of the features and having workers on the HTML side of affairs that:

  • can be reached by any MicroPython, JS, or Pyodide script around them
  • cannot have any utility exposed to them by any of the consumers ... they can only offer utilities but can't use main-provided utilities
  • cannot terminate or use postMessage ...
  • ... indeed, they are just Shared Workers that can be consumed by a suggested from pyscript import workers; utils = await workers["utility"]; await utils.heavy_computation()

This already breaks the terminate() use case that is being used by other teams at Anaconda, and it makes it harder for the worker to drive non-ESM modules that can only work on main.

I am also being asked to break the imperative API provided by PyWorker, to make it less powerful than it is, just to mimic a desired simplification where the worker is written in the HTML instead of being bootstrapped with all its features, which is what landed already months ago and what devs have asked for to date ... to this I am saying: "be careful what you ask for", followed by "then you explain to other teams why they can't do what they have been doing already for months".

Counter proposal

Instead of limiting the current offer because of an expectation that lacked full understanding of that offer (mostly because I forgot to document this specific use case, so users are right here ... but ...) and that just wanted to declare workers on the page, let's think about any of these alternatives:

  1. <script type="py" shared="utility"></script> where shared is reachable through pyscript.shared so that we move out of the worker disambiguation explaining that a shared script is always running within a worker
  2. <script type="py" worker="utility"></script> where, because it's named, that worker can be reached through pyscript.workers by name and it's made available to any mpy or py script on the main ... this is effectively a different kind of worker because it exposes directly its exports as opposite of using sync convention ... so I'd rather avoid the name worker at all and think about previous idea
  3. something else, as long as it's clear that:
    • it's not a fully capable PyWorker primitive with all the features we currently have
    • it doesn't need sync at all, it's rather handled as a module
    • it always runs in a worker (but this could be actually not necessarily the case, although I struggle to see use cases for not running in workers but maybe it's more elegant than trashing cross interpreters utilities on the window / globalThis context
  4. because of the previous point, maybe we can combine all worlds via <script type="py" shared="utility" worker></script> so that:
    • it's clear that's not just a worker, it's a new "primitive" and we don't need to reflect it imperatively or to break current PyWorker which already works and serves well its users out there
    • maybe instead of shared we could use a different attribute ... but then the boilerplate would be awkward, like: <script type="py" id="utility" shared worker></script> ... I'd be OK with it, but also I don't think that worker attribute should be there
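
A hypothetical consumer sketch for alternative 1; the pyscript.shared namespace does not exist today, and whether each name would be awaitable is an assumption modeled on the workers idea above:

from pyscript import shared

utility = await shared["utility"]           # would resolve once the shared script's worker is ready
result = await utility.heavy_computation()  # whatever that shared module exposes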

In summary

If we are OK with having shared workers in a way that doesn't break the current state of affairs or mislead users about what's a worker and what's not really a worker but rather a module running in a worker, and so on, I would be more than 100% happy to implement that as much as I can (I still need to investigate the ability to automatically expose all things).

Last, but not least, the code will export, through this new way of bootstrapping worker utilities, just what any Python module exports, so any discussion around __all__ VS sync VS ... makes little sense to me ... it's going to be a Python module and it will export, without surprising anyone, what any Python module exports according to that code.
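
A hedged sketch of that module-like behavior; the file name shared_utility.py is hypothetical, and the point is only that its top-level names would be what consumers see:

# shared_utility.py - would run in a worker, no sync needed
import time

def heavy_computation():
    time.sleep(2)  # pretend this is expensive
    return 6 * 7

MEANING = 42       # plain module attributes would be exported too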

I hope this thread helps with better understanding where we are now and why I have been pushing back on killing all the good things used to date around PyWorker, but I also hope we can reach consensus around alternatives that don't need to, once again, break everything around PyScript workers.

Last, but not least, I agree that sync is an ugly name, but it makes perfect sense from a "main exposing to workers" perspective, because any main utility, even if async, is going to be consumed synchronously from the worker, as that's how everything works in there already, including DOM manipulation and whatnot.

P.S. we have a meetup talk on Wednesday and I don't think I can work much on this topic until then, because we have a release coming out and I need to improve documentation; this post will help me improve that Worker page too. I hope that's OK with you.

Comment options

To whom it may concern: we're moving forward after the latest discussions and ideas; this is the related MR, with a comment on how it works, what it does and how, and what still needs to be refined/discussed: #2104 (comment)
