[Twisted-Python] twisted compatibility with multiprocessing module in fork+execv mode
Flavio Grossi
2015-09-30 10:25:36 UTC
I know the multiprocessing module is not properly supported by twisted apps because of the interactions among duplicated file descriptors and signal handling, as discussed other times.

But Python 3.4 introduces a new mode for that module which spawns the new processes (i.e. fork() followed by execv()) instead of simply forking them.

So my question is how supported this is by twisted, and in general how safe it is to use subprocesses created by duplicating the parent immediately followed by the execv of a fresh interpreter.

What I'm thinking of is something like this, to process requests asynchronously and delegate the CPU-bound work to some worker processes:

import multiprocessing

def worker(q):
    while True:
        work = q.get()

def main():
    context = multiprocessing.get_context('spawn')
    q = context.Queue()
    p = context.Process(target=worker, args=(q,))
    q.put_nowait(work)  # when async requests are made

if __name__ == '__main__':
    from twisted.internet import reactor
Glyph Lefkowitz
2015-10-03 11:07:03 UTC
Post by Flavio Grossi
I know the multiprocessing module is not properly supported by twisted apps because of the interactions among duplicated file descriptors and signal handling, as discussed other times.
To be fair, the multiprocessing module has most of these issues by itself :-). The main reason Twisted didn't work with things like multiprocessing in the past was the fact that we didn't pass SA_RESTART to the SIGCHLD handler, and that has long since been resolved; you can now fork, os.system, popen, and multiprocess more or less like you can in any other Python program.
Post by Flavio Grossi
But Python 3.4 introduces a new mode for that module which spawns the new processes (i.e. fork() followed by execv()) instead of simply forking them.
That is a definite improvement and will be far more reliable.
Post by Flavio Grossi
So my question is how supported this is by twisted, and in general how safe it is to use subprocesses created by duplicating the parent immediately followed by the execv of a fresh interpreter.
This is what Twisted would do if it were spawning a subprocess, so... safe enough.
Your example will probably work, but it still has the drawbacks of multiprocessing:

1. you will be serializing 'work' via pickle, which is fraught with problems,
2. you will have no way to tell when 'work' has completed, so you will easily overload all of your worker processes under heavy load.
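To make the first point concrete, here is a minimal sketch using only the standard library. The helper name can_pickle and the sample work items are made up for illustration. It shows that plain data passes through pickle, while common objects such as lambdas or locks do not, so they could not be put on a multiprocessing queue as 'work':

```python
import pickle
import threading


def can_pickle(obj):
    """Return True if obj can be serialized with pickle (hypothetical helper)."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# Plain data structures serialize without trouble.
plain_work = {"job_id": 1, "payload": [1, 2, 3]}

# A lambda has no importable name, so pickle cannot reference it.
callback_work = lambda x: x * 2

# A lock wraps OS-level state and refuses to be pickled.
lock_work = threading.Lock()

for name, item in [("plain", plain_work),
                   ("lambda", callback_work),
                   ("lock", lock_work)]:
    print(name, "picklable:", can_pickle(item))
```

Failures of this kind only show up at the moment the item is put on the queue, which is part of what makes them awkward to debug.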

Instead, using something like ampoule <https://pypi.python.org/pypi/ampoule> would allow you to use twisted's spawnProcess facility to send and receive data via a more reliable serialization mechanism than pickle, and get straightforward feedback (Deferreds firing with results) when work is complete.

In fairness, even doing this with ampoule is altogether too much boilerplate, and we should probably have something for quick-and-dirty multiprocessing like a 'deferToProcess' that just uses pickle and presents a similarly convenient API, spawning python interpreters as necessary behind the scenes. So I can understand why you're looking at multiprocessing; all I can tell you for now is that it is probably worth setting up all the necessary infrastructure to do this the "right way" because it will be more reliable and you will rapidly need to expand to do bi-directional communication.
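As a rough illustration of the data flow such a helper would need (this is not an existing Twisted API), the sketch below starts a fresh Python interpreter, sends it a pickled request, and reads back a pickled result. The name call_in_fresh_interpreter and the CHILD_SOURCE script are hypothetical, and the call blocks because it uses the standard library's subprocess module. A real 'deferToProcess' would presumably go through spawnProcess and return a Deferred instead.

```python
import pickle
import subprocess
import sys

# Script run by the child interpreter: read one pickled
# (module, function, args) request from stdin, call the function,
# and write the pickled result to stdout.
CHILD_SOURCE = """
import importlib, pickle, sys
module_name, func_name, args = pickle.load(sys.stdin.buffer)
func = getattr(importlib.import_module(module_name), func_name)
pickle.dump(func(*args), sys.stdout.buffer)
"""


def call_in_fresh_interpreter(module_name, func_name, *args):
    """Hypothetical helper: run module_name.func_name(*args) in a new process.

    Blocks until the child exits; raises CalledProcessError if it fails.
    """
    request = pickle.dumps((module_name, func_name, args))
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input=request,
        stdout=subprocess.PIPE,
        check=True,
    )
    return pickle.loads(completed.stdout)


if __name__ == "__main__":
    print(call_in_fresh_interpreter("math", "factorial", 5))
```

Because the function is named by module and attribute rather than pickled directly, the child only needs to be able to import it. The arguments and the result still go through pickle, with the limitations that implies.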

Thanks for using Twisted,

