Randomcoder
2016-08-06 10:48:23 UTC
Hello,
I've been working on a small Twisted program.
The program makes HTTP requests to a large number of feeds.
Twisted is used to speed up the entire process.
After the feeds are fetched, they're parsed. Finally they should be
written to a database (to simplify the code, that part is left out).
Feeds are fetched in parallel, in batches: the deferreds for the feeds
in one batch are combined with gatherResults, and the resulting per-batch
deferreds are then collected into a DeferredList. One semaphore controls
the batch-level list of deferreds, and a second semaphore controls the
deferred for the entire list of batches.
Currently, the program works fine with 100-150 feeds and a BATCH_SIZE
between 5 and 20.
However, the program starts to hang for a long time once the number of
feeds goes above 150-200.
To be more precise, towards the end of a run, messages like this one
are printed, but the program does not seem to be doing much:
Stopping factory <twisted.web.client._HTTP11ClientFactory instance at 0x7f0b7d5f3908>
It seems like this is the cleanup phase.
I've read what I could find on the topic. I wasn't able to make progress
on it, so I'm posting to the mailing list to ask if someone has encountered this
before. Maybe it's a common pitfall or issue that other people have also
bumped into.
Thanks