Discussion:
[Twisted-Python] reading multipart/form-data headers
Burak Arslan
2016-08-11 15:55:35 UTC
Hello All,

You can find a sample HTTP POST request using HTTP multipart/form-data
at the end of this message.

The server that handles this request is using twisted so I end up with a
Request object. Is there a way I can extract the file name
("image008.jpg") from this stream? I'm looking at the source of
cgi.parse_multipart() and it seems to be ignored.

Best regards,
Burak

PS:

POST /put HTTP/1.1
Host: localhost:7111
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Type: multipart/form-data; boundary=---------------------------352471062160373366296932264
Content-Length: 382691

-----------------------------352471062160373366296932264
Content-Disposition: form-data; name="name"

a
-----------------------------352471062160373366296932264
Content-Disposition: form-data; name="version"

1
-----------------------------352471062160373366296932264
Content-Disposition: form-data; name="data"; filename="image008.jpg"
Content-Type: image/jpeg

(...)
Glyph Lefkowitz
2016-08-11 20:52:21 UTC
Post by Burak Arslan
Hello All,
You can find a sample HTTP POST request using HTTP multipart/form-data at the end of this message.
The server that handles this request is using twisted so I end up with a Request object. Is there a way I can extract the file name ("image008.jpg") from this stream? I'm looking at the source of cgi.parse_multipart() and it seems to be ignored.
Sadly Twisted just calls into cgi.parse_multipart and so it is in fact ignored. You might be able to re-parse the request body (request.content.seek(0); request.content.read()) with something like <https://docs.python.org/2.7/library/email.mime.html#email.mime.multipart.MIMEMultipart> or <https://github.com/mailgun/flanker> to extract more information about the MIME.
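
A minimal sketch of the re-parsing idea above, assuming Python 3 and the standard library email package. The function name list_form_parts is hypothetical, and the body is assumed to fit in memory. The email parser was written for mail rather than HTTP uploads, so treat this as an illustration, not a hardened parser.

```python
import email


def list_form_parts(content_type, body):
    """Return (field name, filename, part content type) for each form part.

    content_type: value of the request's Content-Type header, as bytes.
    body: the complete raw request body, as bytes.
    """
    # The email parser reads the boundary from a Content-Type header, so
    # put a minimal synthetic header block in front of the raw body.
    prefix = b"MIME-Version: 1.0\r\nContent-Type: " + content_type + b"\r\n\r\n"
    message = email.message_from_bytes(prefix + body)
    if not message.is_multipart():
        return []
    parts = []
    for part in message.get_payload():
        # The field name lives in the Content-Disposition parameters.
        name = part.get_param("name", header="content-disposition")
        # get_filename() returns None when the part carries no filename.
        parts.append((name, part.get_filename(), part.get_content_type()))
    return parts
```

With a Twisted Request, the two arguments would presumably come from request.getHeader(b"content-type") and from request.content.seek(0); request.content.read(). For the sample request in this thread, the "data" part should report the filename image008.jpg.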

It would definitely be better for Twisted to have more robust facilities for dealing with request inputs, particularly to be able to process large uploads as a stream rather than an individual message (and such an API for form post uploads should obviously include the content disposition filename). See <https://twistedmatrix.com/trac/ticket/288> for more discussion :).

Thanks for using Twisted, and sorry about this shortcoming.

-g
Adi Roiban
2016-08-12 09:28:39 UTC
I have some code that makes a best-effort attempt to parse the request
body in streaming mode... but it relies on a fork based on the code
submitted for this ticket: http://twistedmatrix.com/trac/ticket/6928

Diff for the fork: https://github.com/chevah/twisted/compare/6928-http-100-accept

It relies on the fact that the request will call the
resource.headerReceive() before the actual body is consumed.

form handling code using this fork
https://gist.github.com/adiroiban/7f593d6d18113aae797ad081e07f4745

It uses werkzeug.http.parse_options_header for parsing the headers.
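
To show what parsing such a header involves, here is a small sketch that splits a Content-Disposition value into its disposition and options. It is a standard-library stand-in built on email.message.Message, not werkzeug's implementation, and the function name parse_content_disposition is hypothetical.

```python
from email.message import Message


def parse_content_disposition(value):
    """Split a Content-Disposition value into (disposition, options dict)."""
    holder = Message()
    holder["Content-Disposition"] = value
    # get_params() returns [(disposition, ''), (key, value), ...] with
    # surrounding quotes already removed from the values.
    params = holder.get_params(header="content-disposition")
    disposition = params[0][0]
    options = dict(params[1:])
    return disposition, options
```

Called on the header of the last part in the sample request, form-data; name="data"; filename="image008.jpg", this yields the disposition "form-data" and a dict holding the name and filename options.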

If your POST requests are just a few bytes, you can just use
request.content.seek(0); request.content.read(), as suggested by Glyph,
and redirect the content to the MultiPartFormData protocol.

For my project I need to handle files larger than 5GB, so I ended up
with the modified request/resource.

Good luck!
--
Adi Roiban
Burak Arslan
2016-08-12 09:58:28 UTC
Hey Adi, hey Glyph,

Thanks a lot for your answers.
Post by Adi Roiban
Post by Glyph Lefkowitz
Thanks for using Twisted, and sorry about this shortcoming.
Nothing to be sorry about, twisted is made of man-years of good work.
Once you twist your point of view enough, it becomes quite elegant and
predictable. Some wars *had* to be fought uphill while integrating with
twisted's HTTP implementation but it's the way things are, I'm not
complaining.
Post by Adi Roiban
I have some coding which is doing a best effort to parse the request
body in a streaming mode... but it is using a fork based on the code
submitted for this ticket http://twistedmatrix.com/trac/ticket/6928
This is for a library I'm developing, so I can only depend on what's
already released. However, thanks for the code -- it will certainly give
me ideas.
Post by Adi Roiban
For my project I need to handle files larger than 5GB, so I ended up
with the modified request/resource
As I said, I can't make any assumptions about file size, but I *think* I can
pretend that my requests fit in memory as long as I keep them in
memory-mapped files. mmap is wonderful -- it's both a file and a string!

With that assumption, this is what I came up with:

https://github.com/plq/spyne/blob/7f52ab0f11773535c6a73702b4b838b49ecdd9e6/spyne/server/twisted/http.py#L321

I'd love to hear your feedback about it. Do you think I can get away
with relying on mmap here?
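
The "both a file and a string" property can be illustrated with a short sketch, assuming Python 3. The function name find_filename is hypothetical, and the naive byte search is only there to show the two interfaces side by side; it is not a multipart parser and is not taken from the linked spyne code.

```python
import mmap
import tempfile


def find_filename(body):
    """Spool body to a temp file, map it, return (first filename, first line)."""
    with tempfile.TemporaryFile() as spool:
        spool.write(body)
        spool.flush()
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(spool.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            marker = b'filename="'
            filename = None
            start = mapped.find(marker)            # string-like search
            if start != -1:
                start += len(marker)
                end = mapped.find(b'"', start)
                if end != -1:
                    filename = mapped[start:end]   # string-like slicing
            mapped.seek(0)                         # file-like positioning
            first_line = mapped.readline()         # file-like reading
            return filename, first_line
```

Only the pages that are actually touched get read from disk, which is what makes it plausible to treat a large upload as if it fit in memory.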

Best regards,
Burak
Burak Arslan
2016-08-16 07:19:17 UTC
Post by Burak Arslan
Do you think I can get away
with relying on mmap here?
So I'll go with the usual "don't touch it from the reactor thread" and I
guess I'll be good :)

Cheers,
Burak
