Haskell Thrift library 300x slower than C++ in performance test

我在风中等你 2021-01-30 05:04

I'm building an application with two components: a server written in Haskell and a client written in Qt (C++). I'm using Thrift for communication between them, and I wonder why i

5 answers
  •  慢半拍i (original poster)
    2021-01-30 05:48

    This is fairly consistent with what user13251 says: the Haskell implementation of Thrift performs a large number of small reads.

    For example, in Thrift.Protocol.Binary:

    readI32 p = do
        bs <- tReadAll (getTransport p) 4
        return $ Data.Binary.decode bs
    

    Let's ignore the other odd bits and just focus on that for now. This says: "to read a 32-bit int, read 4 bytes from the transport, then decode this lazy bytestring."

    The transport method reads exactly 4 bytes using the lazy bytestring hGet. hGet does the following: it allocates a 4-byte buffer, then uses hGetBuf to fill it. hGetBuf may use an internal buffer, depending on how the Handle was initialized.
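Whether a Handle has an internal buffer is controlled by its buffering mode. The sketch below shows how that mode can be set and inspected with System.IO; `enableBlockBuffering` is a hypothetical helper name, and nothing here is taken from the Thrift library itself.

```haskell
import System.IO

-- Hypothetical helper (not part of Thrift): request block buffering with an
-- implementation-chosen buffer size, then report the mode now in effect.
enableBlockBuffering :: Handle -> IO BufferMode
enableBlockBuffering h = do
    hSetBuffering h (BlockBuffering Nothing)
    hGetBuffering h
```

Even with block buffering enabled, each 4-byte read still pays for its own small allocation, so this only reduces the number of system calls.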

    So there might be some buffering. Even so, Thrift for Haskell performs the read/decode cycle for each integer individually, allocating a small memory buffer each time. Ouch!

    I don't really see a way to fix this without the Thrift library being modified to perform larger bytestring reads.
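As a rough illustration of what a larger bytestring read could look like, the sketch below fetches all the bytes for n 32-bit integers in one hGet and decodes them in a single Data.Binary.Get pass. The names `readI32s`, `decodeI32s`, and `encodeI32s` are hypothetical and are not part of the Thrift library; big-endian encoding is assumed, matching what Data.Binary.decode does for Int32.

```haskell
import qualified Data.ByteString.Lazy as BL
import Control.Monad (replicateM)
import Data.Binary.Get (getWord32be, runGet)
import Data.Binary.Put (putWord32be, runPut)
import Data.Int (Int32)
import System.IO (Handle)

-- Hypothetical helper: one read of 4*n bytes instead of n reads of 4 bytes.
-- If the handle yields fewer bytes than requested, decoding will fail.
readI32s :: Handle -> Int -> IO [Int32]
readI32s h n = do
    bs <- BL.hGet h (4 * n)
    return (decodeI32s n bs)

-- Decode n big-endian 32-bit integers from an in-memory bytestring.
decodeI32s :: Int -> BL.ByteString -> [Int32]
decodeI32s n = runGet (replicateM n (fmap fromIntegral getWord32be))

-- Inverse of decodeI32s, included only so the round trip can be checked.
encodeI32s :: [Int32] -> BL.ByteString
encodeI32s = runPut . mapM_ (putWord32be . fromIntegral)
```

For instance, `decodeI32s 3 (encodeI32s [1, 2, 3])` gives back the original list. Applying this inside Thrift would still require changing the protocol layer, since it asks the transport for one field at a time.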

    Then there are the other oddities in the Thrift implementation, such as using type classes as a structure of methods. While classes look similar to a structure of methods, can act like one, and are sometimes even implemented as one, they should not be treated as such. See the "Existential Typeclass" antipattern:

    • http://lukepalmer.wordpress.com/2010/01/24/haskell-antipattern-existential-typeclass/
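The usual alternative to a class used this way is a plain record of functions. The sketch below is a minimal illustration with hypothetical names (`TransportOps`, `opsRead`, `memoryTransport`); it is not how the Thrift library defines its transports.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.IORef (atomicModifyIORef', newIORef)

-- A "structure of methods" written as an ordinary record (hypothetical type).
data TransportOps = TransportOps
    { opsRead :: Int -> IO BL.ByteString  -- read up to n bytes
    }

-- An in-memory transport: each read consumes a prefix of the stored input.
memoryTransport :: BL.ByteString -> IO TransportOps
memoryTransport input = do
    ref <- newIORef input
    return TransportOps
        { opsRead = \n -> atomicModifyIORef' ref $ \bs ->
            let (taken, rest) = BL.splitAt (fromIntegral n) bs
            in (rest, taken)  -- new state is the remainder, result is the prefix
        }
```

Different transports are then just different values of the same type, so they can be stored in one list or swapped at runtime without any existential wrapper.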

    One odd part of the test implementation:

    • generating an array of Ints only to immediately change them to Int32s only to immediately pack into a Vector of Int32s. Generating the vector immediately would be sufficient and faster.

    Though, I suspect, this is not the primary source of performance issues.
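The two ways of building the test data might look roughly like this. The original test code is not shown here, so `viaList` and `direct` are hypothetical stand-ins, and the sketch assumes the vector package is available.

```haskell
import qualified Data.Vector as V
import Data.Int (Int32)

-- Roundabout: a list of Int, converted to Int32, then packed into a Vector.
viaList :: Int -> V.Vector Int32
viaList n = V.fromList (map fromIntegral ([0 .. n - 1] :: [Int]))

-- Direct: generate the Vector of Int32 without the intermediate lists.
direct :: Int -> V.Vector Int32
direct n = V.generate n fromIntegral
```

Both produce the same vector; the direct version simply skips the intermediate list allocations.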
