What is the correct way of reading from a TCP socket in C/C++?

后悔当初 2020-11-28 04:33

Here's my code:

// Not all headers are relevant to the code snippet.
#include 
#include 
#include 
#in         


        
8 Answers
  • 2020-11-28 04:41

    This is an article that I always refer to when working with sockets:

    THE WORLD OF SELECT()

    It will show you how to reliably use 'select()' and contains some other useful links at the bottom for further info on sockets.
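    As a rough illustration of the kind of call that article covers, here is a minimal sketch of waiting for a single descriptor to become readable with select(). The helper name waitReadable and the whole-second timeout are assumptions made for this example; they are not taken from the article.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Waits up to timeoutSeconds for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error (see errno).
 * waitReadable is a hypothetical helper name used for illustration.
 */
int waitReadable(int fd, int timeoutSeconds)
{
    fd_set readSet;
    struct timeval timeout;

    FD_ZERO(&readSet);
    FD_SET(fd, &readSet);
    timeout.tv_sec = timeoutSeconds;
    timeout.tv_usec = 0;

    /* The first argument is the highest descriptor number plus one. */
    int ready = select(fd + 1, &readSet, NULL, NULL, &timeout);
    if (ready < 0)
        return -1;
    return (ready > 0 && FD_ISSET(fd, &readSet)) ? 1 : 0;
}
```

    Note that "readable" also covers end-of-file, so a following read() can still return 0.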

  • 2020-11-28 04:41

    For any non-trivial application (i.e. one that must receive and handle different kinds of messages with different lengths), the solution to your particular problem isn't just a programming solution; it's a convention, i.e. a protocol.

    To determine how many bytes you should pass to your read call, you should establish a common prefix, or header, that every message starts with. That way, when a socket first has data available to read, you can read the header and decide what to expect next.

    A binary example might look like this:

    #include <stdint.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    
    enum MessageType {
        MESSAGE_FOO,
        MESSAGE_BAR,
    };
    
    struct MessageHeader {
        uint32_t type;
        uint32_t length;
    };
    
    /**
     * Attempts to continue reading a `socket` until `bytes` number
     * of bytes are read. Returns truthy on success, falsy on failure.
     *
     * Similar to @grieve's ReadXBytes.
     */
    int readExpected(int socket, void *destination, size_t bytes)
    {
        /*
        * Can't increment a void pointer, as incrementing
        * is done by the width of the pointed-to type -
        * and void doesn't have a width
        *
        * You can in GCC but it's not very portable
        */
        char *destinationBytes = destination;
        while (bytes) {
            ssize_t readBytes = read(socket, destinationBytes, bytes);
            if (readBytes < 1)
                return 0;
            destinationBytes += readBytes;
            bytes -= readBytes;
        }
        return 1;
    }
    
    int main(int argc, char **argv)
    {
        int selectedFd;
    
        // use `select` or `poll` to wait on sockets
        // received a message on `selectedFd`, start reading
    
        char *fooMessage;
        struct {
            uint32_t a;
            uint32_t b;
        } barMessage;
    
        struct MessageHeader received;
        if (!readExpected (selectedFd, &received, sizeof(received))) {
            // handle error; don't go on to parse a header that was never filled in
            return EXIT_FAILURE;
        }
        // handle network/host byte order differences maybe
        received.type = ntohl(received.type);
        received.length = ntohl(received.length);
    
        switch (received.type) {
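            case MESSAGE_FOO:
                // "foo" sends an ASCII string or something; the extra
                // zeroed byte keeps it NUL-terminated. The size_t cast
                // stops length + 1 from wrapping around to 0.
                fooMessage = calloc((size_t)received.length + 1, 1);
                if (fooMessage && readExpected (selectedFd, fooMessage, received.length))
                    puts(fooMessage);
                free(fooMessage);
                break;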
            case MESSAGE_BAR:
                // "bar" sends a message of a fixed size
                if (readExpected (selectedFd, &barMessage, sizeof(barMessage))) {
                    barMessage.a = ntohl(barMessage.a);
                    barMessage.b = ntohl(barMessage.b);
                    printf("a + b = %u\n", (unsigned)(barMessage.a + barMessage.b));
                }
                break;
            default:
                puts("Malformed type received");
                // kick the client out probably
        }
    }
    

    You can likely already see one disadvantage of using a binary format: for each attribute wider than a char that you read, you have to ensure its byte order is correct using the ntohl or ntohs functions (and htonl or htons on the sending side).
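    For completeness, here is a hedged sketch of what the sending side of such a protocol could look like. The names writeExpected and sendMessage are hypothetical and chosen for this example; the struct repeats the MessageHeader layout from the example above.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

struct MessageHeader {
    uint32_t type;
    uint32_t length;
};

/* Keeps calling write() until all `bytes` bytes are sent. 1 = ok, 0 = failure. */
int writeExpected(int socket, const void *source, size_t bytes)
{
    const char *sourceBytes = source;
    while (bytes) {
        ssize_t written = write(socket, sourceBytes, bytes);
        if (written < 1)
            return 0;
        sourceBytes += written;
        bytes -= (size_t)written;
    }
    return 1;
}

/* Sends a header in network byte order, followed by the raw payload. */
int sendMessage(int socket, uint32_t type, const void *payload, uint32_t length)
{
    struct MessageHeader header;
    header.type = htonl(type);
    header.length = htonl(length);
    return writeExpected(socket, &header, sizeof(header))
        && writeExpected(socket, payload, length);
}
```

    Because both fields are uint32_t, this particular header has no padding, so writing the struct directly is safe here.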

    An alternative is to use byte-encoded messages, such as simple ASCII or UTF-8 strings, which avoid byte-order issues entirely but require extra effort to parse and validate.
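    A minimal sketch of the text-based approach, assuming each message is one line terminated by '\n' (that delimiter and the name readLine are assumptions for this example). It reads one byte per call to keep the logic obvious; a real implementation would buffer.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Reads one '\n'-terminated line into `line` (at most size - 1 characters)
 * and NUL-terminates it, dropping the newline.
 * Returns the line length, or -1 on error, on end-of-file before a
 * newline, or when the line does not fit in the buffer.
 */
long readLine(int socket, char *line, size_t size)
{
    size_t used = 0;
    while (used + 1 < size) {
        char c;
        ssize_t got = read(socket, &c, 1);
        if (got < 1)
            return -1;
        if (c == '\n') {
            line[used] = '\0';
            return (long)used;
        }
        line[used++] = c;
    }
    return -1; /* line too long for the buffer */
}
```

    The returned line still has to be parsed and validated, which is the extra effort mentioned above.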

    There are two final considerations for network data in C.

    The first is that some C types do not have fixed widths. For example, the width of the humble int is implementation-defined: the standard only guarantees at least 16 bits, and long is 32 bits on 64-bit Windows but 64 bits on 64-bit Linux. Good, portable code should use fixed-width types for network data, like those defined in stdint.h.

    The second is struct padding. In a struct whose members have different widths, the compiler may insert padding bytes between members to keep them aligned in memory. This makes the struct faster to use in the program but can produce confusing results when it is sent over the wire.

    #include <stdio.h>
    #include <stdint.h>
    
    int main()
    {
        struct A {
            char a;
            uint32_t b;
        } A;
    
        printf("sizeof(A): %zu\n", sizeof(A));
    }
    

    In this example, the struct's size won't be 1 byte (char) + 4 bytes (uint32_t) = 5 bytes; it will be 8:

    mharrison@mharrison-KATANA:~$ gcc -o padding padding.c
    mharrison@mharrison-KATANA:~$ ./padding 
    sizeof(A): 8
    

    This is because 3 bytes are added after char a to make sure uint32_t b is memory-aligned.

    So if you write a struct A, then attempt to read a char and a uint32_t on the other side, you'll get char a, and a uint32_t where the first three bytes are garbage and the last byte is the first byte of the actual integer you wrote.

    If you send structs directly, document your data format explicitly as C struct types and, better yet, document any padding bytes they might contain.
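    One way to sidestep padding altogether is to copy each field into a byte buffer yourself instead of writing the struct as-is. The following is a sketch under that assumption; packA, unpackA and A_WIRE_SIZE are hypothetical names, and the 5-byte layout is simply the one discussed above.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

struct A {
    char a;
    uint32_t b;
};

#define A_WIRE_SIZE 5 /* 1 byte for a + 4 bytes for b, no padding */

/* Copies the fields one by one, with b in network byte order. */
void packA(const struct A *in, unsigned char out[A_WIRE_SIZE])
{
    uint32_t b = htonl(in->b);
    out[0] = (unsigned char)in->a;
    memcpy(out + 1, &b, sizeof(b));
}

/* Reverses packA; memcpy avoids an unaligned uint32_t access. */
void unpackA(const unsigned char in[A_WIRE_SIZE], struct A *out)
{
    uint32_t b;
    out->a = (char)in[0];
    memcpy(&b, in + 1, sizeof(b));
    out->b = ntohl(b);
}
```

    The sender then writes exactly A_WIRE_SIZE bytes and the receiver reads exactly that many, regardless of how either compiler pads struct A.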

  • 2020-11-28 04:44

    If you actually create the buffer as per dirk's suggestion, then:

      int readResult = read(socketFileDescriptor, buffer, BUFFER_SIZE);
    

    may completely fill the buffer, possibly overwriting the terminating zero character which you depend on when extracting to a stringstream. You need:

      int readResult = read(socketFileDescriptor, buffer, BUFFER_SIZE - 1 );
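    Rather than relying on the buffer having been zeroed beforehand, you can also place the terminator yourself using the return value. A small sketch, where readString is a hypothetical helper name and BUFFER_SIZE is assumed to be 4096:

```c
#include <unistd.h>

#define BUFFER_SIZE 4096

/*
 * Reads at most BUFFER_SIZE - 1 bytes and NUL-terminates them.
 * Returns whatever read() returned (-1 on error, 0 on end-of-file).
 */
ssize_t readString(int fd, char buffer[BUFFER_SIZE])
{
    ssize_t readResult = read(fd, buffer, BUFFER_SIZE - 1);
    if (readResult >= 0)
        buffer[readResult] = '\0';
    return readResult;
}
```

    This only makes the buffer safe to treat as a C string; it does not guarantee that a whole message has arrived.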
    
  • 2020-11-28 04:44

    Just to add to things from several of the posts above:

    read() -- at least on my system -- returns ssize_t. This is like size_t, except that it is signed. On my system, it's a long, not an int. You might get compiler warnings if you store the result in an int, depending on your system, your compiler, and what warnings you have turned on.

  • 2020-11-28 04:49

    Without knowing your full application, it is hard to say what the best approach is, but a common technique is to use a header that starts with a fixed-length field denoting the length of the rest of your message.

    Assume that your header consists only of a 4-byte integer that denotes the length of the rest of your message. Then simply do the following.

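    // This assumes buffer is at least x bytes long,
    // and that the socket is blocking.
    void ReadXBytes(int socket, unsigned int x, void* buffer)
    {
        // Pointer arithmetic on void* is not valid C++ (and only a
        // GCC extension in C), so step through the buffer as char*.
        char* bytes = (char*)buffer;
        unsigned int bytesRead = 0;
        while (bytesRead < x)
        {
            ssize_t result = read(socket, bytes + bytesRead, x - bytesRead);
            if (result < 1)
            {
                // Throw your error: 0 means the peer closed the
                // connection, -1 means read() failed. Do not fall
                // through to the addition below.
            }

            bytesRead += (unsigned int)result;
        }
    }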
    

    Then later in the code

    unsigned int length = 0;
    char* buffer = 0;
    // we assume that sizeof(length) will return 4 here.
    ReadXBytes(socketFileDescriptor, sizeof(length), (void*)(&length));
    buffer = new char[length];
    ReadXBytes(socketFileDescriptor, length, (void*)buffer);
    
    // Then process the data as needed.
    
    delete [] buffer;
    

    This makes a few assumptions:

    • ints are the same size on the sender and receiver.
    • Endianness is the same on both the sender and receiver.
    • You have control of the protocol on both sides.
    • When you send a message you can calculate the length up front.
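    The first two assumptions can be dropped by putting the length on the wire as a fixed-width integer in network byte order. Below is a hedged C sketch of that variation; readAll, readFrame and the 1 MiB MAX_MESSAGE_LENGTH cap are assumptions made for this example rather than part of the code above.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_MESSAGE_LENGTH (1u << 20) /* hypothetical 1 MiB sanity cap */

/* Loops over read() until `count` bytes have arrived. 1 = ok, 0 = failure. */
static int readAll(int socket, void *buffer, size_t count)
{
    char *bytes = buffer;
    while (count > 0) {
        ssize_t result = read(socket, bytes, count);
        if (result < 1)
            return 0;
        bytes += result;
        count -= (size_t)result;
    }
    return 1;
}

/*
 * Reads a 4-byte big-endian length followed by that many payload bytes.
 * Returns a malloc'd payload (caller frees) and stores its length,
 * or returns NULL on failure or when the length exceeds the cap.
 */
char *readFrame(int socket, uint32_t *length)
{
    uint32_t wireLength;
    if (!readAll(socket, &wireLength, sizeof(wireLength)))
        return NULL;
    *length = ntohl(wireLength);
    if (*length > MAX_MESSAGE_LENGTH)
        return NULL;
    /* Allocate at least one byte so an empty payload is not NULL. */
    char *payload = malloc(*length ? *length : 1);
    if (payload && !readAll(socket, payload, *length)) {
        free(payload);
        payload = NULL;
    }
    return payload;
}
```

    The cap matters because the length comes from the network: without it, a corrupt or hostile header could request a multi-gigabyte allocation.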

    Since you usually want to know the exact size of the integers you send across the network, define fixed-size types in a header file and use them explicitly (where available, the fixed-width types in stdint.h serve the same purpose), such as:

    // These typedefs will vary across different platforms
    // such as linux, win32, OS/X etc, but the idea
    // is that a Int8 is always 8 bits, and a UInt32 is always
    // 32 bits regardless of the platform you are on.
    // These vary from compiler to compiler, so you have to 
    // look them up in the compiler documentation.
    typedef signed char Int8;  // plain char may be signed or unsigned
    typedef short int Int16;
    typedef int Int32;
    
    typedef unsigned char UInt8;
    typedef unsigned short int UInt16;
    typedef unsigned int UInt32;
    

    This would change the above to:

    UInt32 length = 0;
    char* buffer = 0;
    
    ReadXBytes(socketFileDescriptor, sizeof(length), (void*)(&length));
    buffer = new char[length];
    ReadXBytes(socketFileDescriptor, length, (void*)buffer);
    
    // process
    
    delete [] buffer;
    

    I hope this helps.

  • 2020-11-28 04:51

    1) Others (especially dirkgently) have noted that buffer needs to be allocated some memory space. For smallish buffer sizes (say, up to 4096 bytes), you can also allocate it on the stack:

    #define BUFFER_SIZE 4096
    char buffer[BUFFER_SIZE];
    

    This saves you the worry of ensuring that you delete[] the buffer should an exception be thrown.

    But remember that stacks are finite in size (so are heaps, but stacks are finiter), so you don't want to put too much there.

    2) On a -1 return code, you should not simply return immediately (throwing an exception immediately is even more sketchy.) There are certain normal conditions that you need to handle, if your code is to be anything more than a short homework assignment. For example, EAGAIN may be returned in errno if no data is currently available on a non-blocking socket. Have a look at the man page for read(2).
