MPI send struct with a vector property in C++

南笙 2021-01-06 03:08

I want to send a struct that has a vector property.

typedef struct {
    int id;
    std::vector<int> neighbors;
} Node;

I know I have to

3 answers
  • 2021-01-06 03:48

    Note that internally a vector<int> looks something like this:

    struct vector {
        size_t size;
        size_t alloc_size;
        int* data;
    };
    

    Thus, if you try to send the struct as puelo suggested, MPI won't access the actual data underlying the vector. Instead it will send the size fields, the data pointer, and whatever happens to follow these items in memory, which will most likely result in an invalid memory access. The elements of the vector are not sent this way, and the transmitted pointer is meaningless in the receiving process.
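
    To make this concrete, here is a minimal sketch that checks whether the elements of the vector live inside the bytes of the struct itself. The helper name `elements_stored_inline` is hypothetical, and the result assumes a mainstream standard library implementation, where `std::vector` always keeps its elements in a separate allocation:

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <vector>

    struct Node {
        int id;
        std::vector<int> neighbors;
    };

    // True if the vector's element storage lies inside the bytes of `node` itself.
    // Only those bytes would be transmitted if the struct were sent as raw memory.
    bool elements_stored_inline(const Node& node) {
        assert(!node.neighbors.empty());  // an empty vector has no element storage to locate
        const auto begin = reinterpret_cast<std::uintptr_t>(&node);
        const auto end = begin + sizeof(Node);
        const auto data = reinterpret_cast<std::uintptr_t>(node.neighbors.data());
        return data >= begin && data < end;
    }
    ```

    For a Node with two neighbors and for one with a thousand, this should return false in both cases, while sizeof(Node) stays the same: the struct holds only bookkeeping fields and a pointer.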

    Generally, MPI does not work well for sending structures that contain pointers to more data. Instead you should try to think about how to send the actual underlying data itself.

    MPI communication will be easier and more efficient if you can represent your data contiguously.

    Your struct Node looks like you are trying to represent a node in a graph. You could, for instance, represent your graph in an adjacency array format, where all neighbor ids are stored in one single large vector. Think of it as the concatenation of all the neighbors vectors from your previous struct Node. For each node, you then store the offset at which its neighbors begin in the new neighbors vector, plus one final offset that marks the end.

    std::vector<int> node_ids(num_nodes);
    // One extra entry so that node_offsets[i + 1] is valid for the last node;
    // node_offsets[num_nodes] == num_edges.
    std::vector<int> node_offsets(num_nodes + 1);
    std::vector<int> neighbors(num_edges);
    
    // neighbors of node i are accessible via:
    for (int j = node_offsets[i]; j < node_offsets[i + 1]; ++j) {
        int neighbor = neighbors[j];
        // ...
    }
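
    As a rough sketch of how such a layout could be built from a list of Node structs: the names `FlatGraph`, `flatten` and `neighbors_of` below are hypothetical helpers, not part of MPI or of the question's code.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <vector>

    struct Node {
        int id;
        std::vector<int> neighbors;
    };

    // Flat adjacency-array form of a graph (hypothetical helper type).
    struct FlatGraph {
        std::vector<int> node_ids;      // one entry per node
        std::vector<int> node_offsets;  // num_nodes + 1 entries; the last equals neighbors.size()
        std::vector<int> neighbors;     // all neighbor lists concatenated
    };

    // Concatenates every neighbor list and records where each one starts.
    FlatGraph flatten(const std::vector<Node>& nodes) {
        FlatGraph g;
        g.node_ids.reserve(nodes.size());
        g.node_offsets.reserve(nodes.size() + 1);
        for (const Node& n : nodes) {
            g.node_ids.push_back(n.id);
            g.node_offsets.push_back(static_cast<int>(g.neighbors.size()));
            g.neighbors.insert(g.neighbors.end(), n.neighbors.begin(), n.neighbors.end());
        }
        // Final sentinel offset marks the end of the last neighbor list.
        g.node_offsets.push_back(static_cast<int>(g.neighbors.size()));
        assert(g.node_offsets.size() == nodes.size() + 1);
        return g;
    }

    // Rebuilds the neighbor list of the i-th node from the flat arrays.
    std::vector<int> neighbors_of(const FlatGraph& g, std::size_t i) {
        return std::vector<int>(g.neighbors.begin() + g.node_offsets[i],
                                g.neighbors.begin() + g.node_offsets[i + 1]);
    }
    ```

    Each of the three vectors is a contiguous block of int, so each can be passed to MPI on its own as a pointer plus a count.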
    

    You can then send/receive this information with MPI easily:

    MPI_Send(neighbors.data(), static_cast<int>(neighbors.size()), MPI_INT, ...);
    

    When working with MPI, finding a good data layout for your data is among the most important steps while implementing your algorithms.

  • 2021-01-06 03:53

    I didn't like the idea of importing a library just to do this simple thing, so here is what I did.

    I saw no reason for MPI to know anything about the underlying structure of the object. I could manually convert it to a buffer array, and since the receiver knows that it is expecting a Node struct, it can recreate the object on the other side. So initially I defined a contiguous datatype with MPI_Type_contiguous and sent it:

    // size in bytes: the id plus one int per neighbor
    int size = (int) ((node.second.neighbors.size() + 1) * sizeof(int));
    
    MPI_Datatype datatype;
    MPI_Type_contiguous(size, MPI_BYTE, &datatype);
    MPI_Type_commit(&datatype);
    
    // buffer holds the serialized bytes
    MPI_Isend(buffer, 1, datatype, proc_rank, TAG_DATA, MPI_COMM_WORLD, &request);
    

    This is a more general solution and worked.

    But since the struct contains an int and a vector<int>, I decided to create an int buffer with node.id as the first element and node.neighbors as the rest. On the other side, using MPI_Iprobe (or the blocking MPI_Probe) and MPI_Get_count, I can recreate the Node struct. Here is the code:

    // Returns a buffer allocated with new[]; the caller must release it with delete[].
    int *seriealizeNode(const Node &node) {
        //allocate buffer array
        int *s = new int[node.neighbors.size() + 1];
        //set the first element = Node.id
        s[0] = node.id;
        //set the remaining elements to the vector elements
        for (size_t i = 0; i < node.neighbors.size(); ++i) {
            s[i + 1] = node.neighbors[i];
        }
        return s;
    }
    
    Node deseriealizeNode(const int buffer[], int size) {
        Node node;
        //get the Node.id
        node.id = buffer[0];
        //get the vector elements
        for (int i = 1; i < size; ++i) {
            node.neighbors.push_back(buffer[i]);
        }
        return node;
    }
    

    I suspect there is a more efficient way of converting the Node to an int[] and back. I would appreciate any tips.
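
    One possible tightening, offered as a sketch rather than a measured improvement: let a std::vector<int> own the buffer and copy ranges instead of looping element by element. The names `serialize_node` and `deserialize_node` are hypothetical variants of the two helpers above.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <vector>

    struct Node {
        int id;
        std::vector<int> neighbors;
    };

    // Packs a Node as [id, neighbor_0, neighbor_1, ...]; the vector owns the buffer.
    std::vector<int> serialize_node(const Node& node) {
        std::vector<int> buffer;
        buffer.reserve(node.neighbors.size() + 1);  // single allocation
        buffer.push_back(node.id);
        buffer.insert(buffer.end(), node.neighbors.begin(), node.neighbors.end());
        return buffer;
    }

    // Inverse of serialize_node; `count` is the number of ints received.
    Node deserialize_node(const int* buffer, std::size_t count) {
        assert(count >= 1);  // at least the id must be present
        Node node;
        node.id = buffer[0];
        node.neighbors.assign(buffer + 1, buffer + count);  // one range copy
        return node;
    }
    ```

    With this variant there is no manual new[] or delete[]: buffer.data() and static_cast<int>(buffer.size()) can be passed to MPI_Isend, provided the vector stays alive and unmodified until the matching MPI_Wait returns.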

    Then on the sender's side:

    while (some_condition){
    
        ...
    
        //if a previous send is still pending, wait for it to finish and then release its buffer
        if (request != MPI_REQUEST_NULL) {
            MPI_Wait(&request, &status);
            delete[] send_buffer;  // allocated with new[] in seriealizeNode
        }
    
        // now send the node data
        send_buffer = seriealizeNode(node.second);
        int buffer_size = (int) (node.second.neighbors.size() + 1);
        MPI_Isend(send_buffer, buffer_size, MPI_INT, proc, TAG_DATA, MPI_COMM_WORLD, &request);
    
        ...
    }
    

    And on the receiver's side:

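    int count = 0;
    int flag = 0;
    MPI_Status status;
    MPI_Iprobe(MPI_ANY_SOURCE, TAG_DATA, MPI_COMM_WORLD, &flag, &status);
    if (flag) {
        MPI_Get_count(&status, MPI_INT, &count);
        int *s = new int[count];
        // receive from the probed source so that exactly the probed message is matched
        MPI_Recv(s, count, MPI_INT, status.MPI_SOURCE, TAG_DATA, MPI_COMM_WORLD, &status);
        Node node = deseriealizeNode(s, count);
        delete[] s;  // allocated with new[], so release with delete[]
        //my logic
    
    }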
    

    Now it works as expected.

  • 2021-01-06 04:12

    If you want to stay high-level and send around objects, then Boost.MPI is a good choice. With Boost.MPI you specify high level serialization for your structs.

    You cannot (correctly) determine the offset of a vector's data statically, because the elements live in a separate heap allocation. It is certainly possible to piece together an MPI type that works, but that is also a great way to shoot yourself in the foot: you would introduce assumptions into the code (e.g. that the vector's size does not change) that, once violated, create subtle bugs. So in this case it seems cleaner and less bug-prone to me to simply send id and neighbors.data() separately with MPI_Send, instead of using MPI types that don't fit this use case.
