GraphQL + Dgraph: how to batch import JSON data?

Submitted by 时光毁灭记忆、已成空白 on 2020-01-25 06:45:13

Question


I just started a trivial graphql schema:

type Product {
    productID: ID!
    name: String @search(by: [term])
    reviews: [Review] @hasInverse(field: about)
}

type Review {
    id: ID!
    about: Product! @hasInverse(field: reviews)
    by: Customer! @hasInverse(field: reviews)
    comment: String @search(by: [fulltext])
    rating: Int @search
}

type Customer {
    custID: ID!
    name: String @search(by: [hash, regexp])
    reviews: [Review] @hasInverse(field: by)
}

Now I want to populate the DB with millions of JSON entries without calling the GraphQL mutation (too slow). For instance, I have a folder full of JSON files (customers and products) of the following shape.

Example of a JSON customer file:

{
  "id": "deadbeef",
  "name": "Bill Gates",
  "reviews": [
    {
      "id": "1234",
      "comment": "nice product",
      "rating": 5,
      "productId": "5678"
    }
  ]
}

Example of a JSON product file:

{
  "id": "5678",
  "name": "Bluetooth headset"
}

From what I understand, to define edges between nodes I first have to overload each object with a uid.

The customer would become:

{
  "id": "deadbeef",
  "uid": "_:deadbeef",
  ...
  "reviews": [
    {
      "id": "1234",
      "uid": "_:1234",
      "productId": {"uid": "_:5678"}
    }
  ]
}

And the product

{
  "id": "5678",
  "uid": "_:5678",
  ...
}

Then we could batch import them (this is pure speculation; I never tried it). While this should import the entries, I would like to know how the DB would associate those entries with a type, because there is no clue yet in the data we want to insert. Is there a property like __typename I could add to each of my entries to type them?

[edit] I've found two possible properties, class and dgraph.type; I'm still wondering which one I should use, and how.


Answer 1:


The GraphQL schema above will generate the following predicates:

Customer.name
Customer.reviews
Product.name
Product.reviews
Review.about
Review.by
Review.comment
Review.rating
Schema.date
Schema.schema

i.e. Type.property. To batch import values there is no need to specify the type; just use the right property name.

Here is a working sample:

    const product = {
        "dgraph.type":"Product",
        "uid": "_:5678",
        "Product.name": "Bluetooth headset"
    };

    const customer = {
        "uid": "_:deadbeef",
        "dgraph.type":"Customer",
        "Customer.name": "Bill Gates",
        "Customer.reviews": [
            {                    
                "uid": "_:1234",
                "dgraph.type":"Review",
                "Review.comment": "nice product",
                "Review.rating": 5,
                "Review.by": {"uid": "_:deadbeef"},
                "Review.about": {"uid": "_:5678"}
            }
        ]
    };

    // Run mutation.
    const mu = new Mutation();
    mu.setSetJson({set: [product, customer]});
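The Type.property prefixing in the sample above can be automated for flat objects. The helper below is a sketch (the function name and its parameters are hypothetical, not part of dgraph-js): it renames scalar fields into Dgraph predicate names, attaches dgraph.type, and reuses the domain id as a blank node. Nested edges like reviews would still need their uids wired up by hand.

```typescript
// Sketch: rename plain JSON keys into "Type.property" predicates and
// attach the dgraph.type and blank-node uid Dgraph expects.
function toPredicates(
    typeName: string,
    obj: Record<string, unknown>,
): Record<string, unknown> {
    const out: Record<string, unknown> = {
        "dgraph.type": typeName,
        "uid": `_:${obj.id}`,       // reuse the domain id as a blank node
    };
    for (const [key, value] of Object.entries(obj)) {
        if (key === "id") continue; // already encoded in the uid
        out[`${typeName}.${key}`] = value;
    }
    return out;
}

const prefixed = toPredicates("Product", {id: "5678", name: "Bluetooth headset"});
// prefixed: { "dgraph.type": "Product", "uid": "_:5678",
//             "Product.name": "Bluetooth headset" }
```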

If you want to import blocks of thousands of entries, you need to find a way to keep the blank ids across transactions. To achieve this, I suggest using a class responsible for keeping the uid maps across the block imports. Here is my POC:

import * as grpc from "grpc";
import {DgraphClient, DgraphClientStub, Mutation} from "dgraph-js";
import * as jspb from "google-protobuf";

type uidMap = jspb.Map<string, string>;

class UidMapper {

    constructor(private uidMap: uidMap = UidMapper.emptyMap()) {
    }

    private static emptyMap(): uidMap {
        return new jspb.Map<string, string>([]);
    }

    public uid(uid: string): string {
        return this.uidMap.get(uid) || `_:${uid}`;
    }

    public addMap(anotherMap: uidMap): void {
        anotherMap.forEach((value, key) => {
            this.uidMap.set(key, value);
        });
    }
}

class Importer {
    public async importTest(): Promise<void> {
        try {
            const clientStub = new DgraphClientStub(
                "localhost:9080",
                grpc.credentials.createInsecure(),
            );
            const dgraphClient: DgraphClient = new DgraphClient(clientStub);

            await this.createData(dgraphClient);

            clientStub.close();
        } catch (error) {
            console.log(error);
        }
    }

    private async createData(dgraphClient: DgraphClient): Promise<void> {
        const mapper = new UidMapper();

        const product = {
        "dgraph.type":"Product",
        "uid": mapper.uid("5678"),
        "Product.name": "Bluetooth headset"
        };

        const customer = ...;
        const addMoreInfo = ...;

        await this.setJsonData(dgraphClient, mapper, [product, customer]);
        await this.setJsonData(dgraphClient, mapper, [addMoreInfo]);
    }

    private async setJsonData(dgraphClient: DgraphClient, mapper: UidMapper, data: any[]) {
        // Create a new transaction.
        const txn = dgraphClient.newTxn();
        try {
            // Run mutation.
            const mu = new Mutation();

            mu.setSetJson({set: data});
            let response = await txn.mutate(mu);
            // Commit transaction.
            mapper.addMap(response.getUidsMap());
            await txn.commit();

        } finally {
            // Clean up. Calling this after txn.commit() is a no-op and hence safe.
            await txn.discard();
        }
    }
}
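The mapping trick in the POC can be exercised without a running cluster. The sketch below mirrors UidMapper with a native Map standing in for the jspb.Map that response.getUidsMap() returns (the class name and the hard-coded "0x2a" uid are illustrative assumptions): before the first commit, uid() yields a blank node; after feeding it the assignments from a committed mutation, the same call yields the real UID for use in later batches.

```typescript
// Dependency-free sketch of the UidMapper idea: a plain Map stands in
// for the jspb.Map returned by response.getUidsMap().
class SimpleUidMapper {
    private map = new Map<string, string>();

    // Before a blank id is resolved, emit "_:<id>"; afterwards, the real uid.
    public uid(id: string): string {
        return this.map.get(id) || `_:${id}`;
    }

    // Merge the blank-id -> uid assignments returned by a committed mutation.
    public addMap(assigned: Iterable<[string, string]>): void {
        for (const [key, value] of assigned) {
            this.map.set(key, value);
        }
    }
}

const mapper = new SimpleUidMapper();
mapper.uid("5678");                 // "_:5678" — first batch, blank node
// Pretend the first transaction committed and returned this uids map:
mapper.addMap([["5678", "0x2a"]]);
mapper.uid("5678");                 // "0x2a" — later batches reuse the real uid
```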



Answer 2:


Some points that need to be taken into account:

1 - GraphQL and GraphQL+- are completely different things.

2 - Dgraph has a type system that needs to be followed. https://docs.dgraph.io/query-language/#type-system

3 - Mutation operations on clients are not interconnected, except for upsert operations. https://docs.dgraph.io/mutations/#upsert-block That is, setting a blank node in one mutation will not carry the value assigned to it over to the next mutation. You need to save the assigned UID in a variable and then use it in the next mutation.

More about mutations and blank nodes: https://tour.dgraph.io/master/intro/5/

4 - If you need to use the GraphQL layer, you need to read all the posts and recommendations for this feature, and understand that Dgraph works one way and the GraphQL layer another way.

Continuing.

If you need to submit multiple batches of JSON, I recommend that you use the live loader https://docs.dgraph.io/deploy/#live-loader with the -x flag. With it you can keep the mapping of UIDs for every blank node created. That is, if each of your entities has a blank node, it will be mapped and assigned a UID, which is then reused for every new batch the live loader sends.

-x, --xidmap string            Directory to store xid to uid mapping
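An invocation might look like the following (a sketch: the file names are placeholders, and the -f/-s/-a/-z flags are the loader's usual file, schema, Alpha, and Zero options; check dgraph live --help for your version):

```shell
# Load a JSON batch while persisting the blank-node -> uid map in ./xidmap,
# so a later run can reuse the same uids for the same blank nodes.
dgraph live -f customers.json -s schema.txt \
    -a localhost:9080 -z localhost:5080 \
    -x ./xidmap
```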

BTW: I don't know of a concept of "class" in Dgraph.

I hope it helps.

Cheers.



Source: https://stackoverflow.com/questions/58688652/graphql-dgraph-how-to-batch-import-json-data
