Error using SpannerIO in Apache Beam

感动是毒 2021-01-14 17:49

This question is a follow-up to this one. I am trying to use Apache Beam to read data from a Google Cloud Spanner table (and then do some data processing). I wrote the following m…

2 Answers
  • 2021-01-14 18:44

    This issue is most likely caused by the dependency compatibility problem described in BEAM-2837. One of the comments on that JIRA issue suggests a quick workaround:

    <dependency>
        <groupId>com.google.api.grpc</groupId>
        <artifactId>grpc-google-common-protos</artifactId>
        <version>0.1.9</version>
    </dependency>
    
    <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
        <version>${beam.version}</version>
        <exclusions>
            <exclusion>
                <groupId>com.google.api.grpc</groupId>
                <artifactId>grpc-google-common-protos</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    

    That is, declare the required com.google.api.grpc:grpc-google-common-protos version explicitly, and exclude the version that org.apache.beam:beam-sdks-java-io-google-cloud-platform pulls in transitively.
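To check which copy of the protos classes actually ends up on the classpath after applying the exclusion, a small diagnostic can report the jar a class was loaded from. This is a minimal sketch using only the JDK: the class name ClasspathProbe is hypothetical, and the default probe class com.google.rpc.Status is an assumption about what the common-protos artifact contains, so pass your own class names as arguments if needed.

```java
import java.security.CodeSource;
import java.security.ProtectionDomain;

public class ClasspathProbe {

    /** Returns the location the named class was loaded from, or a short status message. */
    static String locate(String className) {
        try {
            // Load without initializing, so probing has no side effects.
            Class<?> cls = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            ProtectionDomain domain = cls.getProtectionDomain();
            CodeSource source = domain == null ? null : domain.getCodeSource();
            // JDK classes loaded by the bootstrap loader have no code source.
            if (source == null || source.getLocation() == null) {
                return "JDK/bootstrap";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        } catch (LinkageError e) {
            // Typical symptom of a version conflict: the class exists but its dependencies do not match.
            return "failed to load: " + e;
        }
    }

    public static void main(String[] args) {
        // Assumption: com.google.rpc.Status ships in the common-protos artifact.
        String[] names = args.length > 0 ? args : new String[] {"com.google.rpc.Status"};
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run it with the same runtime classpath as your pipeline; if the reported jar is not the version you pinned in the pom, the exclusion has not taken effect.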

  • 2021-01-14 18:44

    You need to specify the project ID:

        SpannerIO.read()
                .withProjectId("my_project")
                .withInstanceId("my_instance")
                .withDatabaseId("my_db")
    

    You also need to set the credentials for your Spanner project. Because the SpannerIO API does not let you supply custom credentials, you must rely on Application Default Credentials by setting the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of your service account key file.
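Since SpannerIO only picks up credentials from the environment, a pre-flight check before constructing the pipeline can turn a confusing authentication failure into a clear message. This is a minimal sketch using only the JDK; CredentialsCheck and problemWith are hypothetical names, and the check covers only the environment-variable route described above (it does not validate the key file's contents or the other ways Application Default Credentials can be located).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Optional;

public class CredentialsCheck {

    static final String ENV_VAR = "GOOGLE_APPLICATION_CREDENTIALS";

    /** Returns a description of the problem, or empty if the variable points at a readable file. */
    static Optional<String> problemWith(Map<String, String> env) {
        String value = env.get(ENV_VAR);
        if (value == null || value.trim().isEmpty()) {
            return Optional.of(ENV_VAR + " is not set");
        }
        Path keyFile = Paths.get(value);
        if (!Files.isRegularFile(keyFile) || !Files.isReadable(keyFile)) {
            return Optional.of(ENV_VAR + " points to a missing or unreadable file: " + value);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Check the real process environment before building the pipeline.
        Optional<String> problem = problemWith(System.getenv());
        if (problem.isPresent()) {
            System.err.println(problem.get());
            System.exit(1);
        }
        System.out.println(ENV_VAR + " points to a readable file.");
    }
}
```

Taking the environment as a Map rather than calling System.getenv() inside the check keeps it testable without modifying the real process environment.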

    You can also read from (and write to) Cloud Spanner using JDBC. Reading looks like this:

        import java.sql.ResultSet;
        import org.apache.beam.sdk.coders.StringUtf8Coder;
        import org.apache.beam.sdk.io.jdbc.JdbcIO;
        import org.apache.beam.sdk.values.PCollection;

        // p2 is an existing Pipeline. The query returns a single column (table_name),
        // so each row is mapped to a String.
        PCollection<String> tableNames = p2.apply(JdbcIO.<String>read()
                .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create("nl.topicus.jdbc.CloudSpannerDriver",
                        "jdbc:cloudspanner://localhost;Project=my-project-id;Instance=instance-id;Database=database;PvtKeyPath=C:\\Users\\MyUserName\\Documents\\CloudSpannerKeys\\cloudspanner-key.json"))
                .withQuery("SELECT t.table_name FROM information_schema.tables AS t")
                .withCoder(StringUtf8Coder.of())
                .withRowMapper(new JdbcIO.RowMapper<String>()
                {
                    private static final long serialVersionUID = 1L;
    
                    @Override
                    public String mapRow(ResultSet resultSet) throws Exception
                    {
                        // JDBC column indexes are 1-based.
                        return resultSet.getString(1);
                    }
                }));
    

    This approach also lets you use custom credentials by setting PvtKeyPath in the connection URL. For an example of writing to Google Cloud Spanner using JDBC, see http://www.googlecloudspanner.com/2017/10/google-cloud-spanner-with-apache-beam.html
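In the JDBC read example, the connection URL packs the project, instance, database, and key-file path into semicolon-separated Key=Value pairs after jdbc:cloudspanner://localhost. When those values come from configuration rather than being hard-coded, a small builder keeps them in one place. This is a minimal sketch: CloudSpannerJdbcUrl is a hypothetical helper that only reproduces the property names visible in that example (Project, Instance, Database, PvtKeyPath); consult the driver's documentation for any other properties it accepts.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class CloudSpannerJdbcUrl {

    /**
     * Hypothetical helper: assembles a URL in the same layout as the read example,
     * jdbc:cloudspanner://localhost;Key=Value;Key=Value...
     * Pass null for pvtKeyPath to omit the PvtKeyPath property.
     */
    static String build(String project, String instance, String database, String pvtKeyPath) {
        Map<String, String> props = new LinkedHashMap<>(); // preserves insertion order
        props.put("Project", project);
        props.put("Instance", instance);
        props.put("Database", database);
        if (pvtKeyPath != null) {
            props.put("PvtKeyPath", pvtKeyPath);
        }
        StringJoiner url = new StringJoiner(";", "jdbc:cloudspanner://localhost;", "");
        for (Map.Entry<String, String> e : props.entrySet()) {
            String value = e.getValue();
            // ';' separates properties, so a value containing it would corrupt the URL.
            if (value == null || value.contains(";")) {
                throw new IllegalArgumentException("Invalid value for " + e.getKey() + ": " + value);
            }
            url.add(e.getKey() + "=" + value);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("my-project-id", "instance-id", "database",
                "C:\\Users\\MyUserName\\Documents\\CloudSpannerKeys\\cloudspanner-key.json"));
    }
}
```

The resulting string can be passed as the second argument to JdbcIO.DataSourceConfiguration.create(...) in place of the hard-coded URL.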
