I want my Spark application to read a table from DynamoDB, do stuff, then write the result back to DynamoDB.
Right now, I can read the table, but not write the results back.
Here is a somewhat simpler working example.
For an example of writing to DynamoDB from a Kinesis stream using a Hadoop RDD, see:
https://github.com/kali786516/Spark2StructuredStreaming/blob/master/src/main/scala/com/dataframe/part11/kinesis/consumer/KinesisSaveAsHadoopDataSet/TransactionConsumerDstreamToDynamoDBHadoopDataSet.scala
For reading from DynamoDB using a Hadoop RDD and querying it with Spark SQL (no regex needed):
import org.apache.hadoop.dynamodb.DynamoDBItemWritable
import org.apache.hadoop.dynamodb.read.DynamoDBInputFormat
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.rdd.RDD

val ddbConf = new JobConf(spark.sparkContext.hadoopConfiguration)
ddbConf.set("dynamodb.input.tableName", "student")
ddbConf.set("dynamodb.endpoint", "dynamodb.us-east-1.amazonaws.com")
ddbConf.set("dynamodb.regionid", "us-east-1")
ddbConf.set("dynamodb.servicename", "dynamodb")
ddbConf.set("dynamodb.throughput.read", "1")
ddbConf.set("dynamodb.throughput.read.percent", "1")
ddbConf.set("mapred.input.format.class", "org.apache.hadoop.dynamodb.read.DynamoDBInputFormat")
// Only needed for the write path:
//ddbConf.set("dynamodb.output.tableName", "student")
//ddbConf.set("dynamodb.throughput.write.percent", "1.5")
//ddbConf.set("mapred.output.format.class", "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat")
// Credentials come from the default provider chain; set explicitly only if you must:
//ddbConf.set("dynamodb.awsAccessKeyId", credentials.getAWSAccessKeyId)
//ddbConf.set("dynamodb.awsSecretAccessKey", credentials.getAWSSecretKey)

// Each record is a (Text, DynamoDBItemWritable) pair; toString renders the item as JSON
val data = spark.sparkContext.hadoopRDD(ddbConf, classOf[DynamoDBInputFormat], classOf[Text], classOf[DynamoDBItemWritable])
val simple2: RDD[String] = data.map { case (_, dbwritable) => dbwritable.toString }

spark.read.json(simple2).createOrReplaceTempView("gooddata")
spark.sql("select replace(replace(split(cast(address as string),',')[0],']',''),'[','') as housenumber from gooddata").show(false)