Question
I am trying to append values to a column of type set via the Java API.
It seems that the connector disregards the CollectionBehavior I am setting, and always overwrites the previous collection.
Even when I use CollectionRemove, the value to be removed is added to the collection instead.
I am following the example shown in:
https://datastax-oss.atlassian.net/browse/SPARKC-340?page=com.atlassian.jira.plugin.system.issuetabpanels%3Achangehistory-tabpanel
I am using:
- spark-core_2.11 2.2.0
- spark-cassandra-connector_2.11 2.0.5
- Cassandra 2.1.17
Could it be that this feature is not supported on those versions?
Here is the implementation code:
// CASSANDRA TABLE
CREATE TABLE test.profile (
    id text PRIMARY KEY,
    dates set<bigint>
);
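For reference, the append and remove behaviors I am after correspond to CQL set updates like the following (values taken from the code below, for illustration only):
// Equivalent CQL set updates (illustrative values)
UPDATE test.profile SET dates = dates + {1} WHERE id = 'fGxTObQIXM';
UPDATE test.profile SET dates = dates - {1} WHERE id = 'fGxTObQIXM';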
// ENTITY
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class ProfileRow {
    // Maps Cassandra column names to bean property names for mapToRow()
    public static final Map<String, String> namesMap;
    static {
        namesMap = new HashMap<>();
        namesMap.put("id", "id");
        namesMap.put("dates", "dates");
    }

    private String id;
    private Set<Long> dates;

    public ProfileRow() {}

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public Set<Long> getDates() {
        return dates;
    }

    public void setDates(Set<Long> dates) {
        this.dates = dates;
    }
}
public void execute(JavaSparkContext context) {
    // Build a one-element RDD containing the row to append
    List<ProfileRow> elements = new LinkedList<>();
    ProfileRow profile = new ProfileRow();
    profile.setId("fGxTObQIXM");
    Set<Long> dates = new HashSet<>();
    dates.add(1L);
    profile.setDates(dates);
    elements.add(profile);
    JavaRDD<ProfileRow> rdd = context.parallelize(elements);

    RDDAndDStreamCommonJavaFunctions<ProfileRow>.WriterBuilder wb = javaFunctions(rdd)
            .writerBuilder("test", "profile", mapToRow(ProfileRow.class, ProfileRow.namesMap));
    CollectionColumnName appendColumn =
            new CollectionColumnName("dates", Option.empty(), CollectionAppend$.MODULE$);
    scala.collection.Seq<ColumnRef> columnRefSeq =
            JavaApiHelper.toScalaSeq(Arrays.asList(appendColumn));
    SomeColumns columnSelector = SomeColumns$.MODULE$.apply(columnRefSeq);
    wb.withColumnSelector(columnSelector);
    wb.saveToCassandra();
}
Thanks,
Shai
Answer 1:
I found the answer. There are two things I had to change:
- Add the primary key column to the column selector.
- WriterBuilder.withColumnSelector() returns a new WriterBuilder instance, so the returned instance has to be stored.
The corrected code:
RDDAndDStreamCommonJavaFunctions<ProfileRow>.WriterBuilder wb = javaFunctions(rdd)
        .writerBuilder("test", "profile", mapToRow(ProfileRow.class, ProfileRow.namesMap));
// Change 1: include the primary key column in the column selector
ColumnName pkColumn = new ColumnName("id", Option.empty());
CollectionColumnName appendColumn =
        new CollectionColumnName("dates", Option.empty(), CollectionAppend$.MODULE$);
scala.collection.Seq<ColumnRef> columnRefSeq =
        JavaApiHelper.toScalaSeq(Arrays.asList(pkColumn, appendColumn));
SomeColumns columnSelector = SomeColumns$.MODULE$.apply(columnRefSeq);
// Change 2: withColumnSelector() returns a new WriterBuilder, so store the result
wb = wb.withColumnSelector(columnSelector);
wb.saveToCassandra();
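For completeness, removing values from the set should follow the same pattern, swapping CollectionAppend for CollectionRemove. A minimal, untested sketch assuming the same rdd, imports, and table as above:
// Minimal sketch: remove values from the set column (same setup as above, untested)
RDDAndDStreamCommonJavaFunctions<ProfileRow>.WriterBuilder removeWb = javaFunctions(rdd)
        .writerBuilder("test", "profile", mapToRow(ProfileRow.class, ProfileRow.namesMap));
ColumnName pk = new ColumnName("id", Option.empty());
CollectionColumnName removeColumn =
        new CollectionColumnName("dates", Option.empty(), CollectionRemove$.MODULE$);
scala.collection.Seq<ColumnRef> refs =
        JavaApiHelper.toScalaSeq(Arrays.asList(pk, removeColumn));
removeWb = removeWb.withColumnSelector(SomeColumns$.MODULE$.apply(refs));
removeWb.saveToCassandra();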
Source: https://stackoverflow.com/questions/50598458/spark-cassandra-connector-java-api-append-remove-data-in-a-collection-fail