Spark Cassandra Connector Java API: append/remove data in a collection fails


Question


I am trying to append values to a column of type set via the Java API.

It seems that the connector disregards the type of CollectionBehavior I am setting, and always overrides the previous collection.

Even when I use CollectionRemove, the value to be removed is added to the collection.

I am following the example as shown in:

https://datastax-oss.atlassian.net/browse/SPARKC-340?page=com.atlassian.jira.plugin.system.issuetabpanels%3Achangehistory-tabpanel

I am using:

  • spark-core_2.11 2.2.0
  • spark-cassandra-connector_2.11 2.0.5
  • Cassandra 2.1.17

Could it be that this feature is not supported on those versions?
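
For clarity, what I expect the connector to issue for an append is CQL of the form UPDATE test.profile SET dates = dates + {...} WHERE id = ?, and dates = dates - {...} for a remove, rather than overwriting the whole set with SET dates = {...}.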

Here is the implementation code:

// CASSANDRA TABLE
CREATE TABLE test.profile (
    id text PRIMARY KEY,
    dates set<bigint>
);

// ENTITY
public class ProfileRow {
    public static final Map<String, String> namesMap;
    static {
        namesMap = new HashMap<>();
        namesMap.put("id", "id");
        namesMap.put("dates", "dates");
    }
    private String id;
    private Set<Long> dates;
    public ProfileRow() {}
    public String getId() {
        return id;
    }
    public void setId(String id) {
        this.id = id;
    }
    public Set<Long> getDates() {
        return dates;
    }
    public void setDates(Set<Long> dates) {
        this.dates = dates;
    }
}


// Imports assumed from the connector's Java API:
// import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
// import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;
// plus com.datastax.spark.connector.*, com.datastax.spark.connector.util.JavaApiHelper,
// com.datastax.spark.connector.japi.RDDAndDStreamCommonJavaFunctions, and scala.Option
public void execute(JavaSparkContext context) {
    List<ProfileRow> elements = new LinkedList<>();
    ProfileRow profile = new ProfileRow();
    profile.setId("fGxTObQIXM");
    Set<Long> dates = new HashSet<>();
    dates.add(1L);
    profile.setDates(dates);
    elements.add(profile);
    JavaRDD<ProfileRow> rdd = context.parallelize(elements);

    RDDAndDStreamCommonJavaFunctions<ProfileRow>.WriterBuilder wb = javaFunctions(rdd)
        .writerBuilder("test", "profile", mapToRow(ProfileRow.class, ProfileRow.namesMap));
    // Ask for append semantics on the "dates" set column
    CollectionColumnName appendColumn = new CollectionColumnName("dates", Option.empty(), CollectionAppend$.MODULE$);
    scala.collection.Seq<ColumnRef> columnRefSeq = JavaApiHelper.toScalaSeq(Arrays.asList(appendColumn));
    SomeColumns columnSelector = SomeColumns$.MODULE$.apply(columnRefSeq);

    wb.withColumnSelector(columnSelector);
    wb.saveToCassandra();
}

Thanks,

Shai


Answer 1:


I found the answer. There are two things I had to change:

  1. Add the primary key column to the column selector.
  2. WriterBuilder.withColumnSelector() generates a new instance of WriterBuilder, so I had to store the instance it returns.

The corrected code:

RDDAndDStreamCommonJavaFunctions<ProfileRow>.WriterBuilder wb = javaFunctions(rdd)
    .writerBuilder("test", "profile", mapToRow(ProfileRow.class, ProfileRow.namesMap));
// 1. Include the primary key column in the selector
ColumnName pkColumn = new ColumnName("id", Option.empty());
CollectionColumnName appendColumn = new CollectionColumnName("dates", Option.empty(), CollectionAppend$.MODULE$);
scala.collection.Seq<ColumnRef> columnRefSeq = JavaApiHelper.toScalaSeq(Arrays.asList(pkColumn, appendColumn));
SomeColumns columnSelector = SomeColumns$.MODULE$.apply(columnRefSeq);

// 2. withColumnSelector returns a new WriterBuilder, so keep the returned instance
wb = wb.withColumnSelector(columnSelector);
wb.saveToCassandra();
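
For removing values (which the title also asks about), the same pattern should work with the connector's CollectionRemove behavior in place of CollectionAppend. A minimal, untested sketch along the same lines, reusing rdd, pkColumn, and wb from above:

// Untested sketch: remove the RDD rows' set elements instead of appending them
CollectionColumnName removeColumn = new CollectionColumnName("dates", Option.empty(), CollectionRemove$.MODULE$);
scala.collection.Seq<ColumnRef> removeRefs = JavaApiHelper.toScalaSeq(Arrays.asList(pkColumn, removeColumn));
wb = wb.withColumnSelector(SomeColumns$.MODULE$.apply(removeRefs));
wb.saveToCassandra();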


Source: https://stackoverflow.com/questions/50598458/spark-cassandra-connector-java-api-append-remove-data-in-a-collection-fail
