Spark Cassandra Connector Java API: append/remove data in a collection fails


Question


I am trying to append values to a column of type set via the Java API.

It seems that the connector disregards the type of CollectionBehavior I am setting, and always overrides the previous collection.

Even when I use CollectionRemove, the value to be removed is added to the collection.

I am following the example as shown in:

https://datastax-oss.atlassian.net/browse/SPARKC-340?page=com.atlassian.jira.plugin.system.issuetabpanels%3Achangehistory-tabpanel

I am using:

  • spark-core_2.11 2.2.0
  • spark-cassandra-connector_2.11 2.0.5
  • Cassandra 2.1.17

Could it be that this feature is not supported on those versions?
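
For clarity, what I expect the connector to issue for an append is CQL of the form UPDATE test.profile SET dates = dates + {...} WHERE id = ?, and dates = dates - {...} for a remove, rather than overwriting the whole set with SET dates = {...}.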

Here is the implementation code:

// CASSANDRA TABLE
CREATE TABLE test.profile (
    id text PRIMARY KEY,
    dates set<bigint>
);

// ENTITY
public class ProfileRow {
    public static final Map<String, String> namesMap;
    static {
        namesMap = new HashMap<>();
        namesMap.put("id", "id");
        namesMap.put("dates", "dates");
    }
    private String id;
    private Set<Long> dates;
    public ProfileRow() {}
    public String getId() {
        return id;
    }
    public void setId(String id) {
        this.id = id;
    }
    public Set<Long> getDates() {
        return dates;
    }
    public void setDates(Set<Long> dates) {
        this.dates = dates;
    }
}


// Imports assumed from the connector's Java API:
// import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
// import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;
// plus com.datastax.spark.connector.*, com.datastax.spark.connector.util.JavaApiHelper,
// com.datastax.spark.connector.japi.RDDAndDStreamCommonJavaFunctions, and scala.Option
public void execute(JavaSparkContext context) {
    List<ProfileRow> elements = new LinkedList<>();
    ProfileRow profile = new ProfileRow();
    profile.setId("fGxTObQIXM");
    Set<Long> dates = new HashSet<>();
    dates.add(1L);
    profile.setDates(dates);
    elements.add(profile);
    JavaRDD<ProfileRow> rdd = context.parallelize(elements);

    RDDAndDStreamCommonJavaFunctions<ProfileRow>.WriterBuilder wb = javaFunctions(rdd)
        .writerBuilder("test", "profile", mapToRow(ProfileRow.class, ProfileRow.namesMap));
    // Ask for append semantics on the "dates" set column
    CollectionColumnName appendColumn = new CollectionColumnName("dates", Option.empty(), CollectionAppend$.MODULE$);
    scala.collection.Seq<ColumnRef> columnRefSeq = JavaApiHelper.toScalaSeq(Arrays.asList(appendColumn));
    SomeColumns columnSelector = SomeColumns$.MODULE$.apply(columnRefSeq);

    wb.withColumnSelector(columnSelector);
    wb.saveToCassandra();
}

Thanks,

Shai


Answer 1:


I found the answer. There are two things I had to change:

  1. Add the primary key column to the column selector.
  2. WriterBuilder.withColumnSelector() generates a new instance of WriterBuilder, so I had to store the instance it returns.

The corrected code:

RDDAndDStreamCommonJavaFunctions<ProfileRow>.WriterBuilder wb = javaFunctions(rdd)
    .writerBuilder("test", "profile", mapToRow(ProfileRow.class, ProfileRow.namesMap));
// 1. Include the primary key column in the selector
ColumnName pkColumn = new ColumnName("id", Option.empty());
CollectionColumnName appendColumn = new CollectionColumnName("dates", Option.empty(), CollectionAppend$.MODULE$);
scala.collection.Seq<ColumnRef> columnRefSeq = JavaApiHelper.toScalaSeq(Arrays.asList(pkColumn, appendColumn));
SomeColumns columnSelector = SomeColumns$.MODULE$.apply(columnRefSeq);

// 2. withColumnSelector returns a new WriterBuilder, so keep the returned instance
wb = wb.withColumnSelector(columnSelector);
wb.saveToCassandra();
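
For removing values (which the title also asks about), the same pattern should work with the connector's CollectionRemove behavior in place of CollectionAppend. A minimal, untested sketch along the same lines, reusing rdd, pkColumn, and wb from above:

// Untested sketch: remove the RDD rows' set elements instead of appending them
CollectionColumnName removeColumn = new CollectionColumnName("dates", Option.empty(), CollectionRemove$.MODULE$);
scala.collection.Seq<ColumnRef> removeRefs = JavaApiHelper.toScalaSeq(Arrays.asList(pkColumn, removeColumn));
wb = wb.withColumnSelector(SomeColumns$.MODULE$.apply(removeRefs));
wb.saveToCassandra();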


Source: https://stackoverflow.com/questions/50598458/spark-cassandra-connector-java-api-append-remove-data-in-a-collection-fail
