Cassandra gives no data even if data exists

Question

I have a keyspace with a replication factor of 3. I am inserting data into Cassandra (a 4-node cluster in a single data center) with write consistency level ONE. After the inserts complete, I read the data with consistency level QUORUM (2 replicas). Sometimes the query returns no data even though the data exists, and after some time the same query returns it. I don't know why Cassandra behaves like this.

My column family schema:

CREATE TABLE input_data_profile.input_log_profile_1 (
    cid text,
    ctdon bigint,
    ctdat bigint,
    email text,
    addrs set<frozen<udt_addrs>>,
    asset set<frozen<udt_asset>>,
    cntno set<frozen<udt_cntno>>,
    dob frozen<udt_date>,
    dvc set<frozen<udt_dvc>>,
    eaka set<text>,
    edmn text,
    educ set<frozen<udt_educ>>,
    gen tinyint,
    hobby set<text>,
    income set<frozen<udt_income>>,
    interest set<text>,
    lang set<frozen<udt_lang>>,
    levnt set<frozen<udt_levnt>>,
    like map<text, frozen<set<text>>>,
    loc set<frozen<udt_loc>>,
    mapp set<text>,
    name frozen<udt_name>,
    params map<text, frozen<set<text>>>,
    prfsn set<frozen<udt_prfsn>>,
    rel set<frozen<udt_rel>>,
    rel_s tinyint,
    skills_prfsn set<frozen<udt_skill_prfsn>>,
    snw set<frozen<udt_snw>>,
    sport set<text>,
    status tinyint,
    z_addrs tinyint,
    z_asset tinyint,
    z_cntno tinyint,
    z_dob tinyint,
    z_dvc tinyint,
    z_eaka tinyint,
    z_educ tinyint,
    z_email tinyint,
    z_gen tinyint,
    z_hobby tinyint,
    z_income tinyint,
    z_interest tinyint,
    z_lang tinyint,
    z_levnt tinyint,
    z_like tinyint,
    z_loc tinyint,
    z_mapp tinyint,
    z_name tinyint,
    z_params tinyint,
    z_prfsn tinyint,
    z_rel tinyint,
    z_rel_s tinyint,
    z_skills_prfsn tinyint,
    z_snw tinyint,
    z_sport tinyint,
    PRIMARY KEY (cid, ctdon, ctdat, email)
) WITH CLUSTERING ORDER BY (ctdon ASC, ctdat ASC, email ASC);
CREATE INDEX input_log_profile_1_z_snw_idx ON input_data_profile.input_log_profile_1 (z_snw);
CREATE INDEX input_log_profile_1_z_prfsn_idx ON input_data_profile.input_log_profile_1 (z_prfsn);
CREATE INDEX input_log_profile_1_z_hobby_idx ON input_data_profile.input_log_profile_1 (z_hobby);
CREATE INDEX input_log_profile_1_z_rel_idx ON input_data_profile.input_log_profile_1 (z_rel);
CREATE INDEX input_log_profile_1_z_gen_idx ON input_data_profile.input_log_profile_1 (z_gen);
CREATE INDEX input_log_profile_1_z_mapp_idx ON input_data_profile.input_log_profile_1 (z_mapp);
CREATE INDEX input_log_profile_1_z_dvc_idx ON input_data_profile.input_log_profile_1 (z_dvc);
CREATE INDEX input_log_profile_1_z_skills_prfsn_idx ON input_data_profile.input_log_profile_1 (z_skills_prfsn);
CREATE INDEX input_log_profile_1_z_eaka_idx ON input_data_profile.input_log_profile_1 (z_eaka);
CREATE INDEX input_log_profile_1_z_name_idx ON input_data_profile.input_log_profile_1 (z_name);
CREATE INDEX input_log_profile_1_z_cntno_idx ON input_data_profile.input_log_profile_1 (z_cntno);
CREATE INDEX input_log_profile_1_z_educ_idx ON input_data_profile.input_log_profile_1 (z_educ);
CREATE INDEX input_log_profile_1_z_loc_idx ON input_data_profile.input_log_profile_1 (z_loc);
CREATE INDEX input_log_profile_1_z_email_idx ON input_data_profile.input_log_profile_1 (z_email);
CREATE INDEX input_log_profile_1_z_interest_idx ON input_data_profile.input_log_profile_1 (z_interest);
CREATE INDEX input_log_profile_1_z_asset_idx ON input_data_profile.input_log_profile_1 (z_asset);
CREATE INDEX input_log_profile_1_z_like_idx ON input_data_profile.input_log_profile_1 (z_like);
CREATE INDEX input_log_profile_1_z_rel_s_idx ON input_data_profile.input_log_profile_1 (z_rel_s);
CREATE INDEX input_log_profile_1_z_lang_idx ON input_data_profile.input_log_profile_1 (z_lang);
CREATE INDEX input_log_profile_1_z_addrs_idx ON input_data_profile.input_log_profile_1 (z_addrs);
CREATE INDEX input_log_profile_1_z_dob_idx ON input_data_profile.input_log_profile_1 (z_dob);
CREATE INDEX input_log_profile_1_z_income_idx ON input_data_profile.input_log_profile_1 (z_income);
CREATE INDEX input_log_profile_1_z_sport_idx ON input_data_profile.input_log_profile_1 (z_sport);
CREATE INDEX input_log_profile_1_z_params_idx ON input_data_profile.input_log_profile_1 (z_params);

I need to process the data field by field, so I indexed the status column of every field. I want to improve read and write TPS. Please suggest some modifications to the schema.

Answer 1:

If I understand you correctly then you are really asking two questions here:

First, you are writing data with CL=ONE, reading it with CL=QUORUM, and wondering why a read sometimes misses data you have written but returns it later. If so, this is expected behavior in Cassandra. With CL=ONE, the coordinator reports a successful write to the client as soon as the first of the 3 replicas acknowledges it, without waiting for the other two. If you then read with QUORUM before the data has reached those replicas, it's possible that both replicas you read from have not yet applied the write, so you get no data (or stale data) back. A read is only guaranteed to see the latest write when the number of replicas written plus the number of replicas read exceeds the replication factor, and here 1 + 2 = 3 does not exceed 3. This is the eventual consistency part of Cassandra. If you are trying to read the data immediately after a successful write, this is likely the cause of your problem, as "read after write" is an anti-pattern in Cassandra and most other distributed systems.
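To make the consistency arithmetic concrete, here is a minimal Python sketch of the overlap rule (replicas written + replicas read > replication factor). The function names `quorum` and `read_sees_latest_write` are hypothetical helpers invented for this illustration. They are not part of Cassandra or any driver, and the sketch only models the counting argument, not real cluster behavior.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def read_sees_latest_write(rf: int, write_replicas: int, read_replicas: int) -> bool:
    """True when any set of read replicas must overlap any set of written replicas."""
    return write_replicas + read_replicas > rf


RF = 3                # replication factor from the question
ONE = 1               # replicas that must acknowledge at CL=ONE
QUORUM = quorum(RF)   # 2 replicas when RF = 3

# The combination from the question: write ONE, read QUORUM (1 + 2 is not > 3).
print(read_sees_latest_write(RF, ONE, QUORUM))

# QUORUM on both sides (2 + 2 > 3), so the read set always overlaps the write set.
print(read_sees_latest_write(RF, QUORUM, QUORUM))
```

Under this rule, writing and reading at QUORUM (or writing at ALL and reading at ONE) would give overlapping replica sets, at the cost of higher latency and lower availability on the side that waits for more replicas.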

Second, your schema uses indexes incorrectly. If you are using indexes so that you can query on those fields, this is an anti-pattern, especially with as many as you have. Indexes in Cassandra are expensive and should only be used in rare cases where the indexed column has low cardinality. See https://docs.datastax.com/en/cql/3.1/cql/ddl/ddl_when_use_index_c.html

If you need to query by that many columns, you need to reevaluate your data model, as Cassandra is optimized for a table-per-query methodology in which you query only on the fields in the primary key. This requires you to denormalize your data into multiple tables in order to build any application of reasonable complexity. This is one of the tradeoffs you make when choosing the performance, high availability, and scalability that Cassandra provides. If you truly need the ability to perform ad hoc queries on your data, I suggest you look at a different data store.

Source: https://stackoverflow.com/questions/47650317/cassandra-gives-no-data-even-if-data-exists

Tags

cassandra

cqlsh

cassandra-3.0