No rows inserted in table when import from CSV in Cassandra


Question


I am trying to import a CSV file into a Cassandra table, but I am facing a problem. The import appears to succeed, at least that is what Cassandra reports, yet I still can't see any records. Here are a few more details:

cqlsh:recommendation_engine> COPY row_historical_game_outcome_data FROM '/home/adelin/workspace/docs/re_raw_data2.csv' WITH DELIMITER='|';

2 rows imported in 0.216 seconds.
cqlsh:recommendation_engine> select * from row_historical_game_outcome_data;

 customer_id | game_id | time | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount
-------------+---------+------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------

(0 rows)
cqlsh:recommendation_engine> 

And this is what my data looks like:

'SomeName'|673|'SomeName'|'SomeName'|'TYPE'|'M'|123123|0.20000000000000001|0.0|'GBP'|2015-07-01 00:01:42.19700|0.0|
'SomeName'|673|'SomeName'|'SomeName'|'TYPE'|'M'|456456|0.20000000000000001|0.0|'GBP'|2015-07-01 00:01:42.19700|0.0| 

The Cassandra version is apache-cassandra-2.2.0.

EDITED:

CREATE TABLE row_historical_game_outcome_data (
    customer_id int,
    game_id int,
    time timestamp,
    channel text,
    currency_code text,
    game_code text,
    game_name text,
    game_type text,
    game_vendor text,
    progressive_winnings double,
    stake_amount double,
    win_amount double,
    PRIMARY KEY ((customer_id, game_id, time))
) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

I've also tried the following, as suggested by uri2x, but still nothing:

select * from row_historical_game_outcome_data;

 customer_id | game_id | time | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount
-------------+---------+------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------

(0 rows)
cqlsh:recommendation_engine> COPY row_historical_game_outcome_data ("game_vendor","game_id","game_code","game_name","game_type","channel","customer_id","stake_amount","win_amount","currency_code","time","progressive_winnings")  FROM '/home/adelin/workspace/docs/re_raw_data2.csv' WITH DELIMITER='|';

2 rows imported in 0.192 seconds.
cqlsh:recommendation_engine> select * from row_historical_game_outcome_data;

 customer_id | game_id | time | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount
-------------+---------+------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------

(0 rows)

Answer 1:


Ok, I had to change a couple of things about your data file to make this work:

SomeName|673|SomeName|SomeName|TYPE|M|123123|0.20000000000000001|0.0|GBP|2015-07-01 00:01:42|0.0
SomeName|673|SomeName|SomeName|TYPE|M|456456|0.20000000000000001|0.0|GBP|2015-07-01 00:01:42|0.0
  • Removed the trailing pipe.
  • Truncated the time down to seconds.
  • Removed all single quotes.

Once I did that, I executed:

aploetz@cqlsh:stackoverflow> COPY row_historical_game_outcome_data 
(game_vendor,game_id,game_code,game_name,game_type,channel,customer_id,stake_amount,
 win_amount,currency_code , time , progressive_winnings) 
FROM '/home/aploetz/cassandra_stack/re_raw_data3.csv' WITH DELIMITER='|';

Improper COPY command.

This one was a little tricky. I finally figured out that COPY did not like the column name time. I adjusted the table to use the name game_time instead, and re-ran the COPY:

aploetz@cqlsh:stackoverflow> DROP TABLE row_historical_game_outcome_data ;
aploetz@cqlsh:stackoverflow> CREATE TABLE row_historical_game_outcome_data (
             ...     customer_id int,
             ...     game_id int,
             ...     game_time timestamp,
             ...     channel text,
             ...     currency_code text,
             ...     game_code text,
             ...     game_name text,
             ...     game_type text,
             ...     game_vendor text,
             ...     progressive_winnings double,
             ...     stake_amount double,
             ...     win_amount double,
             ...     PRIMARY KEY ((customer_id, game_id, game_time))
             ... );

aploetz@cqlsh:stackoverflow> COPY row_historical_game_outcome_data
(game_vendor,game_id,game_code,game_name,game_type,channel,customer_id,stake_amount,
 win_amount,currency_code , game_time , progressive_winnings)
FROM '/home/aploetz/cassandra_stack/re_raw_data3.csv' WITH DELIMITER='|';

3 rows imported in 0.738 seconds.
aploetz@cqlsh:stackoverflow> SELECT * FROM row_historical_game_outcome_data ;

 customer_id | game_id | game_time                | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount
-------------+---------+--------------------------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------
      123123 |     673 | 2015-07-01 00:01:42-0500 |       M |           GBP |  SomeName |  SomeName |      TYPE |    SomeName |                    0 |          0.2 |          0
      456456 |     673 | 2015-07-01 00:01:42-0500 |       M |           GBP |  SomeName |  SomeName |      TYPE |    SomeName |                    0 |          0.2 |          0

(2 rows)
  • I'm not sure why it says "3 rows imported"; my guess is that it is counting the header row.
  • Your primary key columns are all partition keys. I'm not sure whether that was intentional; I point it out because I can't think of a reason to specify multiple partition keys without also specifying at least one clustering key (see the sketch after this list).
  • I cannot find anything in the DataStax docs indicating that "time" is a reserved word, so it's probably a bug in cqlsh. But seriously, you should name your time-based columns something other than "time" anyway.
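
For instance, here is a minimal sketch of an alternative design (an assumption on my part; the right key depends on your query patterns) that partitions on customer_id alone and clusters on game_id and game_time, so you can read all of a customer's outcomes with a single partition read:

CREATE TABLE row_historical_game_outcome_data (
    customer_id int,
    game_id int,
    game_time timestamp,
    channel text,
    currency_code text,
    game_code text,
    game_name text,
    game_type text,
    game_vendor text,
    progressive_winnings double,
    stake_amount double,
    win_amount double,
    -- customer_id is the sole partition key; game_id and game_time
    -- are clustering columns, sorted within each partition.
    PRIMARY KEY ((customer_id), game_id, game_time)
);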



Answer 2:


One other comment: CQL's COPY supports a WITH HEADER = TRUE option, which causes the header row (first row) of the CSV file to be skipped. (http://docs.datastax.com/en/cql/3.3/cql/cql_reference/copy_r.html)
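
For example, a sketch of the import from the question with the header option added (assuming the file's first row really is a header):

COPY row_historical_game_outcome_data FROM '/home/adelin/workspace/docs/re_raw_data2.csv'
WITH DELIMITER='|' AND HEADER=TRUE;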

"time" is not a reserved word in CQL (believe me, because I just updated the CQL reserved words in the DataStax docs myself). However, you do show spaces between the column names in the COPY command around the column name "time", and I think that's the problem. No spaces, just commas; do the same in the CSV file for all rows. (http://docs.datastax.com/en/cql/3.3/cql/cql_reference/keywords_r.html)




Answer 3:


There are two things that bother cqlsh in your CSV file:

  1. Remove the trailing | at the end of each CSV line.
  2. Remove the microseconds from your time values (the precision should be milliseconds at most).
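
For example, the first CSV line from the question after both fixes (a sketch; everything else is left as the asker had it):

'SomeName'|673|'SomeName'|'SomeName'|'TYPE'|'M'|123123|0.20000000000000001|0.0|'GBP'|2015-07-01 00:01:42.197|0.0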


Source: https://stackoverflow.com/questions/32269232/no-rows-inserted-in-table-when-import-from-csv-in-cassandra
