Question
I am trying to import a CSV file into a Cassandra table, but I am facing a problem. The import appears to succeed, or at least that is what Cassandra tells me, yet I still can't see any records. Here are a few more details:
cqlsh:recommendation_engine> COPY row_historical_game_outcome_data FROM '/home/adelin/workspace/docs/re_raw_data2.csv' WITH DELIMITER='|';
2 rows imported in 0.216 seconds.
cqlsh:recommendation_engine> select * from row_historical_game_outcome_data;
customer_id | game_id | time | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount
-------------+---------+------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------
(0 rows)
cqlsh:recommendation_engine>
And this is what my data looks like:
'SomeName'|673|'SomeName'|'SomeName'|'TYPE'|'M'|123123|0.20000000000000001|0.0|'GBP'|2015-07-01 00:01:42.19700|0.0|
'SomeName'|673|'SomeName'|'SomeName'|'TYPE'|'M'|456456|0.20000000000000001|0.0|'GBP'|2015-07-01 00:01:42.19700|0.0|
The Cassandra version is apache-cassandra-2.2.0.
EDITED:
CREATE TABLE row_historical_game_outcome_data (
customer_id int,
game_id int,
time timestamp,
channel text,
currency_code text,
game_code text,
game_name text,
game_type text,
game_vendor text,
progressive_winnings double,
stake_amount double,
win_amount double,
PRIMARY KEY ((customer_id, game_id, time))
) WITH bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
I've also tried the following, as suggested by uri2x, but still nothing:
select * from row_historical_game_outcome_data;
customer_id | game_id | time | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount
-------------+---------+------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------
(0 rows)
cqlsh:recommendation_engine> COPY row_historical_game_outcome_data ("game_vendor","game_id","game_code","game_name","game_type","channel","customer_id","stake_amount","win_amount","currency_code","time","progressive_winnings") FROM '/home/adelin/workspace/docs/re_raw_data2.csv' WITH DELIMITER='|';
2 rows imported in 0.192 seconds.
cqlsh:recommendation_engine> select * from row_historical_game_outcome_data;
customer_id | game_id | time | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount
-------------+---------+------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------
(0 rows)
Answer 1:
OK, I had to change a few things in your data file to make this work:
SomeName|673|SomeName|SomeName|TYPE|M|123123|0.20000000000000001|0.0|GBP|2015-07-01 00:01:42|0.0
SomeName|673|SomeName|SomeName|TYPE|M|456456|0.20000000000000001|0.0|GBP|2015-07-01 00:01:42|0.0
- Removed the trailing pipe.
- Truncated the time down to seconds.
- Removed all single quotes.
Once I did that, then I executed:
aploetz@cqlsh:stackoverflow> COPY row_historical_game_outcome_data
(game_vendor,game_id,game_code,game_name,game_type,channel,customer_id,stake_amount,
win_amount,currency_code , time , progressive_winnings)
FROM '/home/aploetz/cassandra_stack/re_raw_data3.csv' WITH DELIMITER='|';
Improper COPY command.
This one was a little tricky. I finally figured out that COPY did not like the column name time. I adjusted the table to use the name game_time instead, and re-ran the COPY:
aploetz@cqlsh:stackoverflow> DROP TABLE row_historical_game_outcome_data ;
aploetz@cqlsh:stackoverflow> CREATE TABLE row_historical_game_outcome_data (
... customer_id int,
... game_id int,
... game_time timestamp,
... channel text,
... currency_code text,
... game_code text,
... game_name text,
... game_type text,
... game_vendor text,
... progressive_winnings double,
... stake_amount double,
... win_amount double,
... PRIMARY KEY ((customer_id, game_id, game_time))
... );
aploetz@cqlsh:stackoverflow> COPY row_historical_game_outcome_data
(game_vendor,game_id,game_code,game_name,game_type,channel,customer_id,stake_amount,
win_amount,currency_code , game_time , progressive_winnings)
FROM '/home/aploetz/cassandra_stack/re_raw_data3.csv' WITH DELIMITER='|';
3 rows imported in 0.738 seconds.
aploetz@cqlsh:stackoverflow> SELECT * FROM row_historical_game_outcome_data ;
customer_id | game_id | game_time | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount
-------------+---------+--------------------------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------
123123 | 673 | 2015-07-01 00:01:42-0500 | M | GBP | SomeName | SomeName | TYPE | SomeName | 0 | 0.2 | 0
456456 | 673 | 2015-07-01 00:01:42-0500 | M | GBP | SomeName | SomeName | TYPE | SomeName | 0 | 0.2 | 0
(2 rows)
- I'm not sure why it says "3 rows imported," so my guess is that it is counting the header row.
- Your keys are all partition keys. I'm not sure whether you realized that; I only point it out because I can't think of a reason to specify multiple partition keys without also specifying one or more clustering keys (see the sketch after this list).
- I cannot find anything in the DataStax docs indicating that "time" is a reserved word, so it's probably a bug in cqlsh. But seriously, you should probably name your time-based columns something other than "time" anyway.
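To make the partition-key point concrete, here is a minimal sketch of the same table with a single partition key and clustering columns instead; treating customer_id as the lone partition key is an assumption about the query pattern, not something stated in the question:

-- A sketch, not the asker's schema: partition by customer only,
-- then sort rows within each partition by game_id and game_time.
CREATE TABLE row_historical_game_outcome_data (
    customer_id int,
    game_id int,
    game_time timestamp,
    channel text,
    currency_code text,
    game_code text,
    game_name text,
    game_type text,
    game_vendor text,
    progressive_winnings double,
    stake_amount double,
    win_amount double,
    PRIMARY KEY ((customer_id), game_id, game_time)
);

With this layout, all of a customer's outcomes live in one partition, sorted by game_id and game_time, so a query like SELECT * FROM row_historical_game_outcome_data WHERE customer_id = 123123; works. With the original all-partition-key definition, you would have to supply customer_id, game_id, and time together to read a row back.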
Answer 2:
One other comment: COPY in CQL also supports WITH HEADER = TRUE, which causes the header row (the first row) of the CSV file to be skipped. (http://docs.datastax.com/en/cql/3.3/cql/cql_reference/copy_r.html)
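For example, against the asker's file path (a sketch: it assumes the first line of the CSV really is a header, and it reuses the game_time column name from answer 1; in cqlsh, multiple COPY options are combined with AND):

COPY row_historical_game_outcome_data (game_vendor,game_id,game_code,game_name,game_type,channel,customer_id,stake_amount,win_amount,currency_code,game_time,progressive_winnings) FROM '/home/adelin/workspace/docs/re_raw_data2.csv' WITH DELIMITER = '|' AND HEADER = TRUE;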
"time" is not a reserved word in CQL (believe me, because I just updated the CQL reserved words in the DataStax docs myself). However, you do show spaces between the column names in the COPY command around the column name "time", and I think that's the problem. No spaces, just commas; do the same in the CSV file for all rows. (http://docs.datastax.com/en/cql/3.3/cql/cql_reference/keywords_r.html)
Answer 3:
There are two things in your CSV file that bother cqlsh:
- Remove the trailing | at the end of each CSV line
- Remove the microseconds from your time values; the precision should be milliseconds at most (see the sketch after this list).
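For reference, a sketch of what a millisecond-precision timestamp literal looks like (it reuses the game_time table from answer 1 and fills only the primary key columns; the other columns simply stay null):

-- '2015-07-01 00:01:42.197' keeps three fractional digits (milliseconds),
-- unlike the five-digit '...42.19700' values in the original file.
INSERT INTO row_historical_game_outcome_data (customer_id, game_id, game_time)
VALUES (123123, 673, '2015-07-01 00:01:42.197');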
Source: https://stackoverflow.com/questions/32269232/no-rows-inserted-in-table-when-import-from-csv-in-cassandra