No rows inserted in table when import from CSV in Cassandra


Question


I am trying to import a CSV file into a Cassandra table, but I am facing a problem. The import appears to succeed, at least that is what Cassandra reports, yet I still can't see any records. Here are a few more details:

cqlsh:recommendation_engine> COPY row_historical_game_outcome_data FROM '/home/adelin/workspace/docs/re_raw_data2.csv' WITH DELIMITER='|';

2 rows imported in 0.216 seconds.
cqlsh:recommendation_engine> select * from row_historical_game_outcome_data;

 customer_id | game_id | time | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount
-------------+---------+------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------

(0 rows)
cqlsh:recommendation_engine> 

And this is what my data looks like:

'SomeName'|673|'SomeName'|'SomeName'|'TYPE'|'M'|123123|0.20000000000000001|0.0|'GBP'|2015-07-01 00:01:42.19700|0.0|
'SomeName'|673|'SomeName'|'SomeName'|'TYPE'|'M'|456456|0.20000000000000001|0.0|'GBP'|2015-07-01 00:01:42.19700|0.0| 

The Cassandra version is apache-cassandra-2.2.0.

EDITED:

CREATE TABLE row_historical_game_outcome_data (
    customer_id int,
    game_id int,
    time timestamp,
    channel text,
    currency_code text,
    game_code text,
    game_name text,
    game_type text,
    game_vendor text,
    progressive_winnings double,
    stake_amount double,
    win_amount double,
    PRIMARY KEY ((customer_id, game_id, time))
) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

I've also tried the following, as suggested by uri2x, but still nothing:

select * from row_historical_game_outcome_data;

 customer_id | game_id | time | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount
-------------+---------+------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------

(0 rows)
cqlsh:recommendation_engine> COPY row_historical_game_outcome_data ("game_vendor","game_id","game_code","game_name","game_type","channel","customer_id","stake_amount","win_amount","currency_code","time","progressive_winnings")  FROM '/home/adelin/workspace/docs/re_raw_data2.csv' WITH DELIMITER='|';

2 rows imported in 0.192 seconds.
cqlsh:recommendation_engine> select * from row_historical_game_outcome_data;

 customer_id | game_id | time | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount
-------------+---------+------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------

(0 rows)

Answer 1:


Ok, I had to change a couple of things about your data file to make this work:

SomeName|673|SomeName|SomeName|TYPE|M|123123|0.20000000000000001|0.0|GBP|2015-07-01 00:01:42|0.0
SomeName|673|SomeName|SomeName|TYPE|M|456456|0.20000000000000001|0.0|GBP|2015-07-01 00:01:42|0.0
  • Removed the trailing pipe.
  • Truncated the time down to seconds.
  • Removed all single quotes.

Once I did that, I executed:

aploetz@cqlsh:stackoverflow> COPY row_historical_game_outcome_data 
(game_vendor,game_id,game_code,game_name,game_type,channel,customer_id,stake_amount,
 win_amount,currency_code , time , progressive_winnings) 
FROM '/home/aploetz/cassandra_stack/re_raw_data3.csv' WITH DELIMITER='|';

Improper COPY command.

This one was a little tricky. I finally figured out that COPY did not like the column name time. I adjusted the table to use the name game_time instead, and re-ran the COPY:

aploetz@cqlsh:stackoverflow> DROP TABLE row_historical_game_outcome_data ;
aploetz@cqlsh:stackoverflow> CREATE TABLE row_historical_game_outcome_data (
             ...     customer_id int,
             ...     game_id int,
             ...     game_time timestamp,
             ...     channel text,
             ...     currency_code text,
             ...     game_code text,
             ...     game_name text,
             ...     game_type text,
             ...     game_vendor text,
             ...     progressive_winnings double,
             ...     stake_amount double,
             ...     win_amount double,
             ...     PRIMARY KEY ((customer_id, game_id, game_time))
             ... );

aploetz@cqlsh:stackoverflow> COPY row_historical_game_outcome_data
(game_vendor,game_id,game_code,game_name,game_type,channel,customer_id,stake_amount,
 win_amount,currency_code , game_time , progressive_winnings)
FROM '/home/aploetz/cassandra_stack/re_raw_data3.csv' WITH DELIMITER='|';

3 rows imported in 0.738 seconds.
aploetz@cqlsh:stackoverflow> SELECT * FROM row_historical_game_outcome_data ;

 customer_id | game_id | game_time                | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount
-------------+---------+--------------------------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------
      123123 |     673 | 2015-07-01 00:01:42-0500 |       M |           GBP |  SomeName |  SomeName |      TYPE |    SomeName |                    0 |          0.2 |          0
      456456 |     673 | 2015-07-01 00:01:42-0500 |       M |           GBP |  SomeName |  SomeName |      TYPE |    SomeName |                    0 |          0.2 |          0

(2 rows)
  • I'm not sure why it says "3 rows imported"; my guess is that it is counting the header row.
  • Your primary key columns are all partition keys. I'm not sure whether that was intentional; I point it out because I can't think of a reason to specify multiple partition keys without also specifying at least one clustering key (see the sketch after this list).
  • I cannot find anything in the DataStax docs indicating that "time" is a reserved word, so it's probably a bug in cqlsh. But seriously, you should name your time-based columns something other than "time" anyway.
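
For instance, here is a minimal sketch of an alternative design (an assumption on my part; the right key depends on your query patterns) that partitions on customer_id alone and clusters on game_id and game_time, so you can read all of a customer's outcomes with a single partition read:

CREATE TABLE row_historical_game_outcome_data (
    customer_id int,
    game_id int,
    game_time timestamp,
    channel text,
    currency_code text,
    game_code text,
    game_name text,
    game_type text,
    game_vendor text,
    progressive_winnings double,
    stake_amount double,
    win_amount double,
    -- customer_id is the sole partition key; game_id and game_time
    -- are clustering columns, sorted within each partition.
    PRIMARY KEY ((customer_id), game_id, game_time)
);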



Answer 2:


One other comment: CQL's COPY supports a WITH HEADER = TRUE option, which causes the header row (first row) of the CSV file to be skipped. (http://docs.datastax.com/en/cql/3.3/cql/cql_reference/copy_r.html)
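
For example, a sketch of the import from the question with the header option added (assuming the file's first row really is a header):

COPY row_historical_game_outcome_data FROM '/home/adelin/workspace/docs/re_raw_data2.csv'
WITH DELIMITER='|' AND HEADER=TRUE;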

"time" is not a reserved word in CQL (believe me, because I just updated the CQL reserved words in the DataStax docs myself). However, you do show spaces between the column names in the COPY command around the column name "time", and I think that's the problem. No spaces, just commas; do the same in the CSV file for all rows. (http://docs.datastax.com/en/cql/3.3/cql/cql_reference/keywords_r.html)




Answer 3:


There are two things that bother cqlsh in your CSV file:

  1. Remove the trailing | at the end of each CSV line.
  2. Remove the microseconds from your time values (the precision should be milliseconds at most).
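
For example, the first CSV line from the question after both fixes (a sketch; everything else is left as the asker had it):

'SomeName'|673|'SomeName'|'SomeName'|'TYPE'|'M'|123123|0.20000000000000001|0.0|'GBP'|2015-07-01 00:01:42.197|0.0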


Source: https://stackoverflow.com/questions/32269232/no-rows-inserted-in-table-when-import-from-csv-in-cassandra
