I'm setting up a demo landscape for Cassandra, Apache Spark and Flume on my Mac (Mac OS X Yosemite with Oracle jdk1.7.0_55). The landscape shall work as a proof of concept for
Loading a CSV file into a Cassandra table
Step 1) Download cassandra-loader from this URL:
sudo wget https://github.com/brianmhess/cassandra-loader/releases/download/v0.0.23/cassandra-loader
Step 2) Make the downloaded file executable:
sudo chmod +x cassandra-loader
The values used in this example are:
a) CSV file name: "pt_bms_tkt_success_record_details_new_2016_12_082017-01-0312-30-01.csv"
b) Keyspace name: "bms_test"
c) Table name: "pt_bms_tkt_success_record_details_new"
d) Columns: "trx_id......trx_day"
Step 3) The CSV file and cassandra-loader are both located in "cassandra3.7/bin/".
Step 4) Run the loader from that directory:
[stp@ril-srv-sp3 bin]$ ./cassandra-loader -f pt_bms_tkt_success_record_details_new_2016_12_082017-01-0312-30-01.csv -host 192.168.1.29 -schema "bms_test.pt_bms_tkt_success_record_details_new(trx_id,max_seq,trx_type,trx_record_type,trx_date,trx_show_date,cinema_str_id,session_id,ttype_code,item_id,item_var_sequence,trx_booking_id,venue_name,screen_by_tnum,price_group_code,area_cat_str_code,area_by_tnum,venue_capacity,amount_currentprice,venue_class,trx_booking_status_committed,booking_status,amount_paymentstatus,event_application,venue_cinema_companyname,venue_cinema_name,venue_cinema_type,venue_cinema_application,region_str_code,venue_city_name,sub_region_str_code,sub_region_str_name,event_code,event_type,event_name,event_language,event_genre,event_censor_rating,event_release_date,event_producer_code,event_item_name,event_itemvariable_name,event_quantity,amount_amount,amount_bookingfee,amount_deliveryfee,amount_additionalcharges,amount_final,amount_tax,offer_isapplied,offer_type,offer_name,offer_amount,payment_lastmode,payment_lastamount,payment_reference1,payment_reference2,payment_bank,customer_loginid,customer_loginstring,offer_referral,customer_mailid,customer_mobile,trans_str_sales_status_at_venue,trans_mny_trans_value_at_venue,payment_ismypayment,click_recordsource,campaign,source,keyword,medium,venue_multiplex,venue_state,mobile_type,transaction_range,life_cyclestate_from,transactions_after_offer,is_premium_transaction,city_type,holiday_season,week_type,event_popularity,transactionrange_after_discount,showminusbooking,input_source_name,channel,time_stamp,life_cyclestate_to,record_status,week_name,number_of_active_customers,event_genre1,event_genre2,event_genre3,event_genre4,event_language1,event_language2,event_language3,event_language4,event_release_date_range,showminusbooking_range,reserve1,reserve2,reserve3,reserve4,reserve5,payment_mode,payment_type,date_of_first_transaction,transaction_time_in_hours,showtime_in_hours,trx_day)";
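With a column list this long, typing the -schema argument by hand is error-prone. Here is a minimal Python sketch that assembles the same command from its parts. The helper names (build_schema, build_loader_command) are made up for illustration and are not part of cassandra-loader; only the -f, -host and -schema options shown above are used, and the column list is shortened here.

```python
import shlex


def build_schema(keyspace, table, columns):
    """Return the 'keyspace.table(col1,col2,...)' string passed to -schema."""
    return "{}.{}({})".format(keyspace, table, ",".join(columns))


def build_loader_command(csv_file, host, keyspace, table, columns):
    """Return the cassandra-loader invocation as an argument list."""
    return [
        "./cassandra-loader",
        "-f", csv_file,
        "-host", host,
        "-schema", build_schema(keyspace, table, columns),
    ]


if __name__ == "__main__":
    # Shortened column list for illustration only; the real table has many more.
    columns = ["trx_id", "max_seq", "trx_type"]
    cmd = build_loader_command(
        "pt_bms_tkt_success_record_details_new_2016_12_082017-01-0312-30-01.csv",
        "192.168.1.29",
        "bms_test",
        "pt_bms_tkt_success_record_details_new",
        columns,
    )
    # shlex.quote keeps the parenthesised schema string safe to paste into a shell.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping the columns in a list makes it easier to compare them against the CSV header before running the load.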
cqlsh's COPY command can be touchy. However, the COPY documentation contains this line:
The number of columns in the CSV input is the same as the number of columns in the Cassandra table metadata.
Keeping that in mind, I did manage to get your data to import with a COPY FROM, by naming the empty fields (processstarttimeuuid and processendtimeuuid, respectively):
aploetz@cqlsh:stackoverflow> COPY process (processuuid, processid, processnumber,
processname, processstarttime, processstarttimeuuid, processendtime,
processendtimeuuid, processstatus, orderer, vorgangsnummer, vehicleid, fin, reference,
referencetype) FROM 'Process_BulkData.csv' WITH DELIMITER = ';' AND HEADER = TRUE;
1 rows imported in 0.018 seconds.
aploetz@cqlsh:stackoverflow> SELECT * FROM process ;
processuuid | fin | orderer | processendtime | processendtimeuuid | processid | processname | processnumber | processstarttime | processstarttimeuuid | processstatus | reference | referencetype | vehicleid | vorgangsnummer
--------------------------------------+-------------------+---------+---------------------------+--------------------+-------------------+--------------------+---------------+---------------------------+----------------------+---------------+------------+---------------+-----------+----------------
0f0d1498-d149-4fcc-87c9-f12783fdf769 | WAU2345CX67890876 | SIXT | 2011-02-16 22:05:00+-0600 | null | AbmeldungKl‰rfall | Abmeldung Kl‰rfall | 1 | 2011-02-02 22:05:00+-0600 | null | Finished | KLA-BR4278 | internal | A-XA 1 | 4278
(1 rows)
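The column-count rule quoted from the COPY documentation can be checked before running the import. The following is a rough sketch, not anything cqlsh provides: it counts the fields in each CSV row (empty fields included) and compares that with the number of columns you plan to name in COPY. The function names and the sample data are hypothetical.

```python
import csv
import io


def csv_column_counts(text, delimiter=";"):
    """Return the set of field counts seen across all non-blank CSV rows."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return {len(row) for row in reader if row}


def matches_copy_columns(text, copy_columns, delimiter=";"):
    """True if every row has exactly as many fields as COPY names columns."""
    return csv_column_counts(text, delimiter) == {len(copy_columns)}


if __name__ == "__main__":
    # Hypothetical sample: four fields per row, the third one empty.
    sample = "id;name;started;status\n1;demo;;Finished\n"
    print(matches_copy_columns(sample, ["id", "name", "started", "status"]))
    # Leaving the empty field's column out of the list makes the check fail.
    print(matches_copy_columns(sample, ["id", "name", "status"]))
```

An empty field still counts as a column, which is why the empty processstarttimeuuid and processendtimeuuid fields had to be named in the COPY statement above.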