We want to import 100,000 rows from a .csv file into a Cassandra table.
No column in the file is unique to each row, so we want to add a UUID to each imported row.
There's no way to do that directly with the cqlsh COPY command, but you can preprocess the CSV file outside of Cassandra first.
For example, here's a Python script that reads in.csv, appends a UUID column to each row, and writes the result to out.csv:
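```python
#!/usr/bin/env python3
# Read in.csv and write out.csv with one extra column holding a UUID.
import csv
import uuid

# In Python 3 the csv module expects text-mode files opened with newline=''
with open('in.csv', newline='') as fin, open('out.csv', 'w', newline='') as fout:
    reader = csv.reader(fin, delimiter=',', quotechar='"')
    writer = csv.writer(fout, delimiter=',', quotechar='"')
    firstrow = True
    for row in reader:
        if firstrow:
            # The first row is assumed to be a header: name the new column
            row.append('UUID')
            firstrow = False
        else:
            # uuid4() returns a random UUID; store its string form
            row.append(str(uuid.uuid4()))
        writer.writerow(row)
```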
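If your file has no header row, the script above treats the first data row as a header and appends the literal string 'UUID' to it instead of a real UUID. A minimal sketch that makes the header handling explicit is below; `add_uuid_column` and its `has_header` flag are hypothetical names chosen for illustration. Because it works on any iterable of rows, it can be checked against in-memory data before running it on the real file.

```python
import csv
import io
import uuid
from typing import Iterable, Iterator, List


def add_uuid_column(rows: Iterable[List[str]], has_header: bool = True) -> Iterator[List[str]]:
    """Yield each row with one extra column holding a random UUID string.

    add_uuid_column is a hypothetical helper. If has_header is True, the
    first row is treated as a header and gets the column name 'UUID'.
    """
    it = iter(rows)
    if has_header:
        header = next(it, None)
        if header is not None:
            yield header + ['UUID']
    for row in it:
        yield row + [str(uuid.uuid4())]


# Usage on in-memory data; with real files, pass csv.reader(fin) and
# hand the result to csv.writer(fout).writerows(...).
sample = io.StringIO('name,city\nalice,Oslo\nbob,Lima\n')
out = io.StringIO()
csv.writer(out).writerows(add_uuid_column(csv.reader(sample)))
print(out.getvalue())
```

Being a generator, it processes one row at a time, so memory use stays flat regardless of how many rows the file has.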
The resulting file can then be imported with COPY (after you've created a table whose schema includes a column for the new UUID). If you use this example, read up on Python's uuid module to choose the generator you need (probably uuid1 or uuid4).
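A short sketch of how the two generators differ: uuid1() is time-based (version 1) and embeds a timestamp plus a node ID, which is the host's hardware address or a random value if none can be found; uuid4() is purely random (version 4). This matters for the schema: Cassandra's timeuuid type requires version 1 UUIDs, while a plain uuid column accepts either.

```python
import uuid

# uuid1(): time-based (version 1); embeds a timestamp and a node ID.
time_based = uuid.uuid1()

# uuid4(): random (version 4); no host or time information is embedded.
random_based = uuid.uuid4()

print(time_based, time_based.version)
print(random_based, random_based.version)

# For version 1 UUIDs, .time is the 60-bit timestamp
# (100-nanosecond intervals since 1582-10-15).
print(time_based.time)
```

If you don't need to recover the creation time from the key, uuid4 avoids embedding the host's hardware address in your data.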