I want to select specific fields of a table in Cassandra and insert them into another table. In SQL Server I do this with an INSERT ... SELECT, something like:
INSERT INTO Users (name, family)
SELECT name, family FROM SomeOtherTable;
COPY keyspace.columnfamily1 (column1, column2,...) TO 'temp.csv';
COPY keyspace.columnfamily2 (column1, column2,...) FROM 'temp.csv';
Here, give your keyspace (schema name); for columnfamily1 use the table you want to copy from, and for columnfamily2 give the table you want to copy into.
And yes, this is a solution for CQL; however, I have never tried it with the CLI.
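For example, assuming a keyspace my_keyspace with a source table users and a target table users_copy (all names here are just placeholders), the two statements would look like:
COPY my_keyspace.users (name, family) TO 'temp.csv';
COPY my_keyspace.users_copy (name, family) FROM 'temp.csv';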
For very large tables, cqlsh will have a hard time handling COPY TO/FROM. Here is how to sweep the table by local token ranges and copy the desired columns from one table to another. Because each node only reads the data it owns locally, the script has to be executed on every node of the datacenter:
#!/bin/bash
#
# Script to COPY selected columns from a SOURCE table into a TARGET table
# Assumes the following:
#
# * The SOURCE table is very large - otherwise just try:
# cqlsh -e "COPY keyspace.src_table (col1, col2, ...,ColN ) TO STDOUT WITH HEADER=false" \
# |cqlsh -e "COPY keyspace.tgt_table (col1, col2, ...,ColN ) FROM STDIN"
# * SOURCE AND TARGET TABLES are in the SAME KEYSPACE
# * TARGET columns are named the SAME as SOURCE
#
# The script sweeps through the local token ranges to copy only the local data over to the new table.
# Therefore, it needs to run on every node of the datacenter.
#
# Set these variables before executing
#
USR=cassandra
PWD=password
KSP=my_keyspace
SRC=src_table
COL="col1, col2, col3"
PKY="col1"
TGT=tgt_table
CQLSH="cqlsh -u ${USR} -p ${PWD} -k ${KSP}"
function getTokens(){
    for i in $($CQLSH -e "select tokens from system.local;" | awk -F, '/{/{print $0}' | tr -d '{' | tr -d '}' | tr -d ','); do
        echo ${i//\'/}
    done | sort -n
}
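# getDataByTokenRange: emit one SELECT per local token range, bounding token(PKY) by consecutive tokens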
function getDataByTokenRange(){
    i=0
    tokens=($(getTokens))
    while [ ${i} -lt ${#tokens[@]} ]; do
        [ ${i} -eq 0 ] && echo "SELECT ${COL} FROM ${SRC} WHERE token(${PKY}) <= ${tokens[i]};"
        [ "${tokens[i+1]}" ] && echo "SELECT ${COL} FROM ${SRC} WHERE token(${PKY}) > ${tokens[i]} AND token(${PKY}) <= ${tokens[i+1]};"
        [ ! "${tokens[i+1]}" ] && echo "SELECT ${COL} FROM ${SRC} WHERE token(${PKY}) > ${tokens[i]};"
        ((i++))
    done
}
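# cqlExec: run each generated SELECT, strip the cqlsh output decoration, and pipe the rows into COPY ... FROM STDIN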
function cqlExec(){
    while IFS='' read -r cql || [[ -n "$cql" ]]; do
        $CQLSH -e "CONSISTENCY LOCAL_ONE; $cql" \
            |awk -F\| '( !/LOCAL_ONE/ && !/'"${COL//, /|}"'/ && !/^\-+/ && !/^\([0-9]+ rows\)/ && !/^$/ ){print $0}' \
            |sed -e 's/^[ ]*//g' -e 's/[ ]*|[ ]*/|/g' \
            |$CQLSH -e "COPY ${TGT} (${COL}) FROM STDIN WITH DELIMITER = '|' and HEADER=false;"
        [ "$?" -gt 0 ] && echo "ERROR: Failed to import data from query: ${cql}"
    done < "$1"
}
main(){
    echo "Begin processing ..."
    getDataByTokenRange > getDataByTokenRange.ddl
    cqlExec getDataByTokenRange.ddl
    echo "End processing"
}

main
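For reference, with the variables above, getDataByTokenRange writes statements of this shape into getDataByTokenRange.ddl (the token values here are made up purely for illustration):
SELECT col1, col2, col3 FROM src_table WHERE token(col1) <= -9010044389899661000;
SELECT col1, col2, col3 FROM src_table WHERE token(col1) > -9010044389899661000 AND token(col1) <= -3074457345618258603;
SELECT col1, col2, col3 FROM src_table WHERE token(col1) > 3074457345618258602;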
For tables that are not very large, save yourself the intermediate file and use an anonymous pipe:
cqlsh -e "COPY keyspace.src_table (col1, col2, ...,ColN ) TO STDOUT WITH HEADER=false" | cqlsh -e "COPY keyspace.target_table (col1, col2, ...,ColN ) FROM STDIN"
For very large datasets this won't work; a per-token-range strategy such as the one above should be explored.