Question
I'm loading some simple data from Cassandra into Pig using CqlStorage. The CqlStorage loader defines a schema based on the Cassandra schema, but it seems to be wrong.
If I do:
data = LOAD 'cql://bookdata/books' USING CqlStorage();
DESCRIBE data;
I get this:
data: {isbn: chararray,bookauthor: chararray,booktitle: chararray,publisher: chararray,yearofpublication: int}
However, if I DUMP data, I get results like these:
((isbn,0425093387),(bookauthor,Georgette Heyer),(booktitle,Death in the Stocks),(publisher,Berkley Pub Group),(yearofpublication,1986))
Clearly the results from Cassandra are key/value pairs, as would be expected. I don't know why the schema generated by CqlStorage() would be so different.
This is really causing me problems when I try to access the column values. I tried a naive approach of FLATTENing each tuple, then accessing the values that way:
flattened = FOREACH data GENERATE
FLATTEN(isbn),
FLATTEN(booktitle),
...
values = FOREACH flattened GENERATE
$1 AS ISBN,
$3 AS BookTitle,
...
As soon as I try to access field $5, Pig complains about the index being out of bounds. (Curiously, flattened thinks it has the same schema as the original data.)
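To make the mismatch concrete, here is a toy model in Python. It is only a sketch and not how Pig or CqlStorage is implemented. The names declared_schema, flatten, and check_position are hypothetical, and it assumes the bounds check is made against the declared schema rather than against the actual row. Under that assumption, a flattened row holds ten values, yet any position past $4 is rejected because the schema only knows about five fields.

```python
# Toy model only: this does not reproduce Pig's or CqlStorage's internals.

# Schema as reported by DESCRIBE: five scalar fields.
declared_schema = ["isbn", "bookauthor", "booktitle", "publisher", "yearofpublication"]

# One row as shown by DUMP: each field is really a (name, value) tuple.
row = (
    ("isbn", "0425093387"),
    ("bookauthor", "Georgette Heyer"),
    ("booktitle", "Death in the Stocks"),
    ("publisher", "Berkley Pub Group"),
    ("yearofpublication", 1986),
)


def flatten(row):
    """Unnest every (name, value) pair into one flat tuple of values."""
    return tuple(item for pair in row for item in pair)


def check_position(schema, index):
    """Hypothetical schema-based bounds check for a positional reference like $5."""
    if index >= len(schema):
        raise IndexError(f"${index} is out of bounds for a schema with {len(schema)} fields")
    return index


flat = flatten(row)
print(len(declared_schema))  # number of fields the schema claims
print(len(flat))             # number of values actually present after flattening

try:
    check_position(declared_schema, 5)
except IndexError as err:
    print(err)
```

In this model the data itself is fine. It is the schema that disagrees with it, which is consistent with flattened reporting the same schema as data.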
Somehow, CqlStorage seems to be generating the wrong schema, and that schema persists to projections of the original collection. Is there any way to work around this?
(I'm using Cassandra 1.2.8 and Pig 0.11.1)
Answer 1:
For me, this problem (which showed up as the error "CCE: BinSedesTuple cannot be cast to String") was resolved by applying the fix in https://issues.apache.org/jira/browse/CASSANDRA-5867.
As Alex Lui mentioned in my ticket:
git clone http://git-wip-us.apache.org/repos/asf/cassandra.git
cd cassandra
git checkout cassandra-1.2
patch -p1 < 5867-bug-fix-filter-push-down-1.2-branch.txt
ant
Source: https://stackoverflow.com/questions/18391552/cqlstorage-generates-wrong-pig-schema