I have 2 GB of data in HDFS.
Is it possible to read that data in random order, like we do on the Unix command line?
cat iris2.cs
My suggestion would be to load that data into a Hive table. Then you can do something like this:
SELECT column1, column2 FROM (
    SELECT iris2.column1, iris2.column2, rand() AS r
    FROM iris2
    ORDER BY r
) t
LIMIT 50;
EDIT: This is a simpler version of that query:
SELECT iris2.column1, iris2.column2
FROM iris2
ORDER BY rand()
LIMIT 50;
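For completeness, here is a rough sketch of the "load that data into a Hive table" step, since the queries above assume the table already exists. Everything specific in it is an assumption: the HDFS directory /user/hypothetical/iris2, the comma delimiter, and the DOUBLE column types are placeholders, and column1 and column2 are just the names used in the queries above. An external table only points Hive at the files already in HDFS, so nothing is copied.

```sql
-- Assumed layout: comma-separated text files in a hypothetical HDFS directory.
-- Adjust the path, delimiter, column names, and types to match the real data.
CREATE EXTERNAL TABLE IF NOT EXISTS iris2 (
    column1 DOUBLE,
    column2 DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hypothetical/iris2';

-- Optional variation, not part of the original answer: ORDER BY produces a
-- total order, which Hive does with a single reducer. Spreading the rows over
-- reducers at random and sorting within each one is a commonly used
-- alternative that may scale better on larger tables.
SELECT column1, column2
FROM iris2
DISTRIBUTE BY rand()
SORT BY rand()
LIMIT 50;
```

For 2 GB of data the plain ORDER BY rand() query is probably fine. The variation is only worth trying if the single-reducer sort turns out to be slow.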