I have log files stored as text in HDFS. When I load the log files into a Hive table, all the files are copied.
Can I avoid having all my text data stored twice?
I can say, instead of copying data by your java application directly to HDFS, have those file in local file system, and import them into HDFS via hive using following command.
LOAD DATA LOCAL INPATH '/your/local/filesystem/file.csv' INTO TABLE `sandbox.test` PARTITION (day='20130221')
Notice the LOCAL