Baidu Netdisk link: the Snappy native library for hadoop2.6.0-cdh5.7.1
https://pan.baidu.com/s/1UNXWFq5_eNyqMAaZGO2VcA
Extraction code: 52tw
1. Download and unpack the archive, then place the files under $HADOOP_HOME/lib/native.
Run hadoop checknative -a to check whether the installation succeeded.
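A minimal shell sketch of this step, assuming the downloaded archive is named snappy-native.tar.gz and unpacks into snappy-native/ (both names are hypothetical):
# Unpack the downloaded archive (archive/directory names are assumptions)
tar -zxvf snappy-native.tar.gz
# Copy the native libraries into Hadoop's native library directory
cp snappy-native/* $HADOOP_HOME/lib/native/
# Verify: the snappy line should now report true
hadoop checknative -a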
2. If everything still shows false, add export HADOOP_ROOT_LOGGER=DEBUG,console to hadoop-env.sh,
then run hadoop checknative -a again to see the detailed error.
Fix: add the following line to hadoop-env.sh:
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"
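For example, both additions can be appended and then verified like this (a sketch; remove the DEBUG logger line again once the problem is found):
# Append the debug logger and the native library path to hadoop-env.sh
echo 'export HADOOP_ROOT_LOGGER=DEBUG,console' >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh
echo 'export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"' >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh
# Re-run the check; the debug output shows why a library failed to load
hadoop checknative -a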
3. Using Snappy compression once it is installed
3.1 First import data with Sqoop:
./sqoop import --connect jdbc:mysql://node2:3306/sqoop \
  --username root --password root \
  --query 'select * from test_1 where $CONDITIONS' \
  -m 1 --fields-terminated-by ',' \
  --target-dir /sqoop/ \
  --hive-import --hive-database tkdw --hive-table wang01
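To confirm the import landed in Hive, a quick check against the table the command above creates:
# Count the rows Sqoop imported into tkdw.wang01
hive -e 'select count(*) from tkdw.wang01;'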
3.2 Then create a Snappy-compressed ORC table (note the uppercase SNAPPY: older Hive releases match the orc.compress value case-sensitively):
create table wang_snappy (id int, name string, age string)
row format delimited fields terminated by ','
stored as orc tblproperties ("orc.compress"="SNAPPY");
3.3 Then insert the data:
insert into table wang_snappy select * from wang01;
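One way to confirm the table's files really are Snappy-compressed is Hive's ORC dump tool (the warehouse path and file name below are assumptions; adjust to your hive.metastore.warehouse.dir and actual file):
# List the table's files, then dump one; the output should include "Compression: SNAPPY"
hdfs dfs -ls /user/hive/warehouse/tkdw.db/wang_snappy/
hive --orcfiledump /user/hive/warehouse/tkdw.db/wang_snappy/000000_0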
3.4 Compare the resulting sizes.
(The original post showed screenshots comparing three cases: no compression, Snappy compression, and ORC's default ZLIB compression.)
All things considered, Snappy is the usual recommendation, weighing compression/decompression speed against CPU usage.
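A simple way to run this comparison yourself (the warehouse paths are assumptions):
# Compare on-disk sizes of the plain source table and the Snappy ORC table
hdfs dfs -du -h /user/hive/warehouse/tkdw.db/wang01
hdfs dfs -du -h /user/hive/warehouse/tkdw.db/wang_snappy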
4. Enable Hive compression
Add the following to hive-site.xml:
<!-- enable compression of intermediate (map-stage) output -->
<property>
<name>hive.exec.compress.intermediate</name>
<value>true</value>
</property>
<property>
<name>mapred.map.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<!-- enable compression of final query output -->
<property>
<name>hive.exec.compress.output</name>
<value>true</value>
</property>
<property>
<name>mapred.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
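The same settings can also be enabled per session rather than globally; a sketch using the hive CLI (the mapred.* keys are the deprecated Hadoop 1 names, which CDH 5.x still accepts and maps to mapreduce.map.output.compress.codec and mapreduce.output.fileoutputformat.compress.codec):
hive -e "
set hive.exec.compress.intermediate=true;
set mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
set hive.exec.compress.output=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
insert into table wang_snappy select * from wang01;
"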
5. Enable Hadoop compression
Add the following to mapred-site.xml:
<property>
<name>mapreduce.output.fileoutputformat.compress</name>
<value>true</value>
</property>
<property>
<name>mapreduce.output.fileoutputformat.compress.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
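A quick way to confirm job-output compression: run any MapReduce job and check that the output part files carry a .snappy suffix (the example jar path and the input/output paths are assumptions):
# Run the bundled wordcount example against an existing input file
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
  wordcount /tmp/input.txt /tmp/wc-out
# With compression on, the output is e.g. part-r-00000.snappy
hdfs dfs -ls /tmp/wc-out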
And add the codec list to core-site.xml (keep com.hadoop.compression.lzo.LzoCodec in this list only if the hadoop-lzo jar is actually installed; it is not part of stock Hadoop, and jobs will fail when the class cannot be found):
<property>
<name>io.compression.codecs</name>
<value>
org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.DefaultCodec,
org.apache.hadoop.io.compress.BZip2Codec,
com.hadoop.compression.lzo.LzoCodec,
org.apache.hadoop.io.compress.Lz4Codec,
org.apache.hadoop.io.compress.SnappyCodec
</value>
</property>
Note: files uploaded with hdfs dfs -put do not pass through MapReduce, so these settings do not compress them; a Sqoop import runs as a MapReduce job, so its output is compressed.
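Either way, hadoop fs -text picks a decompressor from io.compression.codecs based on the file extension, so compressed output stays readable (the path is an assumption, matching the wordcount example above):
# -text transparently decompresses .snappy/.gz/.bz2 files on read
hadoop fs -text /tmp/wc-out/part-r-00000.snappy | head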
If the cluster reports unhealthy nodes, check the node logs, Linux disk usage, and the Hadoop config files (especially where directory layouts differ between nodes).
Source: CSDN
Author: 尘缘未了-
Link: https://blog.csdn.net/qq_35315256/article/details/90479378