snappy

Kafka message compression algorithms

岁酱吖の submitted on 2019-12-06 16:24:49
How does Kafka compress messages? To answer that, we have to start with Kafka's message format. Kafka's messages are organized in two layers: message sets and messages. A message set holds a number of record items, and a record item is where a message is actually wrapped. Kafka's underlying message log is made up of a series of message-set record items. Kafka generally does not operate on individual messages; it always writes at the message-set level. In Kafka, compression can happen in two places: on the producer side and on the broker side. Setting the compression.type parameter in the producer program enables the specified compression algorithm:

```java
public class KafkaProduce {
    public void kafkaProducer() throws Exception {
        Properties pro = new Properties();
        ... // other configuration parameters
        pro.put("partitioner.class", "kafka.KafkaPartitioner");
        // enable compression
        pro.put("compression.type", "gzip");
        KafkaProducer config = new KafkaProducer(pro);
    }
}
```

This tells the producer to use gzip as its compression algorithm.
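Why compressing at the message-set level matters can be illustrated with a small sketch. This uses Python's stdlib gzip rather than Kafka's own codecs, and the log messages are made up for the demonstration: batching many similar messages before compressing gives a far better ratio than compressing each message on its own.

```python
import gzip

# Hypothetical batch of small, similar log messages (contents are illustrative).
messages = [f"user={i} action=click page=/home".encode() for i in range(100)]

# Compressing each message individually, as a naive per-message scheme would.
per_message = sum(len(gzip.compress(m)) for m in messages)

# Compressing the whole batch at once, analogous to message-set-level compression.
batched = len(gzip.compress(b"\n".join(messages)))

# Batching amortizes per-stream headers and lets repeated substrings share
# one compression dictionary, so the batched size is much smaller.
print(per_message > batched)  # True
```

The same effect is why Kafka producers accumulate records into batches before applying compression.type.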

Reading the Kafka source: loading classes dynamically to reduce project dependencies

£可爱£侵袭症+ submitted on 2019-12-06 02:30:55
Dynamically loading classes in Java. The most common scenario: some functionality is never used in your project, yet because the project references classes from that dependency, you are forced to add the dependency even though the service is never used. By loading classes dynamically instead, a project can shed much of its third-party dependency weight. The core idea is to use reflection plus ordinary conditional logic to control when objects are instantiated. Take the Kafka source as an example; the following is from Kafka's compression module:

```java
public Compressor(ByteBuffer buffer, CompressionType type) {
    this.type = type;
    this.initPos = buffer.position();
    ......
    appendStream = wrapForOutput(bufferStream, type, COMPRESSION_DEFAULT_BUFFER_SIZE); // Kafka's compression wrapper
}

// the following two functions also need to be public since they are used in MemoryRecords.iteration
static public DataOutputStream wrapForOutput(ByteBufferOutputStream buffer,
```
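The same reflection-plus-conditional idea can be sketched outside the JVM. Here is a minimal Python version using importlib; the load_codec helper is hypothetical (not from the Kafka source) and just shows how an optional dependency degrades gracefully instead of failing at load time.

```python
import importlib

def load_codec(name, fallback=None):
    """Return the named module only if it is importable, so the
    package stays an optional rather than a hard dependency."""
    try:
        return importlib.import_module(name)
    except ImportError:
        # Dependency is absent: fall back instead of crashing at startup.
        return fallback

# zlib ships with CPython, so this load succeeds ...
codec = load_codec("zlib")
print(codec is not None)  # True

# ... while a codec that is not installed degrades to the fallback.
missing = load_codec("no_such_codec_module")
print(missing is None)  # True
```

In the Java version, Class.forName plus an if/else on the configured compression type plays the role that importlib plays here.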

How to install snappy C libraries on Windows 10 for use with python-snappy in Anaconda?

耗尽温柔 submitted on 2019-12-05 22:59:05
Question: I want to install parquet for Python using pip inside an Anaconda 2 installation on Windows 10. While installing I ran into the error described here: the installer can't find snappy-c.h. There is no mention in the answers of how to install this on Windows. I downloaded the Snappy library from http://google.github.io/snappy/ and now I'm stuck. From my error message I would have assumed that the header files need to be in C:\Users\...\AppData\Local\Continuum\Anaconda2\include, but in

How to decompress a Hadoop reduce output file ending with .snappy?

纵然是瞬间 submitted on 2019-12-05 13:17:31
Question: Our Hadoop cluster uses snappy as the default codec, so a Hadoop job's reduce output file is named like part-r-00000.snappy. JSnappy fails to decompress the file because JSnappy requires the file to start with SNZ, while the reduce output file somehow starts with some zero bytes. How can I decompress the file?

Answer 1: Use "hadoop fs -text" to read this file and pipe it to a txt file, e.g.:

hadoop fs -text part-r-00001.snappy > /tmp/mydatafile.txt

Source: https://stackoverflow.com/questions/19805149/how-to-decompress-the-hadoop

Why is Parquet slower for me than the text file format in Hive?

99封情书 submitted on 2019-12-05 11:15:47
OK! So I decided to use Parquet as the storage format for Hive tables, and before actually implementing it in my cluster I decided to run some tests. Surprisingly, Parquet was slower in my tests, against the general notion that it is faster than plain text files. Please note that I am using Hive-0.13 on MapR. The flow of my operations follows:

Table A - Format: Text. Table size: 2.5 GB
Table B - Format: Parquet. Table size: 1.9 GB [Create table B stored as parquet as select * from A]
Table C - Format: Parquet with snappy compression. Table size: 1.9 GB [Create table C stored as parquet

Comparison between lz4 vs lz4_hc vs blosc vs snappy vs fastlz

半腔热情 submitted on 2019-12-04 10:15:03
Question: I have a large file of size 500 MB to compress within a minute with the best possible compression ratio. I have found these algorithms to be suitable for my use: lz4, lz4_hc, snappy, quicklz, blosc. Can someone give a comparison of speed and compression ratios between these algorithms?

Answer 1: Yann Collet's lz4, hands down.

Answer 2: This might help you: (lz4 vs snappy) http://java-performance.info/performance-general-compression/ (benchmarks for lz4, snappy, lz4hc, blosc) https://web.archive.org/web
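To get a feel for the speed/ratio trade-off yourself, the measurement loop can be sketched with Python's stdlib codecs (zlib, bz2, lzma). The algorithms listed in the question (lz4, snappy, blosc, etc.) need third-party packages, but their compress functions would plug into the same pattern; the payload below is a small repetitive stand-in for the 500 MB file.

```python
import bz2
import lzma
import time
import zlib

# Small, repetitive stand-in payload; swap in your real file's bytes.
data = b"some moderately compressible payload " * 50_000

# Measure wall-clock compression time and resulting ratio per codec.
for name, compress in [("zlib", zlib.compress),
                       ("bz2", bz2.compress),
                       ("lzma", lzma.compress)]:
    start = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - start
    print(f"{name}: ratio={len(data) / len(out):.1f}x time={elapsed:.3f}s")
```

Running this on a sample of the real data is usually more informative than published benchmarks, since ratios depend heavily on the input's redundancy.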

Cassandra 1.2 fails to init Snappy on FreeBSD

和自甴很熟 submitted on 2019-12-04 06:16:23
ERROR [WRITE-/10.10.35.30] 2013-06-19 23:15:56,907 CassandraDaemon.java (line 175) Exception in thread Thread[WRITE-/10.10.35.30,5,main]
java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy
    at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
    at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:66)
    at org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:341)
    at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:143)

When going through known issues I found this: The

on namespace ceilometer.$cmd failed: Authentication failed. (troubleshooting and resolution)

假装没事ソ submitted on 2019-12-03 12:17:44
on namespace ceilometer.$cmd failed: Authentication failed. UserNotFound: Could not find user ceilometer@ceilometer

Background:
1. Ceilometer is the OpenStack component for metering and billing; it later grew to include some monitoring-collection and alerting features.
2. MongoDB is a database built on distributed file storage, written in C++, designed to provide a scalable, high-performance data storage solution for web applications.
3. A project from a few years ago combined Ceilometer with MongoDB (version 3.2.9) to store performance and alert data.

Problem description:
Recently, at a customer site, the storage device MongoDB was mounted on failed. After the storage device was recovered, the MongoDB service would not start. The startup log reported:

2019-10-31T16:33:27.651+0800 I CONTROL [main] ***** SERVER RESTARTED *****
2019-10-31T16:33:27.658+0800 I CONTROL [initandlisten] MongoDB starting : pid=5097 port=27017 dbpath=/var/lib/mongodb 64-bit

Hadoop cluster setup - 03: building and installing Hadoop

痞子三分冷 submitted on 2019-12-03 09:42:04
Hadoop cluster setup - 05: installing and configuring YARN
Hadoop cluster setup - 04: installing and configuring HDFS
Hadoop cluster setup - 03: building and installing Hadoop
Hadoop cluster setup - 02: installing and configuring Zookeeper
Hadoop cluster setup - 01: preliminary preparation

Hadoop is compiled and installed directly on one machine, here the nn1 machine, working as the root user throughout.

1. Some Hadoop resources are available here: https://www.lanzous.com/b849710/ password: 9vui

[hadoop@nn1 zk_op]$ su - root
[root@nn1 ~]# mkdir /tmp/hadoop_c
[root@nn1 ~]# cd /tmp/hadoop_c/

Upload the source package to the directory above with xshell's rz command.

[root@nn1 hadoop_c]# tar -xzf /tmp/hadoop_c/hadoop-2.7.3-src.tar.gz -C /usr/local/

Install the assorted software and plugins that will be needed with yum:

yum -y install svn ncurses-devel gcc* lzo-devel zlib-devel autoconf automake libtool cmake openssl-devel bzip2

2. Build and install protobuf, Google's protocol for communication and storage, which is required:

[root@nn1 ~]#

Unable to run Snappy player on Beaglebone Black using Yocto Project

Anonymous (unverified) submitted on 2019-12-03 09:10:12
Question: My main objective is to run the Snappy player (https://wiki.gnome.org/Snappy) on the target machine (BeagleBone Black), so I wrote a recipe for the Snappy player (snappy_1.0.bb) as below:

LICENSE = "GPLv2"
LIC_FILES_CHKSUM = "file://COPYING;md5=686e6cb566fd6382c9fcc7a557bf4544"
SRCREV = "e73fabce4c397b40d490c74f6a6a0de000804f42"
SRC_URI = "git://git.gnome.org/snappy"
S = "${WORKDIR}/git"
RDEPENDS_${PN} = "gtk+3 gstreamer1.0 glib-2.0 clutter-1.0 gstreamer1.0-plugins-base libxtst clutter-gst-3.0 clutter-gtk-1.0 libx11 cairo gdk-pixbuf"
# inherit line