Bolt

[Storm] Integrating Storm with Kafka

做~自己de王妃 posted on 2020-01-07 15:27:22
Stream processing of Kafka data with Storm. Tips: old version: official documentation; new version: official documentation. Components Storm can integrate with: Test code. Requirement: add a date to the Kafka data. Practical use: customize to your business requirements, for example parsing Nginx logs to restrict access by IP. pom:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://maven.apache.org/POM/4.0.0" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>com.zhiwei</groupId>
    <artifactId>data_process_experience</artifactId>
    <version>1.0-SNAPSHOT</version>
  </parent>
  <artifactId>storm_experience</artifactId

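JStorm: a real-time stream computing framework modeled on Storm

你。 posted on 2019-12-25 22:51:10
JStorm is a real-time stream computing framework modeled on Storm. It has made sustained improvements to network I/O, the thread model, resource scheduling, availability, and stability, and a growing number of companies use it. As a committer and user, I am optimistic about its prospects. Below is an introduction I shared with my team; for more, see https://github.com/alibaba/jstorm
1. What is JStorm
JStorm can be seen as an enhanced Java version of Storm. Besides a kernel implemented in pure Java, it also includes thrift, python, and facet ui. Architecturally, it is essentially a distributed scheduling system built on ZooKeeper (zk).
Main use cases for JStorm:
1. Information stream processing, such as aggregation and analysis
2. Continuous computation, such as real-time statistics and monitoring
3. Distributed RPC calls
JStorm's kernel improvements over Storm:
(1) Simplified model
(2) Multi-dimensional resource scheduling
(3) Reworked network communication layer
(4) Refactored sampling
(5) Asynchronous processing inside workers/tasks
(6) classload, HA
Simplified model: Storm's three-tier management model is reduced to two tiers. In JStorm a task maps directly to a thread, whereas in Storm a task is only a logical unit of execution inside an executor thread.
Multi-dimensional resource scheduling: resources are divided into four dimensions: cpu, memory, net, and disk. By default:
cpu slots = number of machine cores * 2 - 1
memory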
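As a rough illustration of the default CPU slot formula quoted in this entry (cpu slots = machine cores * 2 - 1), here is a minimal Java sketch. The class and method names are hypothetical and are not part of JStorm; the snippet only restates the arithmetic and does not show how JStorm itself reads the core count.

```java
// Hypothetical helper, not a JStorm API: restates "cpu slots = cores * 2 - 1".
class CpuSlotsSketch {

    /** Returns the default number of CPU slots for a machine with the given core count. */
    static int defaultCpuSlots(int cores) {
        if (cores < 1) {
            throw new IllegalArgumentException("cores must be >= 1");
        }
        return cores * 2 - 1;
    }

    public static void main(String[] args) {
        // Assumption: the JVM's view of available processors stands in for "machine cores".
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(cores + " cores -> " + defaultCpuSlots(cores) + " cpu slots");
    }
}
```

For example, a 4-core machine would get 7 CPU slots under this formula.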

A look at Storm's tickTuple

不羁岁月 posted on 2019-12-04 04:58:21
Preface: this article takes a look at Storm's tickTuple. Example: TickWordCountBolt
public class TickWordCountBolt extends BaseBasicBolt {
    private static final Logger LOGGER = LoggerFactory.getLogger(TickWordCountBolt.class);
    Map<String, Integer> counts = new HashMap<String, Integer>();
    @Override
    public Map<String, Object> getComponentConfiguration() {
        Config conf = new Config();
        conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 10);
        return conf;
    }
    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        if(TupleUtils.isTick(input)){
            //execute tick logic
            LOGGER.info("execute tick tuple, emit and clear counts");
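The "emit and clear counts on tick" idea in the snippet above can be modeled without any Storm dependency. The following is only a sketch under assumptions: the class and its `onWord`/`onTick` methods are hypothetical stand-ins for the bolt's non-tick and tick branches, and the per-word counting is inferred from the bolt's name and its `counts` map rather than from code shown in the excerpt.

```java
import java.util.HashMap;
import java.util.Map;

// Storm-free model of a tick-driven word counter (hypothetical names throughout).
class TickCounterSketch {
    private final Map<String, Integer> counts = new HashMap<>();

    /** Stand-in for the non-tick branch: accumulate one occurrence of the word. */
    void onWord(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    /** Stand-in for the tick branch: hand back the current counts, then reset them. */
    Map<String, Integer> onTick() {
        Map<String, Integer> snapshot = new HashMap<>(counts);
        counts.clear();
        return snapshot;
    }

    public static void main(String[] args) {
        TickCounterSketch counter = new TickCounterSketch();
        counter.onWord("storm");
        counter.onWord("storm");
        counter.onWord("bolt");
        System.out.println("first tick:  " + counter.onTick());
        System.out.println("second tick: " + counter.onTick());
    }
}
```

In a real bolt the snapshot would be emitted through the collector instead of returned; returning it here just keeps the sketch self-contained.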

A look at Flink's BoltWrapper

自闭症网瘾萝莉.ら posted on 2019-12-03 13:42:05
Preface: this article takes a look at Flink's BoltWrapper. BoltWrapper
flink-storm_2.11-1.6.2-sources.jar!/org/apache/flink/storm/wrappers/BoltWrapper.java
/**
 * A {@link BoltWrapper} wraps an {@link IRichBolt} in order to execute the Storm bolt within a Flink Streaming program.
 * It takes the Flink input tuples of type {@code IN} and transforms them into {@link StormTuple}s that the bolt can
 * process. Furthermore, it takes the bolt's output tuples and transforms them into Flink tuples of type {@code OUT}
 * (see {@link AbstractStormCollector} for supported types).<br/>
 * <br/>
 * <strong>Works for single input streams only!

A look at splitting and aggregating Storm Trident batches

余生颓废 posted on 2019-12-03 13:41:21
Preface: this article takes a look at how Storm Trident batches are split and aggregated. Example:
TridentTopology topology = new TridentTopology();
topology.newStream("spout1", spout)
        .partitionBy(new Fields("user"))
        .partitionAggregate(new Fields("user","score","batchId"),new OriginUserCountAggregator(),new Fields("result","aggBatchId"))
        .parallelismHint(3)
        .global()
        .aggregate(new Fields("result","aggBatchId"),new AggAgg(),new Fields("agg"))
        .each(new Fields("agg"),new PrintEachFunc(),new Fields())
        ;
This ends up constructing three bolts: b-0, b-1, and b-2. b-0 is mainly the partitionAggregate, with a parallelismHint of 3. b-1 mainly handles the CombinerAggregator's init, with a parallelismHint of 1; since its upstream bolt has 3 tasks

A look at Storm's IWaitStrategy

↘锁芯ラ posted on 2019-12-03 13:32:59
Preface: this article takes a look at Storm's IWaitStrategy. IWaitStrategy
storm-2.0.0/storm-client/src/jvm/org/apache/storm/policy/IWaitStrategy.java
public interface IWaitStrategy {
    static IWaitStrategy createBackPressureWaitStrategy(Map<String, Object> topologyConf) {
        IWaitStrategy producerWaitStrategy = ReflectionUtils.newInstance((String) topologyConf.get(Config.TOPOLOGY_BACKPRESSURE_WAIT_STRATEGY));
        producerWaitStrategy.prepare(topologyConf, WAIT_SITUATION.BACK_PRESSURE_WAIT);
        return producerWaitStrategy;
    }
    void prepare(Map<String, Object> conf, WAIT_SITUATION waitSituation);
    /**
     * Implementations of this method

Neo4j with a reverse proxy and NGINX

假装没事ソ posted on 2019-12-02 11:54:57
Question: I'm having trouble addressing Neo4j via a reverse proxy with NGINX. The web client works without problems, but I have no idea about the Bolt protocol. Here's how the web client works:
server {
    listen 80;
    server_name XXX;
    location / {
        proxy_pass http://YYY:7474/;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_buffering off;
    }
}
But how does the Bolt protocol over port 7687 work?

A look at Storm's JoinBolt

久未见 posted on 2019-12-02 06:22:50
Preface: this article takes a look at Storm's JoinBolt. Example:
@Test
public void testJoinBolt() throws InvalidTopologyException, AuthorizationException, AlreadyAliveException {
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("uuid-spout", new RandomWordSpout(new String[]{"uuid", "timestamp"}), 1);
    builder.setSpout("word-spout", new RandomWordSpout(new String[]{"word", "timestamp"}), 1);
    JoinBolt joinBolt = new JoinBolt("uuid-spout", "timestamp")
            //from priorStream inner join newStream on newStream.field = priorStream.field1
            .join("word-spout", "timestamp", "uuid-spout")
            .select("uuid,word,timestamp")

A look at the finishBatch method of Storm's TridentBoltExecutor

冷暖自知 posted on 2019-12-02 06:20:51
Preface: this article takes a look at the finishBatch method of Storm's TridentBoltExecutor. MasterBatchCoordinator.nextTuple
storm-core-1.2.2-sources.jar!/org/apache/storm/trident/topology/MasterBatchCoordinator.java
public void nextTuple() {
    sync();
}
private void sync() {
    // note that sometimes the tuples active may be less than max_spout_pending, e.g.
    // max_spout_pending = 3
    // tx 1, 2, 3 active, tx 2 is acked. there won't be a commit for tx 2 (because tx 1 isn't committed yet),
    // and there won't be a batch for tx 4 because there's max_spout_pending tx active
    TransactionStatus maybeCommit = _activeTx.get(_currTransaction);
    if

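Storm directory trees, job submission, message fault tolerance, and communication mechanisms

喜夏-厌秋 posted on 2019-11-30 20:41:17
Advanced Storm
Note: study the Storm basics course before taking this one.
Course objectives: after this module you should understand Storm's underlying communication mechanism, its message fault-tolerance mechanism, the Storm directory trees, and the job submission flow.
Course outline:
1. Concurrency in Storm programs
2. Storm's communication mechanism (communication inside a worker and between workers)
3. Local directory tree of Storm components
4. Storm's ZooKeeper directory tree
5. The Storm job submission process
1. Concurrency in Storm programs
1.1 Concepts
- Workers (JVMs): a physical node can run one or more independent JVM processes. A topology can contain one or more workers (running in parallel on different physical machines), so a worker process executes a subset of a topology, and a worker belongs to exactly one topology.
- Executors (threads): a worker JVM process runs multiple Java threads. An executor thread can run one or more tasks, but by default each executor runs a single task. A worker can contain one or more executors, and each component (spout or bolt) corresponds to at least one executor, so an executor executes a subset of a component, and an executor belongs to exactly one component.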
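To make the executor/task relationship described in this entry concrete, here is a small Java sketch that spreads a component's tasks over its executors. It is only a model under assumptions: the class and method names are hypothetical, and the contiguous, near-even split is an illustrative choice rather than a reproduction of Storm's actual scheduler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of "one executor runs one or more tasks of a single component".
class ParallelismSketch {

    /** Splits task ids 0..numTasks-1 into numExecutors contiguous groups of near-equal size. */
    static List<List<Integer>> assignTasks(int numExecutors, int numTasks) {
        if (numExecutors < 1 || numTasks < numExecutors) {
            throw new IllegalArgumentException("need 1 <= numExecutors <= numTasks");
        }
        List<List<Integer>> executors = new ArrayList<>();
        int base = numTasks / numExecutors;   // every executor gets at least this many tasks
        int extra = numTasks % numExecutors;  // the first 'extra' executors get one more
        int next = 0;
        for (int e = 0; e < numExecutors; e++) {
            int size = base + (e < extra ? 1 : 0);
            List<Integer> tasks = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                tasks.add(next++);
            }
            executors.add(tasks);
        }
        return executors;
    }

    public static void main(String[] args) {
        // Default case from the text: one task per executor.
        System.out.println(assignTasks(3, 3));
        // More tasks than executors: each executor thread runs several tasks.
        System.out.println(assignTasks(2, 4));
    }
}
```

With three executors and three tasks every executor holds exactly one task, which matches the default described above; with two executors and four tasks each executor holds two.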