Bolt | 易学教程

[Storm] Integrating Storm with Kafka

Stream-processing Kafka data with Storm

Tips
Old version: official documentation
New version: official documentation
Components Storm can integrate with:

Test code
Requirement: add a date to the Kafka data.
Practical use: customize to your business requirements, for example parsing Nginx logs or restricting access by IP.

pom

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://maven.apache.org/POM/4.0.0" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>com.zhiwei</groupId>
        <artifactId>data_process_experience</artifactId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <artifactId>storm_experience</artifactId
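The stated requirement, adding a date to each Kafka record, can be sketched without any Storm classes. The snippet below is only an illustration of the transformation a bolt might apply inside its execute method. The class name `DateStamper`, the `stamp` method, and the "date, space, value" output layout are assumptions and do not come from the original article.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: stands in for the logic inside a bolt's execute() method.
class DateStamper {
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ISO_LOCAL_DATE;

    // Prefix a record value with the given date in yyyy-MM-dd form.
    static String stamp(String value, LocalDate date) {
        return FORMAT.format(date) + " " + value;
    }

    public static void main(String[] args) {
        // In a real topology the value would come from the Kafka spout's tuple.
        System.out.println(stamp("hello kafka", LocalDate.now()));
    }
}
```

Passing the date in as a parameter, rather than calling `LocalDate.now()` inside `stamp`, keeps the transformation deterministic and easy to test.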

JStorm: a real-time stream computing framework modeled on Storm

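JStorm is a real-time stream computing framework modeled on Storm. It has been continuously improved in network IO, threading model, resource scheduling, availability, and stability, and more and more companies are using it. As a committer and user, I am very optimistic about its prospects. Below is an introduction I shared within my team; for more, see https://github.com/alibaba/jstorm

I. What is JStorm

JStorm can be seen as an enhanced Java version of Storm. Besides a kernel implemented in pure Java, it also includes thrift, python, and the facet ui. Architecturally, it is essentially a ZooKeeper-based distributed scheduling system.

JStorm's main use cases are:
1. Information stream processing, such as aggregation and analysis
2. Continuous computation, such as real-time statistics and monitoring
3. Distributed RPC calls

JStorm's kernel improvements over Storm are: (1) a simplified model, (2) multi-dimensional resource scheduling, (3) a reworked network communication layer, (4) refactored sampling, (5) asynchronous processing inside worker/task, (6) classload and HA.

Simplified model: Storm's three-layer management model is reduced to two layers. In JStorm a task maps directly to a thread, whereas in Storm a task is only a logical unit of execution inside an executor thread.

Multi-dimensional resource scheduling: resources are split into four dimensions: cpu, memory, net, and disk. By default: cpu slots = number of machine cores * 2 - 1 memory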
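As a quick arithmetic check of the default quoted above (cpu slots = number of cores * 2 - 1), here is a minimal sketch. The class and method names are hypothetical and are not part of JStorm's API.

```java
// Hypothetical helper illustrating the default cpu-slot formula from the text.
class JstormSlots {
    // cpu slots = number of machine cores * 2 - 1
    static int defaultCpuSlots(int cores) {
        if (cores < 1) {
            throw new IllegalArgumentException("cores must be >= 1");
        }
        return cores * 2 - 1;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(cores + " cores -> " + defaultCpuSlots(cores) + " cpu slots");
    }
}
```

For example, a 4-core machine would get 7 cpu slots under this formula.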

A look at Storm's tickTuple

Preface: this article takes a look at Storm's tickTuple.

Example: TickWordCountBolt

public class TickWordCountBolt extends BaseBasicBolt {
    private static final Logger LOGGER = LoggerFactory.getLogger(TickWordCountBolt.class);
    Map<String, Integer> counts = new HashMap<String, Integer>();

    @Override
    public Map<String, Object> getComponentConfiguration() {
        Config conf = new Config();
        conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 10);
        return conf;
    }

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        if(TupleUtils.isTick(input)){
            //execute tick logic
            LOGGER.info("execute tick tuple, emit and clear counts");
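The excerpt stops just as the tick branch begins. Its log message ("emit and clear counts") suggests a pattern in which ordinary tuples accumulate counts and each tick flushes them. The sketch below models that pattern in plain Java with no Storm classes. `TickWordCounter`, `onWord`, and `onTick` are hypothetical names, and returning a snapshot stands in for emitting through a collector.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java stand-in for the bolt's state handling; no Storm types are used.
class TickWordCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    // Normal tuple: increment the count for this word.
    void onWord(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    // Tick tuple: hand back everything counted so far, then reset the state.
    Map<String, Integer> onTick() {
        Map<String, Integer> snapshot = new HashMap<>(counts);
        counts.clear();
        return snapshot;
    }
}
```

Copying into a snapshot before clearing means the caller never sees a map that is later mutated by new words.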

A look at Flink's BoltWrapper

Preface: this article takes a look at Flink's BoltWrapper.

BoltWrapper

flink-storm_2.11-1.6.2-sources.jar!/org/apache/flink/storm/wrappers/BoltWrapper.java

/**
 * A {@link BoltWrapper} wraps an {@link IRichBolt} in order to execute the Storm bolt within a Flink Streaming program.
 * It takes the Flink input tuples of type {@code IN} and transforms them into {@link StormTuple}s that the bolt can
 * process. Furthermore, it takes the bolt's output tuples and transforms them into Flink tuples of type {@code OUT}
 * (see {@link AbstractStormCollector} for supported types).<br/>
 * <br/>
 * <strong>Works for single input streams only!

A look at splitting and aggregating Storm Trident batches

Preface: this article takes a look at how Storm Trident batches are split and aggregated.

Example

TridentTopology topology = new TridentTopology();
topology.newStream("spout1", spout)
        .partitionBy(new Fields("user"))
        .partitionAggregate(new Fields("user","score","batchId"),new OriginUserCountAggregator(),new Fields("result","aggBatchId"))
        .parallelismHint(3)
        .global()
        .aggregate(new Fields("result","aggBatchId"),new AggAgg(),new Fields("agg"))
        .each(new Fields("agg"),new PrintEachFunc(),new Fields());

This ends up constructing three bolts: b-0, b-1, and b-2.
b-0 mainly runs partitionAggregate, and its parallelismHint is 3.
b-1 mainly handles the init of the CombinerAggregator, and its parallelismHint is 1; because its upstream bolt has 3 tasks
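The two-stage shape of the topology above (aggregate inside each partition, then combine everything in one place) can be sketched in plain Java. This is an illustration only. What `OriginUserCountAggregator` and `AggAgg` actually compute is not shown in the excerpt, so counting per user and summing the partial counts are assumptions, as are all names below.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of "partition, aggregate per partition, then aggregate globally".
class PartitionThenAggregate {
    // Stage 1: route each user to one bucket by hash (so the same user always lands
    // in the same bucket), and count occurrences per user inside each bucket.
    static List<Map<String, Integer>> partitionCounts(List<String> users, int partitions) {
        List<Map<String, Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            buckets.add(new HashMap<>());
        }
        for (String user : users) {
            int idx = Math.floorMod(user.hashCode(), partitions);
            buckets.get(idx).merge(user, 1, Integer::sum);
        }
        return buckets;
    }

    // Stage 2: gather every partial result in a single place and sum the counts.
    static int globalTotal(List<Map<String, Integer>> buckets) {
        int total = 0;
        for (Map<String, Integer> bucket : buckets) {
            for (int n : bucket.values()) {
                total += n;
            }
        }
        return total;
    }
}
```

With three buckets, stage 1 mirrors a parallelism of 3 and stage 2 mirrors the single task that receives all partial results.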

A look at Storm's IWaitStrategy

Preface: this article takes a look at Storm's IWaitStrategy.

IWaitStrategy

storm-2.0.0/storm-client/src/jvm/org/apache/storm/policy/IWaitStrategy.java

public interface IWaitStrategy {
    static IWaitStrategy createBackPressureWaitStrategy(Map<String, Object> topologyConf) {
        IWaitStrategy producerWaitStrategy = ReflectionUtils.newInstance((String) topologyConf.get(Config.TOPOLOGY_BACKPRESSURE_WAIT_STRATEGY));
        producerWaitStrategy.prepare(topologyConf, WAIT_SITUATION.BACK_PRESSURE_WAIT);
        return producerWaitStrategy;
    }

    void prepare(Map<String, Object> conf, WAIT_SITUATION waitSituation);

    /**
     * Implementations of this method

Neo4j with a reverse proxy and NGINX

Question

I'm having trouble addressing Neo4j via a reverse proxy with NGINX. The web client works without problems, but I have no idea about the Bolt protocol. Here's how the web client works:

server {
    listen 80;
    server_name XXX;
    location / {
        proxy_pass http://YYY:7474/;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_buffering off;
    }
}

But how does the Bolt protocol over port 7687 work?

A look at Storm's JoinBolt

Preface: this article takes a look at Storm's JoinBolt.

Example

@Test
public void testJoinBolt() throws InvalidTopologyException, AuthorizationException, AlreadyAliveException {
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("uuid-spout", new RandomWordSpout(new String[]{"uuid", "timestamp"}), 1);
    builder.setSpout("word-spout", new RandomWordSpout(new String[]{"word", "timestamp"}), 1);
    JoinBolt joinBolt = new JoinBolt("uuid-spout", "timestamp")
            //from priorStream inner join newStream on newStream.field = priorStream.field1
            .join("word-spout", "timestamp", "uuid-spout")
            .select("uuid,word,timestamp")
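To make the join semantics in the comment above concrete (an inner join of two streams on a shared field, followed by a field selection), here is a plain-Java sketch over in-memory rows. It is not how JoinBolt is implemented and it ignores windowing entirely. `FieldJoin` and `innerJoin` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of an inner join on one field, then a projection of selected fields.
class FieldJoin {
    static List<Map<String, Object>> innerJoin(List<Map<String, Object>> left,
                                               List<Map<String, Object>> right,
                                               String key,
                                               String... selected) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> l : left) {
            Object leftKey = l.get(key);
            if (leftKey == null) {
                continue; // rows without the join field never match
            }
            for (Map<String, Object> r : right) {
                if (leftKey.equals(r.get(key))) {
                    // Merge both rows, then keep only the selected fields, in order.
                    Map<String, Object> merged = new LinkedHashMap<>(l);
                    merged.putAll(r);
                    Map<String, Object> row = new LinkedHashMap<>();
                    for (String field : selected) {
                        row.put(field, merged.get(field));
                    }
                    out.add(row);
                }
            }
        }
        return out;
    }
}
```

Calling `innerJoin(uuidRows, wordRows, "timestamp", "uuid", "word", "timestamp")` corresponds loosely to the `join(...)` and `select("uuid,word,timestamp")` calls in the test above.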

A look at the finishBatch method of Storm's TridentBoltExecutor

Preface: this article takes a look at the finishBatch method of Storm's TridentBoltExecutor.

MasterBatchCoordinator.nextTuple

storm-core-1.2.2-sources.jar!/org/apache/storm/trident/topology/MasterBatchCoordinator.java

public void nextTuple() {
    sync();
}

private void sync() {
    // note that sometimes the tuples active may be less than max_spout_pending, e.g.
    // max_spout_pending = 3
    // tx 1, 2, 3 active, tx 2 is acked. there won't be a commit for tx 2 (because tx 1 isn't committed yet),
    // and there won't be a batch for tx 4 because there's max_spout_pending tx active
    TransactionStatus maybeCommit = _activeTx.get(_currTransaction);
    if

Storm directory trees, task submission, message fault tolerance, and communication mechanisms

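Storm advanced techniques

Note: complete the Storm basics course before studying this one.

Goal: after this module you should understand Storm's underlying communication mechanism, its message fault-tolerance mechanism, the Storm directory trees, and the task submission flow.

Course outline:
1. Concurrency in Storm programs
2. Communication in the Storm framework (inside a worker and between workers)
3. Local directory tree of Storm components
4. Storm's ZooKeeper directory tree
5. The Storm task submission process

1. Concurrency in Storm programs

1.1 Concepts

- Workers (JVMs): one or more independent JVM processes can run on a physical node. A topology can contain one or more workers (running in parallel on different physical machines), so a worker process executes a subset of a topology, and a worker belongs to exactly one topology.
- Executors (threads): a worker JVM process runs multiple Java threads. One executor thread can run one or more tasks, but by default each executor usually runs a single task. A worker can contain one or more executors, and every component (spout or bolt) maps to at least one executor, so an executor runs a subset of a component, and an executor belongs to exactly one component.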
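The relationship described above, where one executor thread runs one or more tasks of a single component, can be illustrated with a small allocation sketch. The even, contiguous split used here is an assumption chosen for illustration, and `TaskLayout` and `assign` are hypothetical names rather than Storm APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of spreading a component's tasks over its executor threads.
class TaskLayout {
    // Split task ids 0..numTasks-1 into numExecutors contiguous groups whose
    // sizes differ by at most one.
    static List<List<Integer>> assign(int numTasks, int numExecutors) {
        if (numExecutors < 1 || numTasks < numExecutors) {
            throw new IllegalArgumentException("need 1 <= executors <= tasks");
        }
        List<List<Integer>> executors = new ArrayList<>();
        int base = numTasks / numExecutors;
        int extra = numTasks % numExecutors;
        int next = 0;
        for (int e = 0; e < numExecutors; e++) {
            int size = base + (e < extra ? 1 : 0);
            List<Integer> tasks = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                tasks.add(next++);
            }
            executors.add(tasks);
        }
        return executors;
    }
}
```

With as many executors as tasks, every executor gets exactly one task, which matches the usual default mentioned in the text. With fewer executors, some threads run more than one task.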
