Question
How do you provide a custom configuration to a Storm topology? For example, if I have built a topology that connects to a MySQL cluster and I want to be able to change which servers I connect to without recompiling, how would I do that? My preference would be to use a config file, but my concern is that the file itself is not deployed to the cluster and therefore won't be read (unless my understanding of how a cluster works is flawed). The only way I've seen so far to pass configuration options into a Storm topology at runtime is via command-line parameters, but that gets messy once you have a good number of them.
One thought I did have is to leverage a shell script to read the file into a variable and pass the contents of that variable in as a string to the topology, but I'd like something a little cleaner if possible.
Has anyone else encountered this? If so, how did you solve it?
EDIT:
It appears I need to provide more clarification. My scenario is that I have a topology that I want to be able to deploy in different environments without having to recompile it. Normally, I'd create a config file that contains things like database connection parameters and have that passed in. I'd like to know how to do something like that in Storm.
Answer 1:
You can specify a configuration (typically via a YAML file) which you submit along with your topology. In our own project we manage this by keeping separate config files, one for development and one for production, in which we store our server, Redis and DB IPs, ports, etc. When we run the command that builds the jar and submits the topology to Storm, it includes the correct config file for the deployment environment. The bolts and spouts then simply read the configuration they need from the stormConf map that is passed to your bolt's prepare() method.
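To make this concrete, here is a minimal sketch of both sides; the file name dev.yaml, the mysql.host key, and the bolt field are illustrative assumptions, and YAML parsing is shown with SnakeYAML:

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import org.yaml.snakeyaml.Yaml;

// Submit side: load the environment-specific file and merge it into the topology conf.
Map<String, Object> settings = (Map<String, Object>) new Yaml().load(new FileInputStream("dev.yaml"));
Config conf = new Config();
conf.putAll(settings); // custom keys such as mysql.host travel with the topology

TopologyBuilder builder = new TopologyBuilder();
// ... declare your spouts and bolts here ...
StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());

// Bolt side: the submitted values arrive in the stormConf map.
@Override
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
    this.mysqlHost = (String) stormConf.get("mysql.host"); // read at runtime, no recompile needed
}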
From http://storm.apache.org/documentation/Configuration.html :
Every configuration has a default value defined in defaults.yaml in the Storm codebase. You can override these configurations by defining a storm.yaml in the classpath of Nimbus and the supervisors. Finally, you can define a topology-specific configuration that you submit along with your topology when using StormSubmitter. However, the topology-specific configuration can only override configs prefixed with "TOPOLOGY".
Storm 0.7.0 and onwards lets you override configuration on a per-bolt/per-spout basis.
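The per-component override can be attached when the component is declared; a sketch (the bolt and key names are illustrative):

builder.setBolt("mysql-writer", new MySqlBolt(), 4)
       .addConfiguration("mysql.host", "db1.example.com"); // per-bolt override (Storm 0.7.0+)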
You'll also see on http://nathanmarz.github.io/storm/doc/backtype/storm/StormSubmitter.html that submitJar and submitTopology are passed a map called conf.
Hope this gets you started.
Answer 2:
I solved this problem by just providing the config in code:
config.put(Config.TOPOLOGY_WORKER_CHILDOPTS, SOME_OPTS);
I tried to provide a topology-specific storm.yaml, but it didn't work. Correct me if you have managed to make a storm.yaml work.
Update:
For anyone who wants to know what SOME_OPTS is, this is from Satish Duggana on the Storm mailing list:
Config.TOPOLOGY_WORKER_CHILDOPTS: Options which can override WORKER_CHILDOPTS for a topology. You can configure any Java options like memory, GC, etc.
In your case it could be:
config.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx1g");
Answer 3:
What might actually serve you best is to store the configuration in a mutable key-value store (S3, Redis, etc.) and then pull it in to configure a database connection that you then use (I assume you are already planning to limit how often you talk to the database, so the overhead of fetching this config is not a big deal). This design lets you change the database connection on the fly, with no need to even redeploy the topology.
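As a hedged sketch of that idea (the Jedis client, the Redis endpoint, and the mysql.host key are all assumptions, not part of the answer), the bolt could fetch its connection settings from Redis when it starts up:

import redis.clients.jedis.Jedis;

@Override
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
    Jedis jedis = new Jedis("redis.example.com", 6379); // assumed Redis endpoint
    this.mysqlHost = jedis.get("mysql.host");           // change the value in Redis; no topology redeploy
    jedis.disconnect();
}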
Answer 4:
The idea is that when you build your topology, you create instances of your spouts and bolts (among other things), and these instances are serialized and distributed to the right places in the cluster. If you want to configure the behavior of a spout or bolt, you do so when creating the topology, before submitting it, by setting instance variables on the bolt or spout that in turn drive the configurable behavior you want.
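For example (a sketch; MySqlBolt and its fields are hypothetical names), the connection details become plain serializable fields set through the constructor before the topology is submitted:

public class MySqlBolt extends BaseRichBolt {
    private final String host; // serialized along with the bolt instance
    private final int port;

    public MySqlBolt(String host, int port) {
        this.host = host;
        this.port = port;
    }
    // prepare()/execute()/declareOutputFields() use host and port as needed
}

// At build time, before submitting:
builder.setBolt("mysql-writer", new MySqlBolt("db1.example.com", 3306), 4);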
Answer 5:
I also faced the same issue. I solved it by configuring NFS in my cluster and putting the configuration file in a shared location so that it would be available to all the cluster machines. It's very easy to configure NFS on a Linux system.
Answer 6:
I faced the same problem as you did, and here is my (admittedly hacky) solution:
Use a simple Java file as the config file, say topo_config.java; it looks like:
package com.xxx;
public class topo_config {
public static String zk_host = "192.168.10.60:2181";
public static String kafka_topic = "my_log_topic";
public static int worker_num = 2;
public static int log_spout_num = 4;
// ...
}
This file is kept in my config folder. I then write a script, say compile.sh, which copies it into the right package and does the compilation, like:
cp config/topo_config.java src/main/java/com/xxx/
mvn package
The configuration is then read directly:
Config conf = new Config();
conf.setNumWorkers(topo_config.worker_num);
Answer 7:
We have seen the same issue and solved it by adding the following per-topology setting:
config.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx4096m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:NewSize=128m -XX:CMSInitiatingOccupancyFraction=70 -XX:-CMSConcurrentMTEnabled -Djava.net.preferIPv4Stack=true");
We also verified it in the Nimbus UI, where it shows up as below:
topology.worker.childopts -Xmx4096m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:NewSize=128m -XX:CMSInitiatingOccupancyFraction=70 -XX:-CMSConcurrentMTEnabled -Djava.net.preferIPv4Stack=true
Source: https://stackoverflow.com/questions/18061332/storm-topology-configuration