How do you provide a custom configuration to a Storm topology? For example, if I have a topology that connects to a MySQL cluster and I want to be able to change which servers it connects to without recompiling, how would I do that?
I solved this problem by just providing the config in code:
config.put(Config.TOPOLOGY_WORKER_CHILDOPTS, SOME_OPTS);
I tried to provide a topology-specific storm.yaml, but it doesn't work. Correct me if you have managed to get a topology-specific storm.yaml working.
Update:
For anyone who wants to know what SOME_OPTS is, this is from Satish Duggana on the Storm mailing list:
Config.TOPOLOGY_WORKER_CHILDOPTS: options which can override WORKER_CHILDOPTS for a topology. You can configure any Java options, such as memory, GC, etc.
In your case it can be
config.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx1g");
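Put together, a minimal sketch of how this looks at submission time (the topology name and the omitted spout/bolt wiring are placeholders, not something from the original answer):
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class SubmitExample {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // builder.setSpout(...) and builder.setBolt(...) go here as usual

        Config config = new Config();
        // Topology-level override of the worker JVM options
        config.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx1g");

        StormSubmitter.submitTopology("my-topology", config, builder.createTopology());
    }
}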
You can specify a configuration (typically via a YAML file) which you submit along with your topology. The way we manage this in our own project is to keep separate config files, one for development and one for production, in which we store our server, Redis and DB IPs, ports, etc. When we run the command that builds the jar and submits the topology to Storm, it includes the correct config file for the deployment environment. The bolts and spouts then simply read the configuration they need from the stormConf map that is passed to them in the prepare() method (see the sketch below).
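To illustrate that last step, here is a sketch of a bolt reading custom values in prepare(); the keys "db.host" and "db.port" are made up for this example and would be put into the Config map (e.g. config.put("db.host", ...)) before submitting:
import java.util.Map;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;

public class DbBolt extends BaseRichBolt {
    private OutputCollector collector;
    private String dbHost;
    private int dbPort;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        // Read the values that were put into the topology config at submit time
        this.dbHost = (String) stormConf.get("db.host");
        this.dbPort = ((Number) stormConf.get("db.port")).intValue();
        // open the MySQL connection here using dbHost/dbPort
    }

    @Override
    public void execute(Tuple input) {
        // use the connection, then ack the tuple
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // no output fields in this sketch
    }
}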
From http://storm.apache.org/documentation/Configuration.html :
Every configuration has a default value defined in defaults.yaml in the Storm codebase. You can override these configurations by defining a storm.yaml in the classpath of Nimbus and the supervisors. Finally, you can define a topology-specific configuration that you submit along with your topology when using StormSubmitter. However, the topology-specific configuration can only override configs prefixed with "TOPOLOGY".
Storm 0.7.0 and onwards lets you override configuration on a per-bolt/per-spout basis.
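As a sketch of what a per-component override looks like at declaration time (WordSpout and CountBolt stand in for your own components, and the "db.host" key is illustrative):
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("words", new WordSpout(), 2);
builder.setBolt("counter", new CountBolt(), 4)
       .addConfiguration("db.host", "10.0.0.5")  // custom key, readable from stormConf in prepare()
       .setNumTasks(8);                          // built-in per-component setting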
You'll also see at http://nathanmarz.github.io/storm/doc/backtype/storm/StormSubmitter.html that submitJar and submitTopology are both passed a map called conf.
Hope this gets you started.
We have seen the same issue and solved it by adding the following per topology:
config.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx4096m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:NewSize=128m -XX:CMSInitiatingOccupancyFraction=70 -XX:-CMSConcurrentMTEnabled -Djava.net.preferIPv4Stack=true");
We also verified it in the Nimbus UI, where it shows up as below:
topology.worker.childopts -Xmx4096m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:NewSize=128m -XX:CMSInitiatingOccupancyFraction=70 -XX:-CMSConcurrentMTEnabled -Djava.net.preferIPv4Stack=true
What might actually serve you best is to store the configuration in a mutable key-value store (S3, Redis, etc.) and pull it in to configure the database connection you then use (I assume you are already planning to limit how often you talk to the database, so the overhead of fetching this config is not a big deal). This design lets you change the database connection on the fly, with no need to even redeploy the topology.
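As one possible sketch of that idea, assuming the Jedis client plus made-up key names and Redis address, a worker could look up the current endpoint whenever it (re)opens its database connection:
import redis.clients.jedis.Jedis;

public class DbEndpointLookup {
    // Returns {host, port} as stored in Redis; change the values in Redis and the
    // next (re)connect picks them up, with no topology redeploy.
    public static String[] currentEndpoint() {
        Jedis jedis = new Jedis("redis.internal.example", 6379); // assumed Redis address
        try {
            return new String[] { jedis.get("mysql.host"), jedis.get("mysql.port") };
        } finally {
            jedis.close();
        }
    }
}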
I faced the same problem as you did, and here is my somewhat hacky solution:
Use a plain Java file as the configuration file, say topo_config.java; it looks like:
package com.xxx;

public class topo_config {
    public static String zk_host = "192.168.10.60:2181";
    public static String kafka_topic = "my_log_topic";
    public static int worker_num = 2;
    public static int log_spout_num = 4;
    // ...
}
This file is kept in my config folder. I then wrote a script, say compile.sh, which copies it into the right package and does the compilation:
cp config/topo_config.java src/main/java/com/xxx/
mvn package
The configuration values are then used directly:
Config conf = new Config();
conf.setNumWorkers(topo_config.worker_num);
I also faced the same issue. I solved it by configuring NFS in my cluster and putting the configuration file in a shared location, so that it is available to all cluster machines. NFS is very easy to set up on a Linux system.
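For instance, a sketch of loading such a shared file with java.util.Properties (the mount path and file name are illustrative); a bolt can call this from its prepare() method:
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class SharedConfigLoader {
    public static Properties load() throws IOException {
        // The file lives on the NFS mount, so every supervisor machine sees the same copy
        Properties props = new Properties();
        FileInputStream in = new FileInputStream("/mnt/nfs/storm/topology.properties");
        try {
            props.load(in);
        } finally {
            in.close();
        }
        return props;
    }
}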