Excluding nodes from a qsub command under SGE

北恋 2021-02-13 05:14

I have more than 200 jobs that I need to submit to an SGE cluster. I'll be submitting them to two queues. One of the queues has a machine that I don't want to submit jobs to. How c

3 Answers
  •  误落风尘
    2021-02-13 05:27

    The best way I've found for this is to set up a custom resource on the nodes that you want to allow the execution on, then require that resource when you submit the job.

    In qmon, go to the "complex" configuration and add a new attribute. Set the name to something like "my_test" and the shortcut to something like "m_t", the type to BOOL, the relation to ==, requestable to YES, and consumable to NO, then click "Add". Commit your changes to the complex configuration.
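The same attribute can probably be added from the command line instead of qmon. A minimal sketch, assuming the attribute is called my_test (the name the qsub request later in this answer asks for) with shortcut m_t, a default and urgency of 0, and an SGE version that supports `qconf -sc` and `qconf -Mc`. The helper names `complex_line` and `add_complex` are hypothetical:

```shell
#!/bin/sh
# Build one line of the complex configuration, in the column order that
# `qconf -sc` prints: name shortcut type relop requestable consumable default urgency.
# The default (0) and urgency (0) values are assumptions.
complex_line() {
    # $1 = attribute name, $2 = shortcut (both chosen by you)
    printf '%s %s BOOL == YES NO 0 0\n' "$1" "$2"
}

# Dump the current complex list, append the new attribute, and load it back.
# Only call this on a host where you have SGE admin rights.
add_complex() {
    tmp=$(mktemp) || return 1
    qconf -sc > "$tmp" &&
        complex_line "$1" "$2" >> "$tmp" &&
        qconf -Mc "$tmp"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Running `add_complex my_test m_t` and then `qconf -sc | grep my_test` should show whether the attribute was accepted.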

    The next step is probably easier to do from the command line, but you can do it in qmon as well. You need to add your new complex to each host that you're going to allow your job to run on. In qmon, go to the host configuration, select "execution host", and open each host in turn: click on the consumables/fixed attributes tab and add the complex that you just configured, with "True" as the value. From the command line, you can get a list of your execution hosts with "qconf -sel". This list is suitable for passing to a loop after grepping out the host(s) you don't want included. Do something like this:

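    # ed script: append my_test=True to the complex_values line, write, quit
    # (assumes that line already lists at least one value rather than NONE)
    qconf -sel | grep -v host_to_exclude | while read -r host; do
        printf '%s\n' '/complex_values/s/$/,my_test=True/' w q | EDITOR="ed" qconf -me "$host"
    done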

    This lets you edit the host programmatically, which qconf doesn't normally allow because it wants to start up your editor for you. It works by setting the editor to "ed" (make sure you have the ed editor installed; try running it by hand first, and type "q" to get out). ed takes its list of editing commands on its stdin, so we give it three commands. The first edits the line with complex_values on it to include the my_test value. The second writes out the temporary file and the third quits ed.
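One case to watch for: a host with no complex values set typically shows `complex_values NONE` in `qconf -se`, and appending to that line would produce an invalid `NONE,my_test=True`. The sketch below illustrates the edit on plain text rather than on the live cluster, using sed instead of ed so that both cases are covered. `set_complex` is a hypothetical helper, the sample lines are made up, and it assumes complex_values fits on a single line:

```shell
#!/bin/sh
# Read a host configuration (as printed by `qconf -se <host>`) on stdin and
# print it with my_test=True added to the complex_values line.
set_complex() {
    # 1st expression: replace a bare NONE with the new value.
    # 't' skips the rest of the script if that substitution happened.
    # 3rd expression: otherwise append to the existing comma-separated list.
    sed -e 's/^\(complex_values[[:space:]][[:space:]]*\)NONE[[:space:]]*$/\1my_test=True/' \
        -e 't' \
        -e '/^complex_values/s/[[:space:]]*$/,my_test=True/'
}

# Try it on two made-up configurations.
printf 'hostname node1\ncomplex_values NONE\n' | set_complex
printf 'hostname node2\ncomplex_values mem_free=8G\n' | set_complex
```

If your SGE version provides it, `qconf -aattr exechost complex_values my_test=True "$host"` may be a simpler way to add the value without going through an editor at all.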

    Once you've done this, submit your jobs with a resource request (-l) that requires your new complex:

    qsub -q whatever -l my_test=True my_prog.sh
    

    The -l option requests a resource, and my_test=True says the job can only run on hosts that have the complex my_test with a value of True. Since the complex isn't consumable, each host can still run as many jobs as the scheduler allows (up to the slot limit for the host), but the jobs will avoid any host that doesn't have the my_test complex set to True.
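For the 200 jobs and two queues from the question, the submission could be wrapped in a small loop. A minimal sketch, assuming the two queues are called queue_a and queue_b (placeholder names) and that each job is its own script. `qsub_cmd` and `submit_all` are hypothetical helpers, and the loop only prints the commands unless RUN=1 is set:

```shell
#!/bin/sh
QUEUES="queue_a,queue_b"   # placeholder queue names; -q takes a comma-separated list

# Print the qsub command for one job script.
qsub_cmd() {
    printf 'qsub -q %s -l my_test=True %s\n' "$QUEUES" "$1"
}

# Dry run by default: print one command per script.
# Set RUN=1 in the environment to actually submit.
submit_all() {
    for script in "$@"; do
        if [ "${RUN:-0}" = 1 ]; then
            qsub -q "$QUEUES" -l my_test=True "$script"
        else
            qsub_cmd "$script"
        fi
    done
}
```

A call such as `submit_all jobs/*.sh` prints the commands for review, and `RUN=1 submit_all jobs/*.sh` submits them. Before submitting, `qhost -F my_test` should list which hosts report the complex.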
