pipeline

Luigi - Overriding Task requires/input

二次信任 submitted on 2019-12-23 11:00:39

Question: I am using Luigi to execute a chain of tasks, like so:

```python
import luigi

class Task1(luigi.Task):
    stuff = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget('test.json')

    def run(self):
        with self.output().open('w') as f:
            f.write(self.stuff)


class Task2(luigi.Task):
    stuff = luigi.Parameter()

    def requires(self):
        return Task1(stuff=self.stuff)

    def output(self):
        return luigi.LocalTarget('something-else.json')

    def run(self):
        with self.output().open('w') as f:
            f.write(self.stuff)
```

(The excerpt wrote `f.write(stuff)`, which would raise a NameError; the parameter lives on the instance, hence `self.stuff`.) This works exactly as desired when I …
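For reference, a minimal way to run such a chain end to end (a sketch assuming the two classes above; `luigi.build` with the local scheduler avoids running a central scheduler daemon):

```python
# Sketch: building Task2 pulls in Task1 through requires(),
# so both tasks run in dependency order.
import luigi

if __name__ == "__main__":
    luigi.build([Task2(stuff="hello")], local_scheduler=True)
```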

issuing multiple requests before getting response

泄露秘密 submitted on 2019-12-23 07:03:19

Question: I'm having trouble understanding how HTTP works when multiple requests are sent in parallel (before a response is received). There are two cases: 1) With Connection: Keep-Alive. According to the HTTP spec: "A client that supports persistent connections MAY 'pipeline' its requests (i.e., send multiple requests without waiting for each response). A server MUST send its responses to those requests in the same order that the requests were received." That way seems quite difficult to implement and …
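To make the Keep-Alive case concrete, here is a hedged sketch (plain sockets, with example.com as a stand-in host; real servers may not honor pipelining) of writing two requests back-to-back on one connection; per the quoted spec, the responses must come back in request order:

```python
# Sketch: two GET requests on one persistent connection, then read.
# Both responses arrive on this socket, in order (a single recv may
# return only part of them; a real client would loop and parse).
import socket

request = ("GET /{path} HTTP/1.1\r\n"
           "Host: example.com\r\n"
           "Connection: keep-alive\r\n"
           "\r\n")

sock = socket.create_connection(("example.com", 80))
sock.sendall((request.format(path="first") +
              request.format(path="second")).encode("ascii"))
print(sock.recv(65536).decode("latin-1"))
sock.close()
```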

What is the preferred way to set up a continuous integration build chain for a big project with TeamCity?

為{幸葍}努か submitted on 2019-12-22 10:15:53

Question: My company has been using Maven and TeamCity for some time now to build Java projects. Currently we are investing quite heavily in continuous integration and, ultimately, continuous delivery. Among many smaller applications (apps) we operate one big monolithic app with approx. 1 million LOC. On quite a big build agent, this app takes 5 minutes to compile (incl. 2 minutes of svn up). Its 12k unit tests run for another 5 minutes. Deploying the build results to Nexus takes at least 10 minutes. To …

Is it possible to force PowerShell script to throw if a required *pipeline* parameter is omitted?

℡╲_俬逩灬. submitted on 2019-12-22 09:46:11

Question: Interactive PowerShell sessions prompt the user when a required parameter is omitted. Shay Levy offers a workaround to this problem. The problem is that the workaround does not work when you use the pipeline to bind parameters. Consider this example:

```powershell
function f {
    [CmdletBinding()]
    param (
        [Parameter(ValueFromPipeLineByPropertyName=$true)]
        [ValidateNotNullOrEmpty()]
        [string]$a=$(throw "a is mandatory, please provide a value.")
    )
    process{}
}

$o = New-Object psobject -Property @{a=1}
$o | f
```

This …
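One hedged way around this (a sketch, not from the original excerpt): drop the throwing default and enforce the parameter inside `process`, which runs after per-record pipeline binding has already happened:

```powershell
# Sketch: $PSBoundParameters reflects pipeline-bound values per record,
# so the throw fires only when $a truly was not supplied at all.
function f {
    [CmdletBinding()]
    param (
        [Parameter(ValueFromPipelineByPropertyName=$true)]
        [string]$a
    )
    process {
        if (-not $PSBoundParameters.ContainsKey('a')) {
            throw "a is mandatory, please provide a value."
        }
        $a
    }
}
```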

How do I define functions within a CmdletBinding() script?

本小妞迷上赌 submitted on 2019-12-22 09:25:29

Question: I'm writing a script that I'd like to use PowerShell's CmdletBinding() with. Is there a way to define functions in the script? When I try, PowerShell complains: "Unexpected token 'function' in expression or statement". Here's a simplified example of what I'm trying to do:

```powershell
[CmdletBinding()]
param(
    [String] $Value
)

BEGIN { f("Begin") }
PROCESS { f("Process:" + $Value) }
END { f("End") }

Function f() {
    param([String]$m)
    Write-Host $m
}
```

In my case, writing a module is wasted overhead. The …
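A hedged sketch of one fix (not from the original excerpt): when a script body uses begin/process/end, no statements may appear outside those blocks, so define helpers inside `begin`, which runs before any pipeline input is processed:

```powershell
# Sketch: a function defined in begin is visible in process and end,
# because begin executes first.
[CmdletBinding()]
param(
    [String] $Value
)

BEGIN {
    function f([String]$m) { Write-Host $m }
    f "Begin"
}
PROCESS { f ("Process:" + $Value) }
END { f "End" }
```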

Multi-Threaded NLP with Spacy pipe

随声附和 submitted on 2019-12-22 08:35:23

Question: I'm trying to apply the spaCy NLP (Natural Language Processing) pipeline to a big text file such as a Wikipedia dump. Here is my code, based on spaCy's documentation example:

```python
from spacy.en import English

input = open("big_file.txt")
big_text = input.read()
input.close()
nlp = English()
out = nlp.pipe([unicode(big_text, errors='ignore')], n_threads=-1)
doc = out.next()
```

spaCy applies all NLP operations (POS tagging, lemmatizing, etc.) at once. It is like a pipeline for NLP that takes care of …
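For what it's worth, `nlp.pipe` can only parallelize across multiple documents, so passing one giant string gives the worker threads nothing to share. A hedged sketch (same spaCy 1.x-era, Python 2 API as the excerpt) that streams smaller texts instead:

```python
# Sketch: feed pipe() an iterable of per-line texts so work can be
# batched and spread across threads, instead of one monolithic string.
from spacy.en import English

nlp = English()
with open("big_file.txt") as f:
    texts = (unicode(line, errors='ignore') for line in f)
    for doc in nlp.pipe(texts, batch_size=1000, n_threads=4):
        pass  # each doc arrives fully tagged and parsed
```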

Dealing with dynamic columns with VectorAssembler

你说的曾经没有我的故事 submitted on 2019-12-22 00:33:40

Question: Using Spark's VectorAssembler, the columns to be assembled need to be defined up front. However, if the VectorAssembler is used in a pipeline where the previous steps modify the columns of the data frame, how can I specify the columns without hard-coding all the values manually? Since df.columns will not contain the right values when the VectorAssembler's constructor is called, I currently do not see another way to handle that than to split the pipeline, which is bad as well because …
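One hedged workaround (a sketch in PySpark; `prep_stages`, `df`, and the `label` column name are hypothetical): fit and apply the earlier stages once, derive the column list from their concrete output, and only then construct the assembler:

```python
# Sketch: materialize the preprocessing output so its columns are known,
# then build the VectorAssembler from that concrete column list.
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler

prep_model = Pipeline(stages=prep_stages).fit(df)   # earlier pipeline steps
prepped = prep_model.transform(df)

feature_cols = [c for c in prepped.columns if c != "label"]
assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
assembled = assembler.transform(prepped)
```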

Should I parse git status or use gitsharp?

让人想犯罪 __ submitted on 2019-12-21 14:59:30

Question: I'd like to integrate git into a production pipeline to stage 3ds Max files. While it is alright to work with git through TortoiseGit, I'd like to communicate with it from MaxScript to add custom menu commands to 3ds Max. Should I parse the text output of git status to determine folder status, or should I use some wrapper tool to communicate with git properly? I was thinking about gitsharp, since it is easy to call .NET objects from MaxScript, but I haven't used external .NET programs before. Answer 1: My own …
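If parsing is the route taken, git has a mode designed for it: `git status --porcelain` emits a stable, machine-readable format. A hedged sketch in Python for illustration (the same command can be invoked from MaxScript's shell or DotNet facilities):

```python
# Sketch: --porcelain lines look like "XY path" (two status letters,
# a space, then the path), and the format is stable across git versions.
import subprocess

def folder_status(repo_path):
    out = subprocess.check_output(
        ["git", "status", "--porcelain"], cwd=repo_path, text=True)
    return [(line[:2], line[3:]) for line in out.splitlines()]
```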

How can I parallelize a pipeline of generators/iterators in Python?

痞子三分冷 submitted on 2019-12-21 12:06:18

Question: Suppose I have some Python code like the following:

```python
input = open("input.txt")
x = (process_line(line) for line in input)
y = (process_item(item) for item in x)
z = (generate_output_line(item) + "\n" for item in y)
output = open("output.txt", "w")
output.writelines(z)
```

This code reads each line from the input file, runs it through several functions, and writes the output to the output file. Now I know that the functions process_line, process_item, and generate_output_line will never interfere …
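One hedged way to parallelize this (a sketch; it assumes the three functions are top-level, picklable, and side-effect-free): compose them into a single per-line function and map it with a process pool, which preserves output order:

```python
# Sketch: Pool.imap applies the composed function across worker
# processes while still yielding results in input order.
from multiprocessing import Pool

def process_all(line):
    return generate_output_line(process_item(process_line(line))) + "\n"

if __name__ == "__main__":
    with open("input.txt") as inp, open("output.txt", "w") as out:
        with Pool() as pool:
            out.writelines(pool.imap(process_all, inp, chunksize=64))
```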