pipeline

Dealing with dynamic columns with VectorAssembler

不问归期 submitted on 2019-12-04 21:05:25
With Spark's VectorAssembler, the columns to be assembled need to be defined up front. However, when the VectorAssembler sits in a pipeline whose previous steps modify the columns of the data frame, how can I specify the columns without hard-coding all the values manually? df.columns will not contain the right values when the VectorAssembler's constructor is called, and currently I see no way to handle that other than splitting the pipeline - which is bad as well, because CrossValidator would then no longer work properly. val vectorAssembler = new VectorAssembler() .setInputCols(df.columns
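
One workaround worth sketching (not an official answer, and the raw column names "colA"/"colB" are hypothetical): the output columns of the earlier stages are fixed by their configuration before any data flows, so the assembler's input list can be derived from the stages themselves rather than from df.columns, keeping everything in a single pipeline that CrossValidator can use.

    import org.apache.spark.ml.{Pipeline, PipelineStage}
    import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}

    // Derive the assembler's inputs from the earlier stages' configured
    // output columns instead of from df.columns at construction time.
    val indexers = Array("colA", "colB").map { c =>
      new StringIndexer().setInputCol(c).setOutputCol(s"${c}_idx")
    }
    val assembler = new VectorAssembler()
      .setInputCols(indexers.map(_.getOutputCol))
      .setOutputCol("features")

    val stages: Array[PipelineStage] = indexers :+ assembler
    val pipeline = new Pipeline().setStages(stages)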

How do I store a command in a variable and use it in a pipeline? [duplicate]

故事扮演 submitted on 2019-12-04 19:21:23
This question already has answers here: Why does shell ignore quotes in arguments passed to it through variables? [duplicate] (3 answers). Closed 3 years ago. If I use this command in a pipeline, it works very well: pipeline ... | grep -P '^[^\s]*\s3\s' But if I store the grep command in a variable, like var="grep -P '^[^\s]*\s3\s'", and then put the variable in the pipeline, pipeline ... | $var, nothing happens, as if there were no matches at all. What am I doing wrong? The robust way to store a simple command in a variable in Bash is to use an array: # Store the command names and arguments
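
A minimal sketch of that array approach (producer_command is just a placeholder for the upstream part of the pipeline): each word of the command becomes one array element, so the single-quoted pattern survives word splitting intact.

    # Store the command name and each argument as separate array elements;
    # the quoted pattern is kept as a single element.
    var=(grep -P '^[^\s]*\s3\s')

    # "${var[@]}" expands each element as its own word, quoting preserved.
    producer_command | "${var[@]}"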

Asset Pipeline/Framework for PHP

允我心安 submitted on 2019-12-04 17:54:48
Question: Background: I am working on "modernizing" a pre-existing PHP-driven website. This website started out as a static site with a few PHP methods. It now has a mobile web app, multiple models, and a lot of dynamic content. However, over time the structure of the app itself hasn't changed much from when it was a largely static site, so now there are include files all over the place, no separation of application and presentation logic, and so on. It is a mess to work on. So I am reorganizing everything

return coefficients from Pipeline object in sklearn

左心房为你撑大大i submitted on 2019-12-04 17:32:34
Question: I've fit a Pipeline object with RandomizedSearchCV:

    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import RandomizedSearchCV
    import numpy as np

    pipe_sgd = Pipeline([('scl', StandardScaler()),
                         ('clf', SGDClassifier(n_jobs=-1))])
    param_dist_sgd = {'clf__loss': ['log'],
                      'clf__penalty': [None, 'l1', 'l2', 'elasticnet'],
                      'clf__alpha': np.linspace(0.15, 0.35),
                      'clf__n_iter': [3, 5, 7]}
    sgd_randomized_pipe = RandomizedSearchCV(estimator=pipe_sgd,
                                             param_distributions=param_dist_sgd,
                                             cv=3, n_iter=30, n_jobs=-1)
    sgd_randomized_pipe.fit(X_train, y_train)

I want to access the coef_ attribute
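
A short sketch of one way to get there: the refit winner lives on best_estimator_, and named_steps reaches the fitted classifier by the name it was registered under in the pipeline.

    # best_estimator_ is the pipeline refit on the full training set with
    # the best parameters found; 'clf' is the step name used above.
    best_pipe = sgd_randomized_pipe.best_estimator_
    coefs = best_pipe.named_steps['clf'].coef_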

std::cin really slow

旧巷老猫 submitted on 2019-12-04 15:13:57
Question: So I was trying to write myself a command for a Linux pipeline. Think of it as a replica of GNU 'cat' or 'sed': it takes input from stdin, does some processing, and writes to stdout. I originally wrote an AWK script but wanted more performance, so I used the following C++ code:

    std::string crtLine;
    crtLine.reserve(1000);
    while (true) {
        std::getline(std::cin, crtLine);
        if (!std::cin) // failbit (EOF immediately found) or badbit (I/O error)
            break;
        std::cout << crtLine << "\n";
    }

This is exactly
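
For reference, the two standard iostream speed-ups that usually close most of this gap (a sketch; worth benchmarking on the actual workload):

    #include <iostream>
    #include <string>

    int main() {
        // Stop synchronizing iostreams with C stdio, and untie cin from
        // cout so cin doesn't flush cout before every read.
        std::ios_base::sync_with_stdio(false);
        std::cin.tie(nullptr);

        std::string line;
        line.reserve(1000);
        while (std::getline(std::cin, line))
            std::cout << line << '\n';
    }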

Create releases from within a GitLab runner/pipeline

笑着哭i submitted on 2019-12-04 14:41:46
With the release of GitLab 11.7 in January 2019, we got the new key feature Publish releases for your projects. I want precisely what the screenshot on that page shows, and I want to be able to download compiled binaries using the Releases API. I can do it manually; instructions for the manual approach can be found here on Stack Overflow. The problem I need help with is doing it as part of a CI/CD pipeline, which is not covered by the answers one can find easily. The release notes contain a link to the documentation, which states: we recommend doing this as one of the last steps
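
A sketch of what such a job could look like in .gitlab-ci.yml, calling the Releases API with curl. PRIVATE_API_TOKEN is a hypothetical CI/CD variable you would define yourself (a token with api scope); CI_API_V4_URL, CI_PROJECT_ID, CI_COMMIT_TAG, and CI_JOB_ID are GitLab's predefined variables.

    release:
      stage: release        # one of the last stages, as the docs recommend
      only:
        - tags
      script:
        - >
          curl --request POST
          --header "PRIVATE-TOKEN: ${PRIVATE_API_TOKEN}"
          --data "name=Release ${CI_COMMIT_TAG}&tag_name=${CI_COMMIT_TAG}&description=Built by job ${CI_JOB_ID}"
          "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/releases"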

Assembly PC Relative Addressing Mode

拈花ヽ惹草 submitted on 2019-12-04 13:50:08
I am working on datapaths and have been trying to understand branch instructions. This is what I understand so far: in MIPS, every instruction is 32 bits, i.e. 4 bytes, so the next instruction is four bytes away. As an example, say the PC address is 128. My first issue is understanding what this 128 means. My current belief is that it is a byte index into memory, so 128 refers to 128 bytes into memory. That is why the datapath always says to add 4 to the PC: adding 4 bytes to 128 gives 132, which is 132 bytes into memory (the next instruction). This is the way
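
For concreteness, here is how a PC-relative branch target is computed in the usual MIPS textbook convention (a worked sketch with made-up addresses):

    beq $t0, $t1, label     # the offset field is a signed 16-bit word count

    # target = (PC + 4) + (offset << 2)
    # If this beq sits at byte address 128 and label is at byte address 140:
    #   PC + 4         = 132
    #   offset         = (140 - 132) / 4 = 2   (value stored in the instruction)
    #   132 + (2 << 2) = 140                   (the branch target)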

How to use RandomForest in Spark Pipeline

对着背影说爱祢 submitted on 2019-12-04 13:16:26
Question: I want to tune my model with grid search and cross-validation in Spark. In Spark, the base model must be put in a pipeline; the official pipeline demo uses LogisticRegression as the base model, which can be instantiated with new. However, the RandomForest model cannot be instantiated by client code, so it seems impossible to use RandomForest in the Pipeline API. I don't want to reinvent the wheel, so can anybody give some advice? Thanks. Answer 1: However, the RandomForest model cannot be new by
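
For what it's worth, the spark.ml (DataFrame-based) API does expose a random forest as an ordinary Estimator; a sketch, with the label and features column names assumed:

    import org.apache.spark.ml.{Pipeline, PipelineStage}
    import org.apache.spark.ml.classification.RandomForestClassifier

    // Unlike the old mllib RandomForest (only static trainClassifier /
    // trainRegressor methods), spark.ml's RandomForestClassifier is a
    // plain Estimator you can construct and drop into a Pipeline.
    val rf = new RandomForestClassifier()
      .setLabelCol("label")        // assumed column names
      .setFeaturesCol("features")

    val pipeline = new Pipeline().setStages(Array[PipelineStage](rf))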

Create Jenkins Docker Image with pre configured jobs

不羁的心 submitted on 2019-12-04 11:19:24
Question: I have created a bunch of local deployment pipeline jobs; these jobs do things like remove an existing container, build a service locally, build a Docker image, run the container, etc. These are not CI/CD jobs, just small pipelines for deploying locally during development. What I want to do now is make this available to all our devs, so they can simply spin up a local instance of Jenkins that already contains the jobs. My Dockerfile is reasonably straightforward... FROM jenkins:latest USER
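
One common pattern for this (a sketch; the jobs/ directory and "local-deploy" job name are hypothetical): the official jenkins image copies everything under /usr/share/jenkins/ref/ into JENKINS_HOME on first start, so a job's config.xml baked in there shows up as a ready-made job.

    FROM jenkins:latest

    # The image's entrypoint copies ref/ into JENKINS_HOME at startup,
    # so each job needs only its config.xml under ref/jobs/<job-name>/.
    COPY jobs/local-deploy/config.xml \
         /usr/share/jenkins/ref/jobs/local-deploy/config.xml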

Should I parse git status or use gitsharp?

烂漫一生 submitted on 2019-12-04 09:15:01
I'd like to integrate git into a production pipeline to stage 3ds Max files. While it is fine to work with git through TortoiseGit, I'd like to drive it from MAXScript so I can add custom menu commands to 3ds Max. Should I parse the git status output text to determine folder status, or should I use some wrapper to communicate with git properly? I was thinking about GitSharp, since it is easy to call .NET objects from MAXScript, but I haven't used external .NET programs before. My own attempt to solve this resulted in parsing git status; it seems cleaner and easier to implement. On the other
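
If parsing ends up being the route, one hedged note: git ships a machine-readable mode for exactly this, so a script-safe invocation might look like the following (the scene path is a hypothetical example):

    # --porcelain output is stable across git versions and locales,
    # unlike the default human-readable git status text.
    git status --porcelain -- scenes/shot010.max
    # Prints e.g. " M scenes/shot010.max"; the first two characters are
    # the index and work-tree status codes.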