pipeline

getting transformer results from sklearn.pipeline.Pipeline

Submitted by ◇◆丶佛笑我妖孽 on 2019-12-10 03:54:28
Question: I am using a sklearn.pipeline.Pipeline object for my clustering. pipe = sklearn.pipeline.Pipeline([('transformer1', transformer1), ('transformer2', transformer2), ('clusterer', clusterer)]) Then I evaluate the result with the silhouette score: sil = sklearn.metrics.silhouette_score(X, y). I'm wondering how I can get X, the transformed data, out of the pipeline, since it only returns clusterer.fit_predict(X). I understand that I can do this by just splitting the pipeline as pipe =
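A minimal sketch of one way to do this (not from the original thread): run only the transformer steps, then score the clustering on their output. It assumes scikit-learn >= 0.21, where a Pipeline can be sliced with pipe[:-1]; StandardScaler, PCA and KMeans stand in for the asker's transformer1, transformer2 and clusterer.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)

pipe = Pipeline([('transformer1', StandardScaler()),
                 ('transformer2', PCA(n_components=2)),
                 ('clusterer', KMeans(n_clusters=3, n_init=10))])

labels = pipe.fit_predict(X)              # fits every step, predicts with the clusterer
X_transformed = pipe[:-1].transform(X)    # every step except the final clusterer
sil = silhouette_score(X_transformed, labels)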

how to compare two fields in a document in pipeline aggregation (mongoDB)

Submitted by 戏子无情 on 2019-12-10 03:00:58
Question: I have a document like the one below: { "user_id": NumberLong(1), "updated_at": ISODate("2016-11-17T09:35:56.200Z"), "created_at": ISODate("2016-11-17T09:35:07.981Z"), "banners": { "normal_x970h90": "/images/banners/4/582d79cb3aef567d64621be9/photo-1440700265116-fe3f91810d72.jpg", "normal_x468h60": "/images/banners/4/582d79cb3aef567d64621be9/photo-1433354359170-23a4ae7338c6.jpg", "normal_x120h600": "/images/banners/4/582d79cb3aef567d64621be9/photo-1452570053594-1b985d6ea890.jpg" }, "name":
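The excerpt is cut off before the actual comparison is stated, but the usual way to compare two fields of the same document inside an aggregation pipeline is an aggregation expression. A hedged sketch in the mongo shell (the collection name users is a placeholder), comparing created_at with updated_at:

// Add a computed flag per document (MongoDB 3.4+):
db.users.aggregate([
  { $addFields: { was_updated: { $gt: ["$updated_at", "$created_at"] } } }
])

// Or keep only the documents where the two fields differ (MongoDB 3.6+):
db.users.aggregate([
  { $match: { $expr: { $ne: ["$updated_at", "$created_at"] } } }
])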

bash script: how to save return value of first command in a pipeline?

Submitted by 送分小仙女□ on 2019-12-10 02:41:51
Question: Bash: I want to run a command and pipe the results through some filter, but if the command fails, I want to get the command's error status, not the boring return value of the filter. E.g.: if ! (cool_command | output_filter); then handle_the_error; fi Or: set -e cool_command | output_filter In either case it's the return value of cool_command that I care about -- for the 'if' condition in the first case, or to exit the script in the second case. Is there some clean idiom for doing this? Answer 1:
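The answer itself is cut off, but two common bash idioms cover this case; a hedged sketch (cool_command, output_filter and handle_the_error are the asker's placeholders):

# 1) pipefail: the pipeline's exit status becomes the last non-zero status,
#    so a failing cool_command is no longer masked by a succeeding filter.
set -o pipefail
cool_command | output_filter
echo "pipeline status: $?"

# 2) PIPESTATUS: inspect each command's status individually (bash-specific).
cool_command | output_filter
status=${PIPESTATUS[0]}          # exit status of cool_command alone
if [ "$status" -ne 0 ]; then
    handle_the_error "$status"
fi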

Output binary data on PowerShell pipeline

Submitted by 和自甴很熟 on 2019-12-09 10:26:56
Question: I need to pipe some data to a program's stdin. The first 4 bytes are a 32-bit unsigned integer giving the length of the data; these 4 bytes are laid out exactly as C would store an unsigned int in memory (I refer to this as binary data). The remaining bytes are the data. In C, this is trivial: WriteFile(h, &cb, 4); // cb is a 4 byte integer WriteFile(h, pData, cb); or fwrite(&cb, sizeof(cb), 1, pFile); fwrite(pData, cb, 1, pFile); or in C# you would use a BinaryWriter (I think this code is right, i
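PowerShell's pipeline passes objects and re-encodes them as text, which corrupts raw bytes, so one common workaround is to skip the | operator entirely and write to the child process's stdin stream through .NET. A hedged sketch (target.exe and the payload are placeholders, not taken from the original question):

# Build a 4-byte little-endian length prefix followed by the payload.
$data = [byte[]](1..10)                                   # placeholder payload
$len  = [System.BitConverter]::GetBytes([uint32]$data.Length)

$psi = New-Object System.Diagnostics.ProcessStartInfo
$psi.FileName = "target.exe"                              # placeholder program
$psi.RedirectStandardInput = $true
$psi.UseShellExecute = $false
$proc = [System.Diagnostics.Process]::Start($psi)

$stdin = $proc.StandardInput.BaseStream                   # raw stream, no text encoding
$stdin.Write($len, 0, $len.Length)
$stdin.Write($data, 0, $data.Length)
$stdin.Close()
$proc.WaitForExit()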

What's the difference between -> and |> in reasonml?

Submitted by 蓝咒 on 2019-12-09 08:40:35
Question: A period of intense googling turned up examples where people use both operators in the same code, but generally they look like two ways of doing the same thing; they even share the same name. Answer 1: tl;dr: The defining difference is that -> pipes to the first argument while |> pipes to the last. That is: x -> f(y, z) <=> f(x, y, z) x |> f(y, z) <=> f(y, z, x) Unfortunately there are some subtleties and implications that make this a bit more complicated and confusing in practice.
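A tiny hedged illustration of that difference (Reason with BuckleScript, where -> is the "fast pipe"; divide is a made-up example function):

let divide = (x, y) => x / y;

let a = 10->divide(2);   /* pipe-first:  divide(10, 2) == 5 */
let b = 10 |> divide(2); /* pipe-last:   divide(2, 10) == 0 */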

R: combine several gsub() function in a pipe

Submitted by 放肆的年华 on 2019-12-09 05:44:04
Question: To clean some messy data I would like to start using pipes (%>%), but I fail to get the R code working when gsub() is not at the beginning of the pipe but occurs later (note: this question is not about proper import, but about data cleaning). Simple example: df <- cbind.data.frame(A = c("2.187,78 ", "5.491,28 ", "7.000,32 "), B = c("A","B","C")) Column A contains characters (in this case numbers, but they could also be strings) and needs to be cleaned. The steps are df$D <- gsub("\\.",""
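Inside a magrittr pipe the incoming value can be referred to as ., which lets gsub() sit at any position in the chain. A hedged sketch (assumes the magrittr package; the cleaning steps after the truncated gsub("\\.","" call are inferred from the sample data, not quoted from the thread):

library(magrittr)

df <- data.frame(A = c("2.187,78 ", "5.491,28 ", "7.000,32 "),
                 B = c("A", "B", "C"),
                 stringsAsFactors = FALSE)

df$D <- df$A %>%
  gsub("\\.", "", .) %>%   # drop the thousands separator
  gsub(",", ".", .) %>%    # turn the decimal comma into a decimal point
  trimws() %>%
  as.numeric()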

how to use xargs with sed in search pattern

Submitted by 大憨熊 on 2019-12-09 04:39:05
Question: I need to use the output of a command as the search pattern in sed. I'll use echo as an example, but assume it can be a more complicated command: echo "some pattern" | xargs sed -i 's/{}/replacement/g' file.txt That command doesn't work because "some pattern" contains whitespace, but I think it clearly illustrates my problem. How can I make the command work? Thanks in advance. Answer 1: Use command substitution instead, so your example would look like: sed -i "s/$(echo "some pattern")
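A hedged sketch of both routes (file.txt, "some pattern" and "replacement" are the question's placeholders; either way, characters special to sed such as /, & and \ would still need escaping):

# 1) Command substitution, with the whole sed expression double-quoted:
pattern="$(echo "some pattern")"
sed -i "s/$pattern/replacement/g" file.txt

# 2) If xargs really must be used, -I names the placeholder so the whole
#    input line, whitespace included, is substituted as a single argument:
echo "some pattern" | xargs -I{} sed -i 's/{}/replacement/g' file.txt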

is an “optionalized” pipe operator idiomatic F#

Submitted by 不打扰是莪最后的温柔 on 2019-12-08 21:08:36
Question: I like to use the pipe operator |> a lot. However, when mixing functions that return 'simple' values with functions that return option-typed values, things become a bit messy, e.g.: // foo: int -> int*int // bar: int*int -> bool let f (x: string) = x |> int |> foo |> bar This works, but it might throw a 'System.FormatException: ...'. Now assume I want to fix that by making the function int give an optional result: let intOption x = match System.Int32.TryParse x with | (true, x) -> Some x |
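One idiomatic way to keep piping once a step returns an option is Option.map (or Option.bind when the next step itself returns an option), rather than defining a new operator. A hedged sketch, with placeholder bodies for foo and bar that match the signatures given in the question:

// foo: int -> int*int, bar: int*int -> bool (placeholder implementations)
let foo (x: int) = (x, x * 2)
let bar (a: int, b: int) = a < b

let intOption (x: string) =
    match System.Int32.TryParse x with
    | true, v -> Some v
    | _ -> None

// string -> bool option: the pipeline survives, later stages are lifted into the option.
let f (x: string) =
    x |> intOption |> Option.map foo |> Option.map bar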

setnames in pipeline R code

Submitted by 社会主义新天地 on 2019-12-08 20:01:01
Question: I was wondering whether it is possible to set the names of the elements of a list at the end of a pipeline. data <- input_vars %>% purrr::map(get_data) names(data) <- input_vars Currently I pass a vector of variable names into a function which retrieves a list of data frames. Unfortunately this list does not automatically have named elements, so I add them "manually" afterwards. To improve readability I would like to have something like the following: data <- input_vars %>% purrr::map(get_comm_data) %
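purrr::set_names() (a re-export of rlang::set_names()) slots straight into the pipe. A hedged sketch (input_vars and get_data come from the question):

# Name the inputs up front; map() carries the names through to the result list:
data <- input_vars %>%
  purrr::set_names() %>%     # a character vector gets named by its own values
  purrr::map(get_data)

# Or name the result at the end of the pipe:
data <- input_vars %>%
  purrr::map(get_data) %>%
  purrr::set_names(input_vars)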

AttributeError: lower not found; using a Pipeline with a CountVectorizer in scikit-learn

Submitted by 核能气质少年 on 2019-12-08 17:26:38
Question: I have a corpus like this: X_train = [ ['this is an dummy example'], ['in reality this line is very long'], ... ['here is a last text in the training set'] ] and some labels: y_train = [1, 5, ... , 3] I would like to use Pipeline and GridSearch as follows: pipeline = Pipeline([ ('vect', CountVectorizer()), ('tfidf', TfidfTransformer()), ('reg', SGDRegressor()) ]) parameters = { 'vect__max_df': (0.5, 0.75, 1.0), 'tfidf__use_idf': (True, False), 'reg__alpha': (0.00001, 0.000001), } grid_search =
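CountVectorizer expects an iterable of strings, and handing it one-element lists is what triggers "AttributeError: lower not found". A hedged sketch of the usual fix, flattening the corpus before fitting (GridSearchCV and its arguments are assumptions of mine, since the grid_search = line is cut off; pipeline, parameters, X_train and y_train are the question's objects):

from sklearn.model_selection import GridSearchCV

# Each training document becomes a plain string instead of a 1-element list.
X_train_flat = [doc[0] for doc in X_train]

grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1)
grid_search.fit(X_train_flat, y_train)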