pipeline

getting transformer results from sklearn.pipeline.Pipeline

Submitted by ◇◆丶佛笑我妖孽 on 2019-12-10 03:54:28
Question: I am using a sklearn.pipeline.Pipeline object for my clustering. pipe = sklearn.pipeline.Pipeline([('transformer1', transformer1), ('transformer2', transformer2), ('clusterer', clusterer)]) Then I evaluate the result with the silhouette score: sil = sklearn.metrics.silhouette_score(X, y). I'm wondering how I can get X, the transformed data, out of the pipeline, since it only returns clusterer.fit_predict(X). I understand that I can do this by just splitting the pipeline as pipe =
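A minimal sketch of one way to do this (not from the original thread): run only the transformer steps, then score the clustering on their output. It assumes scikit-learn >= 0.21, where a Pipeline can be sliced with pipe[:-1]; StandardScaler, PCA and KMeans stand in for the asker's transformer1, transformer2 and clusterer.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)

pipe = Pipeline([('transformer1', StandardScaler()),
                 ('transformer2', PCA(n_components=2)),
                 ('clusterer', KMeans(n_clusters=3, n_init=10))])

labels = pipe.fit_predict(X)              # fits every step, predicts with the clusterer
X_transformed = pipe[:-1].transform(X)    # every step except the final clusterer
sil = silhouette_score(X_transformed, labels)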

how to compare two fields in a document in pipeline aggregation (mongoDB)

Submitted by 戏子无情 on 2019-12-10 03:00:58
Question: I have a document like the one below: { "user_id": NumberLong(1), "updated_at": ISODate("2016-11-17T09:35:56.200Z"), "created_at": ISODate("2016-11-17T09:35:07.981Z"), "banners": { "normal_x970h90": "/images/banners/4/582d79cb3aef567d64621be9/photo-1440700265116-fe3f91810d72.jpg", "normal_x468h60": "/images/banners/4/582d79cb3aef567d64621be9/photo-1433354359170-23a4ae7338c6.jpg", "normal_x120h600": "/images/banners/4/582d79cb3aef567d64621be9/photo-1452570053594-1b985d6ea890.jpg" }, "name":
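The excerpt is cut off before the actual comparison is stated, but the usual way to compare two fields of the same document inside an aggregation pipeline is an aggregation expression. A hedged sketch in the mongo shell (the collection name users is a placeholder), comparing created_at with updated_at:

// Add a computed flag per document (MongoDB 3.4+):
db.users.aggregate([
  { $addFields: { was_updated: { $gt: ["$updated_at", "$created_at"] } } }
])

// Or keep only the documents where the two fields differ (MongoDB 3.6+):
db.users.aggregate([
  { $match: { $expr: { $ne: ["$updated_at", "$created_at"] } } }
])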

bash script: how to save return value of first command in a pipeline?

Submitted by 送分小仙女□ on 2019-12-10 02:41:51
Question: Bash: I want to run a command and pipe the results through some filter, but if the command fails, I want to get the command's error status, not the boring return value of the filter. E.g.: if ! (cool_command | output_filter); then handle_the_error; fi Or: set -e cool_command | output_filter In either case it's the return value of cool_command that I care about -- for the 'if' condition in the first case, or to exit the script in the second case. Is there some clean idiom for doing this? Answer 1:
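The answer itself is cut off, but two common bash idioms cover this case; a hedged sketch (cool_command, output_filter and handle_the_error are the asker's placeholders):

# 1) pipefail: the pipeline's exit status becomes the last non-zero status,
#    so a failing cool_command is no longer masked by a succeeding filter.
set -o pipefail
cool_command | output_filter
echo "pipeline status: $?"

# 2) PIPESTATUS: inspect each command's status individually (bash-specific).
cool_command | output_filter
status=${PIPESTATUS[0]}          # exit status of cool_command alone
if [ "$status" -ne 0 ]; then
    handle_the_error "$status"
fi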

Output binary data on PowerShell pipeline

Submitted by 和自甴很熟 on 2019-12-09 10:26:56
Question: I need to pipe some data to a program's stdin. The first 4 bytes are a 32-bit unsigned integer giving the length of the data; these 4 bytes are laid out exactly as C would store an unsigned int in memory (I refer to this as binary data). The remaining bytes are the data. In C, this is trivial: WriteFile(h, &cb, 4); // cb is a 4 byte integer WriteFile(h, pData, cb); or fwrite(&cb, sizeof(cb), 1, pFile); fwrite(pData, cb, 1, pFile); or in C# you would use a BinaryWriter (I think this code is right, i
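PowerShell's pipeline passes objects and re-encodes them as text, which corrupts raw bytes, so one common workaround is to skip the | operator entirely and write to the child process's stdin stream through .NET. A hedged sketch (target.exe and the payload are placeholders, not taken from the original question):

# Build a 4-byte little-endian length prefix followed by the payload.
$data = [byte[]](1..10)                                   # placeholder payload
$len  = [System.BitConverter]::GetBytes([uint32]$data.Length)

$psi = New-Object System.Diagnostics.ProcessStartInfo
$psi.FileName = "target.exe"                              # placeholder program
$psi.RedirectStandardInput = $true
$psi.UseShellExecute = $false
$proc = [System.Diagnostics.Process]::Start($psi)

$stdin = $proc.StandardInput.BaseStream                   # raw stream, no text encoding
$stdin.Write($len, 0, $len.Length)
$stdin.Write($data, 0, $data.Length)
$stdin.Close()
$proc.WaitForExit()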

What's the difference between -> and |> in reasonml?

Submitted by 蓝咒 on 2019-12-09 08:40:35
Question: A period of intense googling turned up examples where people use both operators in the same code, but generally they look like two ways of doing the same thing; they even share the same name. Answer 1: tl;dr: The defining difference is that -> pipes to the first argument while |> pipes to the last. That is: x -> f(y, z) <=> f(x, y, z) x |> f(y, z) <=> f(y, z, x) Unfortunately there are some subtleties and implications that make this a bit more complicated and confusing in practice.
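A tiny hedged illustration of that difference (Reason with BuckleScript, where -> is the "fast pipe"; divide is a made-up example function):

let divide = (x, y) => x / y;

let a = 10->divide(2);   /* pipe-first:  divide(10, 2) == 5 */
let b = 10 |> divide(2); /* pipe-last:   divide(2, 10) == 0 */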

R: combine several gsub() function in a pipe

Submitted by 放肆的年华 on 2019-12-09 05:44:04
Question: To clean some messy data I would like to start using pipes (%>%), but I fail to get the R code working when gsub() is not at the beginning of the pipe but occurs later (note: this question is not about proper import, but about data cleaning). Simple example: df <- cbind.data.frame(A = c("2.187,78 ", "5.491,28 ", "7.000,32 "), B = c("A","B","C")) Column A contains characters (in this case numbers, but they could also be strings) and needs to be cleaned. The steps are df$D <- gsub("\\.",""
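Inside a magrittr pipe the incoming value can be referred to as ., which lets gsub() sit at any position in the chain. A hedged sketch (assumes the magrittr package; the cleaning steps after the truncated gsub("\\.","" call are inferred from the sample data, not quoted from the thread):

library(magrittr)

df <- data.frame(A = c("2.187,78 ", "5.491,28 ", "7.000,32 "),
                 B = c("A", "B", "C"),
                 stringsAsFactors = FALSE)

df$D <- df$A %>%
  gsub("\\.", "", .) %>%   # drop the thousands separator
  gsub(",", ".", .) %>%    # turn the decimal comma into a decimal point
  trimws() %>%
  as.numeric()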

how to use xargs with sed in search pattern

Submitted by 大憨熊 on 2019-12-09 04:39:05
Question: I need to use the output of a command as the search pattern in sed. I'll use echo as an example, but assume it can be a more complicated command: echo "some pattern" | xargs sed -i 's/{}/replacement/g' file.txt That command doesn't work because "some pattern" contains whitespace, but I think it clearly illustrates my problem. How can I make the command work? Thanks in advance. Answer 1: Use command substitution instead, so your example would look like: sed -i "s/$(echo "some pattern")
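A hedged sketch of both routes (file.txt, "some pattern" and "replacement" are the question's placeholders; either way, characters special to sed such as /, & and \ would still need escaping):

# 1) Command substitution, with the whole sed expression double-quoted:
pattern="$(echo "some pattern")"
sed -i "s/$pattern/replacement/g" file.txt

# 2) If xargs really must be used, -I names the placeholder so the whole
#    input line, whitespace included, is substituted as a single argument:
echo "some pattern" | xargs -I{} sed -i 's/{}/replacement/g' file.txt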

is an “optionalized” pipe operator idiomatic F#

Submitted by 不打扰是莪最后的温柔 on 2019-12-08 21:08:36
Question: I like to use the pipe operator |> a lot. However, when mixing functions that return 'simple' values with functions that return option-typed values, things become a bit messy, e.g.: // foo: int -> int*int // bar: int*int -> bool let f (x: string) = x |> int |> foo |> bar This works, but it might throw a 'System.FormatException: ...'. Now assume I want to fix that by making the function int give an optional result: let intOption x = match System.Int32.TryParse x with | (true, x) -> Some x |
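One idiomatic way to keep piping once a step returns an option is Option.map (or Option.bind when the next step itself returns an option), rather than defining a new operator. A hedged sketch, with placeholder bodies for foo and bar that match the signatures given in the question:

// foo: int -> int*int, bar: int*int -> bool (placeholder implementations)
let foo (x: int) = (x, x * 2)
let bar (a: int, b: int) = a < b

let intOption (x: string) =
    match System.Int32.TryParse x with
    | true, v -> Some v
    | _ -> None

// string -> bool option: the pipeline survives, later stages are lifted into the option.
let f (x: string) =
    x |> intOption |> Option.map foo |> Option.map bar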

setnames in pipeline R code

Submitted by 社会主义新天地 on 2019-12-08 20:01:01
Question: I was wondering whether it is possible to set the names of the elements of a list at the end of a pipeline. data <- input_vars %>% purrr::map(get_data) names(data) <- input_vars Currently I pass a vector of variable names into a function which retrieves a list of data frames. Unfortunately this list does not automatically have named elements, so I add them "manually" afterwards. To improve readability I would like to have something like the following: data <- input_vars %>% purrr::map(get_comm_data) %
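purrr::set_names() (a re-export of rlang::set_names()) slots straight into the pipe. A hedged sketch (input_vars and get_data come from the question):

# Name the inputs up front; map() carries the names through to the result list:
data <- input_vars %>%
  purrr::set_names() %>%     # a character vector gets named by its own values
  purrr::map(get_data)

# Or name the result at the end of the pipe:
data <- input_vars %>%
  purrr::map(get_data) %>%
  purrr::set_names(input_vars)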

AttributeError: lower not found; using a Pipeline with a CountVectorizer in scikit-learn

Submitted by 核能气质少年 on 2019-12-08 17:26:38
Question: I have a corpus like this: X_train = [ ['this is an dummy example'], ['in reality this line is very long'], ... ['here is a last text in the training set'] ] and some labels: y_train = [1, 5, ... , 3] I would like to use Pipeline and GridSearch as follows: pipeline = Pipeline([ ('vect', CountVectorizer()), ('tfidf', TfidfTransformer()), ('reg', SGDRegressor()) ]) parameters = { 'vect__max_df': (0.5, 0.75, 1.0), 'tfidf__use_idf': (True, False), 'reg__alpha': (0.00001, 0.000001), } grid_search =
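CountVectorizer expects an iterable of strings, and handing it one-element lists is what triggers "AttributeError: lower not found". A hedged sketch of the usual fix, flattening the corpus before fitting (GridSearchCV and its arguments are assumptions of mine, since the grid_search = line is cut off; pipeline, parameters, X_train and y_train are the question's objects):

from sklearn.model_selection import GridSearchCV

# Each training document becomes a plain string instead of a 1-element list.
X_train_flat = [doc[0] for doc in X_train]

grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1)
grid_search.fit(X_train_flat, y_train)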