dataflow

Using Variable as expression in Derived column transformation SSIS

那年仲夏 submitted on 2019-12-11 14:02:50
Question: Essentially, I have an SSIS package with an Execute SQL statement that dynamically writes a REPLACE function based on some table values (i.e. REPLACE(REPLACE(Col1," * ",""),"@@@","")). The Execute SQL result is stored in the variable @Cleanse. In my Derived Column transformation, I'm trying to call @User::Cleanse as an expression to replace the value of Col1 from the data flow. The result appears to be pulling the result of @Cleanse and using it as a string value rather than applying it as the REPLACE function. When

Error debugging SSIS (excel source, data conversion, OLE DB destination)

北城以北 submitted on 2019-12-11 08:48:42
Question: I am having issues creating a good data flow from an Excel Source to a SQL DB in BIDS 2010. I'm using the 32-bit runtime, and I have Windows authentication on the SQL Server. I'm trying to send the data to a table that has no relationships or constraints at all. My Excel file is .xls, and I've tried doing this with SS2012 and SS2008R2 databases, getting the same errors back. Here's my Package Validation Error: Error at Data Flow Component [SSIS.Pipeline]: "component "Source for Excel Connection Manager" (1)"

How to get the real execution time of a Pipeline and the duration time of start up of the VMs of a Dataflow Job

此生再无相见时 submitted on 2019-12-11 05:59:03
Question: I want to get two durations when a Dataflow job ends: the exact start-up time of the virtual machines deployed in Compute Engine, and the real execution time of the pipeline (which is much less than the elapsed time shown for the job on the Dataflow website). I need to get these durations from Java, though getting the values directly from the Google Cloud website would also be fine. Source: https://stackoverflow.com/questions/45122068/how-to-get-the-real-execution-time-of-a

Programmatically terminating PubSubIO.readMessages from Subscription after configured time?

China☆狼群 submitted on 2019-12-11 05:13:54
Question: I am looking to schedule a Dataflow job that uses PubSubIO.readString from a PubSub topic's subscription. How can I have the job terminate after a configured interval? My use case is not to keep the job running through the entire day, so I am looking to schedule it to start and then stop after a configured interval from within the job. Pipeline .apply(PubsubIO.readMessages().fromSubscription("some-subscription")) Answer 1: From the docs: If you need to stop a running Cloud Dataflow job, you can do so by

The concept groupwin is like the unaligned windows?

守給你的承諾、 submitted on 2019-12-11 04:36:30
Question: groupwin: I use the meaning from Esper: This view groups events into sub-views by the value returned by the specified expression or the combination of values returned by a list of expressions. I take this to mean that you can operate per group rather than per stream (whereas group by is used to control how aggregations are grouped). Unaligned window: In Google's Dataflow, unaligned windows means: By unaligned windows, we mean windows which do not span the entirety of a data source, but instead only a

How to use ExecuteScript (with python as a script engine) for an exercise to add numbers? [Novice user trying to learn NiFi]

筅森魡賤 submitted on 2019-12-11 02:26:16
Question: I am relatively new to NiFi and am not sure how to do the following correctly. I would like to use the ExecuteScript processor (script engine: python) to do the following (only in Python, please): 1) There is a CSV file containing the following information (the first row is the header): first,second,third 1,4,9 7,5,2 3,8,7 2) I would like to find the sum of each individual row and generate a final file with a modified header. The final file should look like this: first,second,third,total 1,4,9,14 7,5
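The row-sum step described in this question can be sketched in plain Python 3. This is only a minimal illustration of the CSV logic, not a NiFi script: the function name `add_row_totals` is hypothetical, and the ExecuteScript wiring (reading the flow file content and writing it back) is deliberately left out and would need adapting to the processor's scripting engine.

```python
import csv
import io


def add_row_totals(csv_text):
    """Return csv_text with a 'total' column holding the sum of each row.

    Hypothetical helper: assumes the first row is a header and every
    other cell is an integer, as in the question's sample data.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")

    header = next(reader)
    writer.writerow(header + ["total"])

    for row in reader:
        if not row:  # skip blank lines
            continue
        total = sum(int(value) for value in row)
        writer.writerow(row + [str(total)])

    return out.getvalue()


if __name__ == "__main__":
    sample = "first,second,third\n1,4,9\n7,5,2\n3,8,7\n"
    print(add_row_totals(sample))
```

Keeping the transformation in a pure function like this makes it testable outside NiFi before it is embedded in a processor script.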

How would you display/layout Data-Flow between Enterprise Applications?

喜欢而已 submitted on 2019-12-10 23:56:31
Question: My employer is a large Swiss telco. We have many systems used to transfer data for different tasks, e.g. Performance Management, Fault Management, Configuration Management, etc. In order to explain to "Management" (pointy-haired, and other) how these systems interact, I collected information about data flow/formats/protocols into a "database" (of the comma-delimited persuasion) and then generated code for Graphviz (http://www.graphviz.org/) and yEd (http://www.yworks.com/en/products_yed_about

In data flow coverage, does returning a variable use it?

喜欢而已 submitted on 2019-12-10 23:15:57
Question: I have a small question in mind. I researched it on the Internet, but no one provides an exact answer. My question is: in data flow coverage criteria, say there is a method which finally returns variable x. When drawing the graph for that method, is that return statement considered to be a use of x? Answer 1: Yes, a return statement uses the value that it returns. I couldn't find an authoritative reference that says so in plain English either, but here are two arguments: A return
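One way to see the point made in the answer is to look at how a parser classifies the variable in a return statement. The sketch below uses Python's standard `ast` module as a stand-in for a data flow analysis (it is not the answer's own argument, and the helper name `def_use_sites` is hypothetical): names in `Store` context are treated as definitions and names in `Load` context as uses.

```python
import ast


def def_use_sites(source):
    """Collect (name, line) pairs for definitions and uses in source.

    Hypothetical helper: a Name node with Store context counts as a
    definition, and one with Load context counts as a use.
    """
    defs, uses = [], []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.append((node.id, node.lineno))
            elif isinstance(node.ctx, ast.Load):
                uses.append((node.id, node.lineno))
    return sorted(defs), sorted(uses)


SAMPLE = "def f(a):\n    x = a + 1\n    return x\n"

if __name__ == "__main__":
    defs, uses = def_use_sites(SAMPLE)
    print("defs:", defs)
    print("uses:", uses)
```

In the sample, `x` on the `return` line appears in the list of uses, which matches treating the return statement as a use of `x` when drawing the def-use graph.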

Beam / Dataflow Custom Python job - Cloud Storage to PubSub

走远了吗. submitted on 2019-12-10 10:25:43
Question: I need to perform a very simple transformation on some data (extract a string from JSON), then write it to PubSub. I'm attempting to use a custom Python Dataflow job to do so. I've written a job which successfully writes back to Cloud Storage, but my attempts at even the simplest possible write to PubSub (no transformation) result in an error: JOB_MESSAGE_ERROR: Workflow failed. Causes: Expected custom source to have non-zero number of splits. Has anyone successfully written to PubSub from

Design Pattern for multithreaded observers

落爺英雄遲暮 submitted on 2019-12-09 23:13:09
Question: In a digital signal acquisition system, data is often pushed into an observer in the system by one thread. Example from Wikipedia/Observer_pattern: foreach (IObserver observer in observers) observer.Update(message); When, for example, a user action from a GUI thread requires the data to stop flowing, you want to break the subject-observer connection, and even dispose of the observer altogether. One may argue: you should just stop the data source, and wait for a sentinel value to dispose of the
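The scenario in this question, where one thread pushes data while another detaches an observer, can be sketched as follows. This is a minimal illustration in Python rather than the C# of the quoted loop, and the class and method names (`Subject`, `attach`, `detach`, `notify`) are assumptions, not taken from the question. The subject guards its observer list with a lock and notifies over a snapshot, so detaching from another thread never mutates a list that is being iterated.

```python
import threading


class Subject:
    """Hypothetical subject whose observer list is safe to change
    from another thread while notifications are in progress."""

    def __init__(self):
        self._lock = threading.Lock()
        self._observers = []

    def attach(self, observer):
        with self._lock:
            self._observers.append(observer)

    def detach(self, observer):
        with self._lock:
            if observer in self._observers:
                self._observers.remove(observer)

    def notify(self, message):
        # Copy the list under the lock, then call observers without
        # holding it, so a callback may attach or detach safely.
        with self._lock:
            snapshot = list(self._observers)
        for observer in snapshot:
            observer(message)


if __name__ == "__main__":
    received = []
    subject = Subject()
    subject.attach(received.append)
    subject.notify("sample 1")
    subject.detach(received.append)
    subject.notify("sample 2")
    print(received)
```

One trade-off of the snapshot approach: an observer may still receive a message that was already in flight when `detach` returned, so an observer that is being disposed of should tolerate one late call.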