dataflow

TPL Dataflow: design for parallelism while keeping order

Submitted 2019-12-06 05:57:45
Question: I have never worked with TPL before, so I was wondering whether this can be done with it. My application creates an animated GIF file from many frames. I start with a list of Bitmap objects representing the frames of the GIF and need to do the following for each frame: paint a number of text/bitmap overlays onto the frame, crop the frame, resize the frame, and reduce the image to 256 colors. Obviously this process can be done in parallel for all the frames in the list, but for each frame the order…
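The question is about .NET's TPL Dataflow, but the shape of the solution — run the four steps in strict order within each frame, process the frames in parallel, and keep the output in frame order — can be sketched in Python. This is illustrative only; the step functions below are placeholders, not the asker's actual image code:

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder per-frame steps; in the real application these would
# paint overlays, crop, resize, and quantize a Bitmap.
def paint_overlays(frame): return frame + ["painted"]
def crop(frame):           return frame + ["cropped"]
def resize(frame):         return frame + ["resized"]
def quantize(frame):       return frame + ["256-color"]

def process_frame(frame):
    # The four steps must run in this order for each frame.
    for step in (paint_overlays, crop, resize, quantize):
        frame = step(frame)
    return frame

def process_all(frames, workers=4):
    # executor.map runs frames concurrently but yields results in the
    # original submission order, so the GIF frame order is preserved
    # even if frames finish out of order.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(process_frame, frames))

frames = [["frame0"], ["frame1"], ["frame2"]]
print(process_all(frames))
```

TPL Dataflow gives the same guarantee natively: a `TransformBlock` with `MaxDegreeOfParallelism > 1` still emits its outputs in input order by default.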

Dataflow between Android BroadcastReceiver, ContentProvider, and Activity?

Submitted 2019-12-06 02:52:19
Question: I've developed an application that receives a Broadcast and then launches an Activity, where that Activity queries a ContentProvider which pulls information out of the DNS in real time. I'd like to be able to shuffle this so that instead of going: BroadcastReceiver.onReceive() { Intent intent = new Intent(...); intent.setData(...); // set a single String data context.startActivity(intent); } Activity.onCreate() { String value = intent.getData(); // get the String data Cursor =…

Beam / Dataflow Custom Python job - Cloud Storage to PubSub

Submitted 2019-12-05 19:23:15
I need to perform a very simple transformation on some data (extract a string from JSON), then write it to PubSub. I'm attempting to use a custom Python Dataflow job to do so. I've written a job which successfully writes back to Cloud Storage, but my attempts at even the simplest possible write to PubSub (no transformation) result in an error: JOB_MESSAGE_ERROR: Workflow failed. Causes: Expected custom source to have non-zero number of splits. Has anyone successfully written to PubSub from GCS via Dataflow? Can anyone shed some light on what is going wrong here? def run(argv=None): parser =…
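The "very simple transformation" itself — pulling one string field out of each JSON line — is independent of the PubSub sink and can be sketched as a plain function that a `beam.Map` step would call between `ReadFromText` and the PubSub write. The field name `"url"` is an assumption for illustration:

```python
import json

def extract_field(line, field="url"):
    # Parse one JSON record and pull out a single string field; in the
    # pipeline this would run inside a beam.Map step between
    # ReadFromText and the PubSub sink.
    record = json.loads(line)
    return record[field]

print(extract_field('{"url": "gs://bucket/img.png", "id": 7}'))
# In a Beam pipeline (sketch): lines | beam.Map(extract_field) | <PubSub sink>
```

Note that the sink error in the question is about the pipeline configuration, not this transformation; the function above only covers the extraction step.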

Throttling a step in a Beam application

Submitted 2019-12-05 16:57:58
I'm using Python Beam on Google Dataflow; my pipeline looks like this: Read image urls from file >> Download images >> Process images. The problem is that I can't let the "Download images" step scale as much as it needs, because my application can get blocked by the image server. Is there a way I can throttle the step, either on input or output per minute? Thank you. One possibility, maybe naïve, is to introduce a sleep in the step. For this you need to know the maximum number of instances of the ParDo that can be running at the same time. If autoscalingAlgorithm is set to NONE you can obtain…
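The naive approach the answer sketches — sleeping inside the step so each worker emits at most N elements per minute — can be written as a small rate limiter that a DoFn would call once per download. This is a sketch, not a Beam API; the per-worker rate is something you choose, and with `max_workers` workers the total rate is bounded by `max_workers * per_minute` (which is why the answer notes you need autoscaling pinned, e.g. autoscalingAlgorithm=NONE, to know the worker count):

```python
import time

class RateLimiter:
    """Blocks in acquire() so it returns at most `per_minute` times per
    minute on this worker. clock/sleep are injectable for testing."""
    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = clock()

    def acquire(self):
        # Sleep until the next slot opens, then reserve the one after it.
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval

# Inside a DoFn's process() you would call limiter.acquire() before
# each HTTP fetch, e.g.:
#   limiter = RateLimiter(per_minute=30)
#   def process(self, url): limiter.acquire(); download(url)
```

A design caveat: this throttles each worker independently; it does not coordinate across workers, so it is only a global limit if the worker count is fixed and known.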

Apache Beam MinimalWordCount example with Dataflow Runner on Eclipse

Submitted 2019-12-05 12:25:53
I am trying to run the MinimalWordCount example using the DataflowRunner from Eclipse on Windows (MinimalWordCount → Run As → Java Application from within Eclipse). It's the same stock code from the example, using my GCS bucket, yet I consistently get the following exception. Can someone tell me what the issue is? I have verified that the bucket name is correct, and I have already run gcloud init on my Windows machine. Exception in thread "main" java.lang.RuntimeException: Failed to construct instance from factory method DataflowRunner#fromOptions(interface org.apache.beam.sdk…

Dataflow setting Controller Service Account

Submitted 2019-12-05 06:32:35
I am trying to set up a controller service account for Dataflow. In my Dataflow options I have: options.setGcpCredential(GoogleCredentials.fromStream(new FileInputStream("key.json")).createScoped(someArrays)); options.setServiceAccount("xxx@yyy.iam.gserviceaccount.com"); But I'm getting: WARNING: Request failed with code 403, performed 0 retries due to IOExceptions, performed 0 retries due to unsuccessful status codes, HTTP framework says request can be retried, (caller responsible for retrying): https://dataflow.googleapis.com/v1b3/projects/MYPROJECT/locations/MYLOCATION/jobs Exception in thread…

How to insert a column with more than 255 characters into Excel in SSIS

Submitted 2019-12-04 22:33:27
I have a table column whose values are longer than 255 characters, and I need to insert them into Excel. Whenever the column value exceeds 255 characters, I get the error "Excel Destination [15]: Truncation may occur due to inserting data from data flow column "Copy of Column 0" with a length of 500 to database column "column1" with a length of 255." at the Excel destination component. Please help me find a way to insert a column with more than 255 characters into Excel. I'd imagine you can overwrite the setting through the Advanced Editor in SSIS. Right-click the Excel Destination >…

How to create groups of N elements from a PCollection Apache Beam Python

Submitted 2019-12-04 14:29:02
Question: I am trying to accomplish something like this: Batch PCollection in Beam/Dataflow. The answer in the above link is in Java, whereas the language I'm working with is Python, so I need help building a similar construction. Specifically, I have this: p = beam.Pipeline(options=pipeline_options) lines = p | 'File reading' >> ReadFromText(known_args.input) After this, I need to create another PCollection, but with lists of N rows of "lines", since my use case requires groups of rows. I…
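The core of the Java construction — accumulate rows and emit them in lists of N — can be sketched as a plain batching function; inside a pipeline this logic would live in a DoFn, and the Beam Python SDK also ships a ready-made transform for it, `beam.BatchElements`:

```python
def batch(iterable, n):
    # Group an iterable into lists of up to n elements; the final
    # batch may be smaller. Inside a pipeline this logic would sit in
    # a DoFn's process()/finish_bundle(), or be replaced outright by
    # beam.BatchElements(min_batch_size=n, max_batch_size=n).
    buf = []
    for item in iterable:
        buf.append(item)
        if len(buf) == n:
            yield buf
            buf = []
    if buf:
        yield buf  # emit the leftover partial batch

print(list(batch(range(7), 3)))  # → [[0, 1, 2], [3, 4, 5], [6]]
```

One caveat when porting this into a DoFn: bundles are the natural flush boundary, so batches never span bundles and the last batch of each bundle may be short.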

How to solve Duplicate values exception when I create PCollectionView<Map<String,String>>

Submitted 2019-12-04 14:06:52
I'm setting up a slow-changing lookup map in my Apache Beam pipeline, which continuously updates the lookup map. For each key in the lookup map, I retrieve the latest value in the global window with accumulating mode. But it always hits an exception: org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.IllegalArgumentException: Duplicate values for mykey Is anything wrong with this code snippet? If I use .discardingFiredPanes() instead, I lose information in the last emit. pipeline .apply(GenerateSequence.from(0).withRate(1, Duration.standardMinutes(1L))) .apply( Window.<Long>into…
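The duplicate-key complaint comes from building a map-valued side input, which requires exactly one value per key; with accumulating panes, every firing re-contributes the older values for the same key. The intended end state — only the latest value per key survives — can be sketched in plain Python as a reduction by timestamp (illustrative only; in a Beam pipeline this reduction would happen before the view is built, e.g. via a per-key combine that keeps the newest element):

```python
def latest_per_key(records):
    # records: iterable of (key, value, timestamp) tuples, possibly
    # containing several historical values per key (as accumulating
    # panes would). Keep only the newest value for each key — the
    # shape the lookup map side input needs.
    latest = {}
    for key, value, ts in records:
        if key not in latest or ts > latest[key][1]:
            latest[key] = (value, ts)
    return {k: v for k, (v, _) in latest.items()}

print(latest_per_key([("mykey", "v1", 1), ("mykey", "v2", 2), ("other", "x", 1)]))
# → {'mykey': 'v2', 'other': 'x'}
```

The design point is that deduplication must happen upstream of the map view: the view itself will not arbitrate between two values for one key, it will just fail.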

Dataflow Programming API for Java? [closed]

Submitted 2019-12-04 10:52:30
Question: As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance. Closed 7 years ago. I am looking for a Dataflow / Concurrent Programming API for Java. I know there's DataRush, but it's not free. What I'm interested in…