Question
Does the file name need to contain a number for textFileStream to pick it up? My program picks up new files only if their names contain a number, and it ignores all other files even when they are new. Is there a setting I need to change so that it picks up all files? Please help.
Answer 1:
No. It scans the directory for new files that appear within the window. If you are writing to S3, write the file directly from your code: the object doesn't appear until the final close(), so there is no need to rename it. In contrast, if you are using file streaming sources against normal filesystems, you should create each file outside the scanned directory and rename it into that directory at the end. Otherwise, work-in-progress files may get read. And once a file has been read, it is never re-read.
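The create-elsewhere-then-rename pattern for normal filesystems can be sketched in plain Python. This is a minimal sketch, not Spark code: the function name publish_atomically and the directory names are hypothetical, and it assumes the staging directory sits outside the watched directory but on the same filesystem, so that os.replace is a single atomic rename (it may fail across filesystems).

```python
import os
import tempfile


def publish_atomically(watched_dir: str, staging_dir: str, name: str, data: str) -> str:
    """Write data into staging_dir, then rename it into watched_dir (hypothetical helper).

    Assumes staging_dir and watched_dir are on the same filesystem, so
    os.replace renames the file instead of copying it.
    """
    os.makedirs(staging_dir, exist_ok=True)
    # Write the work-in-progress file where the stream is NOT looking.
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the contents are on disk first
        final_path = os.path.join(watched_dir, name)
        # Move the finished file into the watched directory in one step.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Remove the partial file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    watched = os.path.join(root, "incoming")  # hypothetical dir given to textFileStream
    staging = os.path.join(root, "_staging")  # hypothetical staging dir outside it
    os.makedirs(watched)
    publish_atomically(watched, staging, "events-a.txt", "line 1\nline 2\n")
    print(os.listdir(watched))  # prints ['events-a.txt']
```

Because the file only shows up in the watched directory once it is complete, a stream scanning that directory should not see a half-written file. Per the answer, this extra step is unnecessary when writing directly to S3.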
Answer 2:
After spending hours analyzing the stack trace, I figured out that the problem was the S3 address. I was providing "s3://mybucket", which worked with Spark 1.6 and Scala 2.10.5. On Spark 2.0 (and Scala 2.11), it must be provided with a trailing slash, as "s3://mybucket/". Maybe it is something regex-related. It works fine now. Thanks for all the help.
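If the directory path is built in code, one way to guard against the missing-slash problem described above is to normalize it before handing it to the stream. This is a minimal sketch: watched_dir_uri is a hypothetical helper, and the commented-out call assumes an existing PySpark StreamingContext named ssc.

```python
def watched_dir_uri(uri: str) -> str:
    """Return the directory URI with a trailing slash (hypothetical helper).

    The answer reports that Spark 2.0 needed "s3://mybucket/" instead of
    "s3://mybucket" for textFileStream to pick up new files.
    """
    return uri if uri.endswith("/") else uri + "/"


if __name__ == "__main__":
    print(watched_dir_uri("s3://mybucket"))   # prints s3://mybucket/
    print(watched_dir_uri("s3://mybucket/"))  # prints s3://mybucket/
    # Assumed usage with an existing PySpark StreamingContext `ssc`:
    # lines = ssc.textFileStream(watched_dir_uri("s3://mybucket"))
```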
Source: https://stackoverflow.com/questions/40506276/spark-textfilestream-on-s3