Kafka connector to HDFS: java.io.FileNotFoundException: File does not exist


Question


Everything was installed via Ambari (HDP). I've ingested a sample file into Kafka: the data comes from a CSV file shipped by Filebeat, and the topic is testjson. The topics made it into Kafka successfully:

/bin/kafka-topics.sh --list --zookeeper localhost:2181

Result:

test
test060920
test1
test12
testjson

From Kafka I would like to sink testjson into HDFS.
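
(As a sanity check, the records in the topic can be inspected with the console consumer; this sketch assumes the broker listens on x.x.x.x:6667, the address used in the configs below, and prints the first few records before exiting:)

/usr/hdp/3.1.4.0-315/kafka/bin/kafka-console-consumer.sh --bootstrap-server x.x.x.x:6667 --topic testjson --from-beginning --max-messages 3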

quickstart-hdfs.properties

name=hdfs-sink
connector.class=io.confluent.connect.hdfs3.Hdfs3SinkConnector
tasks.max=1
topics=testjson
hdfs.url=hdfs://x.x.x.x:8020
flush.size=3
confluent.license=
confluent.topic.bootstrap.servers=x.x.x.x:6667

connect-standalone.properties

bootstrap.servers=x.x.x.x:6667
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors,
plugin.path=/usr/share/java,/usr/share/confluent-hub-components

(Note: plugin.path appears twice; since this is a Java properties file, the second definition overrides the first, so only the second list of paths takes effect.)

I run it with:

/usr/hdp/3.1.4.0-315/kafka/bin/connect-standalone.sh /etc/kafka/connect-standalone-json.properties /etc/kafka-connect-hdfs/quickstart-hdfs.properties

It fails with the following errors:

Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /user/root/topics/+tmp/testjson/partition=0/26c82453-4980-40de-a9d4-276aa0f3899e_tmp.avro (inode 22757) [Lease.  Holder: DFSClient_NONMAPREDUCE_496631619_33, pending creates: 1]
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2815)
        at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:591)
        at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2694)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:875)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:561)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)

        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1511)
        at org.apache.hadoop.ipc.Client.call(Client.java:1457)
        at org.apache.hadoop.ipc.Client.call(Client.java:1367)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy47.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:510)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
        at com.sun.proxy.$Proxy48.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1081)
        ... 3 more
[2020-06-23 10:35:30,240] ERROR WorkerSinkTask{id=hdfs-sink-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:177)
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
        at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:586)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:322)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:225)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:193)
        at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
        at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.avro.AvroRuntimeException: already open
        at org.apache.avro.file.DataFileWriter.assertNotOpen(DataFileWriter.java:85)
        at org.apache.avro.file.DataFileWriter.setCodec(DataFileWriter.java:93)
        at io.confluent.connect.hdfs3.avro.AvroRecordWriterProvider$1.write(AvroRecordWriterProvider.java:59)
        at io.confluent.connect.hdfs3.TopicPartitionWriter.writeRecord(TopicPartitionWriter.java:675)
        at io.confluent.connect.hdfs3.TopicPartitionWriter.write(TopicPartitionWriter.java:374)
        at io.confluent.connect.hdfs3.DataWriter.write(DataWriter.java:359)
        at io.confluent.connect.hdfs3.Hdfs3SinkTask.put(Hdfs3SinkTask.java:108)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:564)
        ... 10 more
[2020-06-23 10:35:30,243] ERROR WorkerSinkTask{id=hdfs-sink-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:178)
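
(Side note: while it is running, the worker also exposes the Kafka Connect REST API, by default on port 8083. Sketches for inspecting the failed task's state and trace, and for confirming the Hdfs3SinkConnector class was actually picked up from plugin.path:)

curl http://localhost:8083/connectors/hdfs-sink/status
curl http://localhost:8083/connector-plugins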

Questions:

  1. What happened? I noticed Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /user/root/topics/+tmp/testjson/partition=0/26c82453-4980-40de-a9d4-276aa0f3899e_tmp.avro (inode 22757), but I don't understand what this path has to do with it. I thought it was just the destination in HDFS. I have already created the root user's home directory with sudo -u hdfs hadoop fs -mkdir /user/root and granted root ownership with sudo -u hdfs hadoop fs -chown root /user/root.
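
(For reference, the resulting ownership can be double-checked with something like the following; -d lists the directory entry itself rather than its contents:)

sudo -u hdfs hadoop fs -ls -d /user/root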

Edit

I checked /user/root/topics/+tmp/testjson/partition=0 and 26c82453-4980-40de-a9d4-276aa0f3899e_tmp.avro was there. I don't know what I should do. Each time I rerun this, it reports a different missing file, and when I stop it and check afterwards, the file has been generated and is there.

Edit 2

I checked out

root@ambari:/home/hadoop# hdfs dfs -ls /user/root/topics/+tmp/testjson
Found 1 items
drwxr-xr-x   - root hdfs          0 2020-06-24 06:18 /user/root/topics/+tmp/testjson/partition=0
root@ambari:/home/hadoop# hdfs dfs -ls /user/root/topics/+tmp/testjson/partition=0
Found 2 items
-rw-r--r--   3 root hdfs          0 2020-06-24 06:18 /user/root/topics/+tmp/testjson/partition=0/05cc9305-c370-44f8-8e9d-b311fb284e26_tmp.avro
-rw-r--r--   3 root hdfs          0 2020-06-24 03:26 /user/root/topics/+tmp/testjson/partition=0/26c82453-4980-40de-a9d4-276aa0f3899e_tmp.avro

_tmp.avro files are generated every time I run /usr/hdp/3.1.4.0-315/kafka/bin/connect-standalone.sh connect-standalone-json.properties quickstart-hdfs.properties. In case it was a permission issue, I changed dfs.permissions.enabled to false (the default is true) and reran. Still no luck.

Edit 3

New error log:

[2020-06-26 09:57:28,703] WARN The configuration 'config.storage.topic' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2020-06-26 09:57:28,704] WARN The configuration 'status.storage.topic' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2020-06-26 09:57:28,705] WARN The configuration 'config.storage.replication.factor' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2020-06-26 09:57:28,706] WARN The configuration 'offset.flush.interval.ms' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2020-06-26 09:57:28,706] WARN The configuration 'key.converter.schemas.enable' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2020-06-26 09:57:28,706] WARN The configuration 'offset.storage.file.filename' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2020-06-26 09:57:28,706] WARN The configuration 'status.storage.replication.factor' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2020-06-26 09:57:28,706] WARN The configuration 'value.converter.schemas.enable' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2020-06-26 09:57:28,706] WARN The configuration 'offset.storage.replication.factor' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2020-06-26 09:57:28,706] WARN The configuration 'offset.storage.topic' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2020-06-26 09:57:28,706] WARN The configuration 'value.converter' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2020-06-26 09:57:28,707] WARN The configuration 'key.converter' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2020-06-26 09:57:28,707] INFO Kafka version : 2.0.0.3.1.4.0-315 (org.apache.kafka.common.utils.AppInfoParser:109)
[2020-06-26 09:57:28,707] INFO Kafka commitId : 4243d589e2b33433 (org.apache.kafka.common.utils.AppInfoParser:110)
[2020-06-26 09:57:28,712] INFO Cluster ID: CQkRgktxRZmGv_C-Q87ViQ (org.apache.kafka.clients.Metadata:273)
[2020-06-26 09:57:28,733] INFO [Consumer clientId=consumer-3, groupId=connect-cluster] Discovered group coordinator 10.64.2.236:6667 (id: 2147482646 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:677)
[2020-06-26 09:57:28,742] INFO [Consumer clientId=consumer-3, groupId=connect-cluster] Resetting offset for partition connect-configs-0 to offset 0. (org.apache.kafka.clients.consumer.internals.Fetcher:583)
[2020-06-26 09:57:28,745] INFO Finished reading KafkaBasedLog for topic connect-configs (org.apache.kafka.connect.util.KafkaBasedLog:153)
[2020-06-26 09:57:28,745] INFO Started KafkaBasedLog for topic connect-configs (org.apache.kafka.connect.util.KafkaBasedLog:155)
[2020-06-26 09:57:28,745] INFO Started KafkaConfigBackingStore (org.apache.kafka.connect.storage.KafkaConfigBackingStore:253)
[2020-06-26 09:57:28,745] INFO Herder started (org.apache.kafka.connect.runtime.distributed.DistributedHerder:217)
[2020-06-26 09:57:28,753] INFO Cluster ID: CQkRgktxRZmGv_C-Q87ViQ (org.apache.kafka.clients.Metadata:273)
[2020-06-26 09:57:28,753] INFO [Worker clientId=connect-1, groupId=connect-cluster] Discovered group coordinator 10.64.2.236:6667 (id: 2147482646 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:677)
[2020-06-26 09:57:28,762] INFO [Worker clientId=connect-1, groupId=connect-cluster] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:509)
Jun 26, 2020 9:57:29 AM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider org.apache.kafka.connect.runtime.rest.resources.RootResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.apache.kafka.connect.runtime.rest.resources.RootResource will be ignored.
Jun 26, 2020 9:57:29 AM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider org.apache.kafka.connect.runtime.rest.resources.ConnectorPluginsResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.apache.kafka.connect.runtime.rest.resources.ConnectorPluginsResource will be ignored.
Jun 26, 2020 9:57:29 AM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource will be ignored.
Jun 26, 2020 9:57:29 AM org.glassfish.jersey.internal.Errors logErrors
WARNING: The following warnings have been detected: WARNING: The (sub)resource method createConnector in org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource contains empty path annotation.
WARNING: The (sub)resource method listConnectors in org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource contains empty path annotation.
WARNING: The (sub)resource method listConnectorPlugins in org.apache.kafka.connect.runtime.rest.resources.ConnectorPluginsResource contains empty path annotation.
WARNING: The (sub)resource method serverInfo in org.apache.kafka.connect.runtime.rest.resources.RootResource contains empty path annotation.

[2020-06-26 09:57:29,315] INFO Started o.e.j.s.ServletContextHandler@3d6a6bee{/,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler:855)
[2020-06-26 09:57:29,328] INFO Started http_8083@376a312c{HTTP/1.1,[http/1.1]}{0.0.0.0:8083} (org.eclipse.jetty.server.AbstractConnector:292)
[2020-06-26 09:57:29,329] INFO Started @5042ms (org.eclipse.jetty.server.Server:410)
[2020-06-26 09:57:29,329] INFO Advertised URI: http://10.64.2.236:8083/ (org.apache.kafka.connect.runtime.rest.RestServer:267)
[2020-06-26 09:57:29,329] INFO REST server listening at http://10.64.2.236:8083/, advertising URL http://10.64.2.236:8083/ (org.apache.kafka.connect.runtime.rest.RestServer:217)
[2020-06-26 09:57:29,329] INFO Kafka Connect started (org.apache.kafka.connect.runtime.Connect:55)
[2020-06-26 09:57:31,782] INFO [Worker clientId=connect-1, groupId=connect-cluster] Successfully joined group with generation 1 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:473)
[2020-06-26 09:57:31,783] INFO Joined group and got assignment: Assignment{error=0, leader='connect-1-c3e95584-0853-44d1-b3a9-4c97593f4f2d', leaderUrl='http://10.64.2.236:8083/', offset=-1, connectorIds=[], taskIds=[]} (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1216)
[2020-06-26 09:57:31,784] INFO Starting connectors and tasks using config offset -1 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:858)
[2020-06-26 09:57:31,784] INFO Finished starting connectors and tasks (org.apache.kafka.connect.runtime.distributed.DistributedHerder:868)

As suggested, I ran it with /usr/hdp/3.1.4.0-315/kafka/bin/connect-distributed.sh connect-distributed.properties quickstart-hdfs.properties

connect-distributed.properties

bootstrap.servers=10.64.2.236:6667
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.storage.topic=connect-offsets
offset.storage.replication.factor=1
offset.storage.file.filename=/tmp/connect.offsets
config.storage.topic=connect-configs
config.storage.replication.factor=1
status.storage.topic=connect-status
status.storage.replication.factor=1
offset.flush.interval.ms=10000

This time nothing was generated in the +tmp directory:

root@ambari:/test_stream# hdfs dfs -ls /user/root/topics/+tmp
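
(Worth noting: connect-distributed.sh only reads the worker properties file, so a connector properties file passed as a second argument is silently ignored; that matches the empty connectorIds=[] assignment in the log above. In distributed mode the connector is normally submitted through the REST API instead. A sketch, reusing the config from quickstart-hdfs.properties against the advertised REST URL from the log:)

curl -X POST -H "Content-Type: application/json" http://10.64.2.236:8083/connectors -d '{
  "name": "hdfs-sink",
  "config": {
    "connector.class": "io.confluent.connect.hdfs3.Hdfs3SinkConnector",
    "tasks.max": "1",
    "topics": "testjson",
    "hdfs.url": "hdfs://x.x.x.x:8020",
    "flush.size": "3",
    "confluent.topic.bootstrap.servers": "x.x.x.x:6667"
  }
}'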

Any suggestions and responses will be much appreciated. Thank you.

Source: https://stackoverflow.com/questions/62532940/kafka-connector-to-hdfs-java-io-filenotfoundexception-file-does-not-exist
