Output from Dataproc Spark job in Google Cloud Logging


tl;dr

This is not natively supported today, but will be in a future version of Cloud Dataproc. That said, there is a manual workaround in the interim.

Workaround

Cloud Dataproc clusters use fluentd to collect and forward logs to Cloud Logging. The fluentd configuration determines which logs get collected, which is why you see some logs forwarded and not others. Therefore, the simple workaround (until Cloud Dataproc supports job details in Cloud Logging) is to modify the fluentd configuration. The configuration file for fluentd on a cluster is at:

/etc/google-fluentd/google-fluentd.conf

The two easiest ways to gather additional details are:

  1. Add a new fluentd plugin based on your needs
  2. Add a new file to the list of files already collected (line 56 of the file holds that list on my cluster); a sketch of such an entry follows this list
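For example, collecting an additional file usually means adding a new tail source entry to the configuration. Here is a minimal sketch of such an entry; the log path, pos_file, and tag below are placeholders for illustration, not values from the stock configuration:

<source>
  @type tail
  format none
  path /var/log/spark/my-job.log
  pos_file /var/lib/google-fluentd/pos/my-job.pos
  read_from_head true
  tag my-job
</source>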

Once you edit the configuration, you'll need to restart the google-fluentd service:

/etc/init.d/google-fluentd restart

Finally, depending on your needs, you may or may not need to do this across all nodes on your cluster. Based on your use case, it sounds like you could probably just change your master node and be set.
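If you script the change, for example as a custom initialization action, you can gate it on the node's role via the dataproc-role metadata attribute (a common pattern in Dataproc initialization actions). A minimal sketch, assuming the edit itself is the tail source shown above:

#!/bin/bash
# Apply the fluentd change only on the master node.
ROLE=$(/usr/share/google/get_metadata_value attributes/dataproc-role)
if [[ "${ROLE}" == 'Master' ]]; then
  # Append the new tail source (placeholder paths, as sketched above),
  # then restart the service so it takes effect.
  cat >> /etc/google-fluentd/google-fluentd.conf <<'EOF'
<source>
  @type tail
  format none
  path /var/log/spark/my-job.log
  pos_file /var/lib/google-fluentd/pos/my-job.pos
  read_from_head true
  tag my-job
</source>
EOF
  /etc/init.d/google-fluentd restart
fi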

You can use the Dataproc initialization action for Stackdriver for this:

gcloud dataproc clusters create <CLUSTER_NAME> \
    --initialization-actions gs://<GCS_BUCKET>/stackdriver.sh \
    --scopes https://www.googleapis.com/auth/monitoring.write