AWS Glue Crawler Not Creating Table


I have a crawler I created in AWS Glue that does not create a table in the Data Catalog after it successfully completes.

The crawler takes roughly 20 seconds to run and


6 Answers

忘了有多久

2021-02-18 13:43

Check the IAM role associated with the crawler. Most likely it does not have the correct permissions.

When you create the crawler and choose to create an IAM role (the default setting), the generated policy covers only the S3 path you specified. If you later edit the crawler and change only the S3 path, the role associated with the crawler will not have permission to read the new path.
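To make the failure mode above concrete, here is a minimal sketch that checks whether a policy document grants `s3:GetObject` on a given crawler path. It is only a rough approximation of IAM evaluation: it ignores `Deny` statements, conditions, and case-insensitive action matching, and it uses shell-style wildcards as a stand-in for IAM wildcards. The bucket names and the helper names are hypothetical.

```python
from fnmatch import fnmatchcase


def as_list(value):
    """IAM allows a single string or a list for Action/Resource; normalize to a list."""
    return value if isinstance(value, list) else [value]


def policy_allows_get_object(policy: dict, s3_path: str) -> bool:
    """Rough check: does any Allow statement cover s3:GetObject under s3_path?

    Simplified on purpose (no Deny, no Condition handling); an assumption-laden
    sketch, not a replacement for the IAM policy simulator.
    """
    if not s3_path.startswith("s3://"):
        raise ValueError("expected a path of the form s3://bucket/prefix/")
    # Build the ARN of a hypothetical object directly under the crawler path.
    key_prefix = s3_path[len("s3://"):].rstrip("/")
    object_arn = "arn:aws:s3:::" + key_prefix + "/example-object"

    for statement in as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        actions = as_list(statement.get("Action", []))
        if not any(fnmatchcase("s3:GetObject", a) for a in actions):
            continue
        resources = as_list(statement.get("Resource", []))
        if any(fnmatchcase(object_arn, r) for r in resources):
            return True
    return False


# Hypothetical policy of the kind generated for the original crawler path.
generated_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": ["arn:aws:s3:::old-bucket/data/*"],
        }
    ],
}

old_path_ok = policy_allows_get_object(generated_policy, "s3://old-bucket/data/")
new_path_ok = policy_allows_get_object(generated_policy, "s3://new-bucket/data/")
```

With this policy the original path is covered and the edited path is not, which matches the situation where the crawler completes but has nothing it is allowed to read.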

悲哀的现实

2021-02-18 13:45

I had the same issue. As others advised, I tried revising the existing IAM role to include the new S3 bucket as a resource, but for some reason it did not work. Then I created a completely new role from scratch, and this time it worked. One big question I have for AWS: why does an access-denied error caused by a wrong IAM policy not show up in the CloudWatch logs? That makes it difficult to debug.

半阙折子戏

2021-02-18 13:49

You can try excluding some files in the S3 bucket; the excluded files should then appear in the crawler log. I find this helpful for debugging what the crawler is doing.
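One way to try the exclusion trick is through the Glue API rather than the console. The sketch below builds the arguments for boto3's `update_crawler` call with an `Exclusions` list on the S3 target. The crawler name, bucket path, and pattern are hypothetical, and it assumes the crawler has a single S3 target; a crawler with several targets would need all of them listed in `Targets`.

```python
def build_crawler_update(crawler_name: str, s3_path: str, exclusions: list) -> dict:
    """Build keyword arguments for glue.update_crawler with exclude patterns.

    Exclusions are glob patterns evaluated relative to the include path.
    """
    return {
        "Name": crawler_name,
        "Targets": {
            "S3Targets": [
                {"Path": s3_path, "Exclusions": list(exclusions)},
            ]
        },
    }


def apply_crawler_update(update: dict) -> None:
    """Send the update to Glue. Needs boto3 and AWS credentials, so it is not run here."""
    import boto3  # third-party; imported lazily so the builder works without it

    boto3.client("glue").update_crawler(**update)


# Hypothetical values for illustration only.
update = build_crawler_update(
    crawler_name="my-crawler",
    s3_path="s3://my-bucket/data/",
    exclusions=["**.tmp"],
)
```

After calling `apply_crawler_update(update)` and rerunning the crawler, the crawler's log output is the place to look for the excluded files the answer describes.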


你的背包

2021-02-18 13:54

Here is my sample role policy JSON that allows Glue to access S3 and create a table.

{
"Version": "2012-10-17",
"Statement": [
    {
        "Sid": "VisualEditor0",
        "Effect": "Allow",
        "Action": [
            "ec2:DeleteTags",
            "ec2:CreateTags"
        ],
        "Resource": [
            "arn:aws:ec2:*:*:instance/*",
            "arn:aws:ec2:*:*:security-group/*",
            "arn:aws:ec2:*:*:network-interface/*"
        ],
        "Condition": {
            "ForAllValues:StringEquals": {
                "aws:TagKeys": "aws-glue-service-resource"
            }
        }
    },
    {
        "Sid": "VisualEditor1",
        "Effect": "Allow",
        "Action": [
            "iam:GetRole",
            "cloudwatch:PutMetricData",
            "ec2:DeleteNetworkInterface",
            "s3:ListBucket",
            "s3:GetBucketAcl",
            "logs:PutLogEvents",
            "ec2:DescribeVpcAttribute",
            "glue:*",
            "ec2:DescribeSecurityGroups",
            "ec2:CreateNetworkInterface",
            "s3:GetObject",
            "s3:PutObject",
            "logs:CreateLogStream",
            "s3:ListAllMyBuckets",
            "ec2:DescribeNetworkInterfaces",
            "logs:AssociateKmsKey",
            "ec2:DescribeVpcEndpoints",
            "iam:ListRolePolicies",
            "s3:DeleteObject",
            "ec2:DescribeSubnets",
            "iam:GetRolePolicy",
            "s3:GetBucketLocation",
            "ec2:DescribeRouteTables"
        ],
        "Resource": "*"
    },
    {
        "Sid": "VisualEditor2",
        "Effect": "Allow",
        "Action": "s3:CreateBucket",
        "Resource": "arn:aws:s3:::aws-glue-*"
    },
    {
        "Sid": "VisualEditor3",
        "Effect": "Allow",
        "Action": "logs:CreateLogGroup",
        "Resource": "*"
    }
]
}


隐瞒了意图╮

2021-02-18 13:57

If the target database already has tables, the crawler may associate your new files with an existing table rather than create a new one.

This occurs when similarities in the data or the folder structure lead Glue to interpret the new files as partitions of that table.

Also, on occasion I have needed to refresh the database's table listing to get new tables to show up.
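A quick way to tell "no table created" apart from "files merged into an existing table" is to look at the crawler's metrics. The sketch below summarizes the response of boto3's `get_crawler_metrics` call, which reports `TablesCreated`, `TablesUpdated`, and `TablesDeleted` per crawler. The crawler name and the sample response are made up for illustration.

```python
def summarize_crawler_metrics(response: dict) -> dict:
    """Reduce a get_crawler_metrics response to created/updated/deleted counts per crawler."""
    summary = {}
    for metrics in response.get("CrawlerMetricsList", []):
        summary[metrics["CrawlerName"]] = {
            "created": metrics.get("TablesCreated", 0),
            "updated": metrics.get("TablesUpdated", 0),
            "deleted": metrics.get("TablesDeleted", 0),
        }
    return summary


def fetch_crawler_summary(crawler_name: str) -> dict:
    """Query Glue for one crawler. Needs boto3 and AWS credentials, so it is not run here."""
    import boto3  # third-party; imported lazily so the summarizer works without it

    glue = boto3.client("glue")
    response = glue.get_crawler_metrics(CrawlerNameList=[crawler_name])
    return summarize_crawler_metrics(response)


# Made-up response shaped like the API output, for illustration only.
sample_response = {
    "CrawlerMetricsList": [
        {
            "CrawlerName": "my-crawler",
            "TablesCreated": 0,
            "TablesUpdated": 1,
            "TablesDeleted": 0,
        }
    ]
}
sample_summary = summarize_crawler_metrics(sample_response)
```

A result with zero tables created but one updated would be consistent with the new files having been folded into an existing table, though it does not prove that on its own.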

小鲜肉

2021-02-18 14:07

I had the same problem. The solution was to specify the schema of my table.
