RoleAssignmentNotFound with azurerm_role_assignment #9379

Closed
fraenkel opened this issue Nov 18, 2020 · 10 comments · Fixed by #10134

@fraenkel
Contributor

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureRM Provider) Version

Terraform v0.13.3

  • provider registry.terraform.io/hashicorp/azurerm v2.32.0
  • provider registry.terraform.io/hashicorp/external v2.0.0

Affected Resource(s)

  • azurerm_role_assignment

Terraform Configuration Files

resource "azurerm_user_assigned_identity" "tc" {
  name                = "tc"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  tags                = local.default_tags
}

resource "azurerm_role_assignment" "tc-acr" {
  scope                            = data.azurerm_resources.acr.resources[0].id
  role_definition_name             = "AcrPull"
  principal_id                     = azurerm_user_assigned_identity.tc.principal_id
  skip_service_principal_aad_check = true
}

resource "azurerm_role_assignment" "tc-privatedns" {
  scope                            = azurerm_resource_group.rg.id
  role_definition_name             = "Private DNS Zone Contributor"
  principal_id                     = azurerm_user_assigned_identity.tc.principal_id
  skip_service_principal_aad_check = true
}

data "azurerm_resource_group" "dns" {
  name = "dns"
}

resource "azurerm_role_assignment" "tc-dns" {
  scope                            = data.azurerm_resource_group.dns.id
  role_definition_name             = "DNS Zone Contributor"
  principal_id                     = azurerm_user_assigned_identity.tc.principal_id
  skip_service_principal_aad_check = true
}

resource "azurerm_role_assignment" "tc-vm" {
  scope                            = azurerm_resource_group.rg.id
  role_definition_name             = "Virtual Machine Contributor"
  principal_id                     = azurerm_user_assigned_identity.tc.principal_id
  skip_service_principal_aad_check = true
}

resource "azurerm_role_assignment" "tc-monitoring" {
  scope                            = azurerm_resource_group.rg.id
  role_definition_name             = "Monitoring Reader"
  principal_id                     = azurerm_user_assigned_identity.tc.principal_id
  skip_service_principal_aad_check = true
}

resource "azurerm_role_assignment" "tc-aks" {
  scope                            = azurerm_resource_group.rg.id
  role_definition_name             = "Azure Kubernetes Service Cluster User Role"
  principal_id                     = azurerm_user_assigned_identity.tc.principal_id
  skip_service_principal_aad_check = true
}

resource "azurerm_linux_virtual_machine_scale_set" "tc" {
  name                = "tc"
  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.tc.id]
  }
...

  depends_on = [
    azurerm_role_assignment.tc-acr,
    azurerm_role_assignment.tc-privatedns,
    azurerm_role_assignment.tc-dns,
    azurerm_role_assignment.tc-vm,
    azurerm_role_assignment.tc-monitoring,
    azurerm_role_assignment.tc-aks
  ]
}

resource "azurerm_linux_virtual_machine_scale_set" "te" {
  for_each = var.availability_zones

  name                = "te-${each.value}"

  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.te.id]
  }


  depends_on = [
    azurerm_role_assignment.te-acr
  ]
}

Debug Output

Still creating... [2m0s elapsed]
azurerm_linux_virtual_machine_scale_set.te["3"]: Still creating... [2m10s elapsed]
azurerm_linux_virtual_machine_scale_set.traffic-envoy["3"]: Creation complete after 2m15s [id=/subscriptions/xxxx/resourceGroups/yyyy/providers/Microsoft.Compute/virtualMachineScaleSets/te-3]

Error: authorization.RoleAssignmentsClient#Get: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="RoleAssignmentNotFound" Message="The role assignment '920ffd05-df3c-308b-2e01-c8c58481998e' is not found."
  on tc.tf line 8, in resource "azurerm_role_assignment" "tc-acr":
  8: resource "azurerm_role_assignment" "tc-acr" {
Error: authorization.RoleAssignmentsClient#Get: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="RoleAssignmentNotFound" Message="The role assignment '1fda1763-4e9f-2c9f-b0a3-a2581b68e457' is not found."
  on tc.tf line 26, in resource "azurerm_role_assignment" "tc-dns":
  26: resource "azurerm_role_assignment" "tc-dns" {
Error: authorization.RoleAssignmentsClient#Get: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="RoleAssignmentNotFound" Message="The role assignment '8f820931-964c-84bf-ecc7-f3c2385fb6a9' is not found."
  on tc.tf line 47, in resource "azurerm_role_assignment" "tc-aks":
  47: resource "azurerm_role_assignment" "tc-aks" {

Expected Behaviour

Success

Actual Behaviour

Failure

Important Factoids

Before I added the dependency between the VMSS and the role assignments, the tc role assignment failures surfaced while the tc VMSS was being created. Once the dependency was added, they shifted to the te VMSS.

Not all role assignments fail; it's usually 2 or 3, so I guess it's timing related.

@fraenkel
Contributor Author

I added a dependency between the tc role assignments and tc, and the error still occurs. It's almost as if the number of role assignments created consecutively causes the issue.

@fraenkel
Contributor Author

Chaining the dependencies didn't resolve the RoleAssignmentNotFound error. It did, however, limit the failure to 1 role assignment rather than 3.
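
For readers following along, chaining here presumably means giving each role assignment a depends_on on the previous one, so Terraform creates them one at a time rather than in parallel. The exact configuration was not posted; the sketch below is an assumption that reuses the resource names from the original report and only shows the first three links of the chain.

```hcl
# Hypothetical chaining of the role assignments from the report.
# Each assignment waits for the previous one, which serializes creation.
resource "azurerm_role_assignment" "tc-acr" {
  scope                            = data.azurerm_resources.acr.resources[0].id
  role_definition_name             = "AcrPull"
  principal_id                     = azurerm_user_assigned_identity.tc.principal_id
  skip_service_principal_aad_check = true
}

resource "azurerm_role_assignment" "tc-privatedns" {
  scope                            = azurerm_resource_group.rg.id
  role_definition_name             = "Private DNS Zone Contributor"
  principal_id                     = azurerm_user_assigned_identity.tc.principal_id
  skip_service_principal_aad_check = true

  # Created only after tc-acr has finished.
  depends_on = [azurerm_role_assignment.tc-acr]
}

resource "azurerm_role_assignment" "tc-dns" {
  scope                            = data.azurerm_resource_group.dns.id
  role_definition_name             = "DNS Zone Contributor"
  principal_id                     = azurerm_user_assigned_identity.tc.principal_id
  skip_service_principal_aad_check = true

  # Created only after tc-privatedns has finished.
  depends_on = [azurerm_role_assignment.tc-privatedns]
}
```

Serializing the assignments this way only reduces how many are in flight at once. It does not change how each individual assignment is created and read back, which would be consistent with the failure shrinking to a single assignment instead of disappearing.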

My next attempt is to create a role definition from the built-in roles and see if that has a higher success rate.

@fraenkel
Contributor Author

fraenkel commented Dec 2, 2020

I switched to a pre-configured role definition, and the same problem occurs.
If I attempt to apply after the failure, it fails with:

Error: authorization.RoleAssignmentsClient#Create: Failure responding to request: StatusCode=409 -- Original Error: autorest/azure: Service returned an error. Status=409 Code="RoleAssignmentExists" Message="The role assignment already exists."

@AustinIvey

This is an issue for me as well. We're using role assignments the same way on a Key Vault, and the role assignment isn't making it into the state file after the first run, causing an error until the resource is imported.

@fraenkel
Contributor Author

fraenkel commented Dec 4, 2020

We have a trace of what is going on:

2020-12-04T15:45:47.017Z [DEBUG] plugin.terraform-provider-azurerm_v2.32.0_x5: PUT //subscriptions/xxx/resourceGroups/yyy/providers/Microsoft.ContainerRegistry/registries/zzz/providers/Microsoft.Authorization/roleAssignments/e1bf9d48-f1bf-0d95-ff44-0ccd37eda29e?api-version=2018-09-01-preview HTTP/1.1
2020-12-04T15:45:47.429Z [DEBUG] plugin.terraform-provider-azurerm_v2.32.0_x5: [DEBUG] AzureRM Response for https://management.azure.com//subscriptions/xxx/resourceGroups/yyy/providers/Microsoft.ContainerRegistry/registries/zzz/providers/Microsoft.Authorization/roleAssignments/e1bf9d48-f1bf-0d95-ff44-0ccd37eda29e?api-version=2018-09-01-preview: 
2020-12-04T15:45:47.429Z [DEBUG] plugin.terraform-provider-azurerm_v2.32.0_x5: HTTP/2.0 201 Created

2020-12-04T15:45:47.430Z [DEBUG] plugin.terraform-provider-azurerm_v2.32.0_x5: GET //subscriptions/xxx/resourceGroups/yyy/providers/Microsoft.ContainerRegistry/registries/zzz/providers/Microsoft.Authorization/roleAssignments/e1bf9d48-f1bf-0d95-ff44-0ccd37eda29e?api-version=2018-09-01-preview HTTP/1.1
2020-12-04T15:45:47.503Z [DEBUG] plugin.terraform-provider-azurerm_v2.32.0_x5: [DEBUG] AzureRM Response for https://management.azure.com//subscriptions/xxx/resourceGroups/yyy/providers/Microsoft.ContainerRegistry/registries/zzz/providers/Microsoft.Authorization/roleAssignments/e1bf9d48-f1bf-0d95-ff44-0ccd37eda29e?api-version=2018-09-01-preview: 
2020-12-04T15:45:47.503Z [DEBUG] plugin.terraform-provider-azurerm_v2.32.0_x5: HTTP/2.0 404 Not Found

@gandelman-a

gandelman-a commented Dec 4, 2020

We've hit this as well. We run our TF deployment pipelines on-prem and in Azure DevOps. The bug hits us only in pipelines running on ADO. I've been unable to reproduce it locally but just ran my test case on an ADO agent node and was able to reproduce on the first try.

My suspicion is that running TF in close proximity to the Azure API results in much quicker API response times, and the provider hits a race condition here. The initial create request returns but the operation is not atomic and a quick Get on the resource 404s, causing the provider to bail. I believe wrapping the second request in a retry will fix the issue. I'm in the process of testing now.

Test case:

  1. Create some resources and a bunch of MSIs:
locals {
  num_msi = 10
  slug    = "xyz1234"
}

provider "azurerm" {
  version = "=2.38.0"
  features {}
}

data "azurerm_subscription" "current" {}

# Create a resource group
resource "azurerm_resource_group" "rg" {
  name     = "roletestrg${local.slug}"
  location = "eastus"
}

resource "azurerm_storage_account" "this" {
  name                      = "rbactestsa${local.slug}"
  resource_group_name       = azurerm_resource_group.rg.name
  location                  = azurerm_resource_group.rg.location
  account_kind              = "StorageV2"
  account_tier              = "Standard"
  account_replication_type  = "LRS"
  enable_https_traffic_only = true
}

resource "azurerm_key_vault" "this" {
  name                = "rbactestkv${local.slug}"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  sku_name            = "premium"
  tenant_id           = data.azurerm_subscription.current.tenant_id
}

resource "azurerm_user_assigned_identity" "service_msi" {
  count               = local.num_msi
  name                = "testmsi${count.index}"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
}

  2. Create a bunch of role assignments for the MSIs on those resources:

locals {
  slug = "xyz1234"
  num_msi = 10
}

provider "azurerm" {
  version = "=2.38.0"
  features {}
}

data "azurerm_subscription" "current" {}

# Look up the resource group created in step 1
data "azurerm_resource_group" "rg" {
  name = "roletestrg${local.slug}"
}

data "azurerm_storage_account" "this" {
  name                = "rbactestsa${local.slug}"
  resource_group_name = data.azurerm_resource_group.rg.name
}

data "azurerm_key_vault" "this" {
  name                = "rbactestkv${local.slug}"
  resource_group_name = data.azurerm_resource_group.rg.name
}

data "azurerm_user_assigned_identity" "service_msi" {
  count               = local.num_msi
  name                = "testmsi${count.index}"
  resource_group_name = data.azurerm_resource_group.rg.name
}

resource "azurerm_role_assignment" "reader" {
  for_each             = toset(data.azurerm_user_assigned_identity.service_msi[*].principal_id)
  scope                = data.azurerm_storage_account.this.id
  role_definition_name = "Reader"
  principal_id         = each.value
}

resource "azurerm_role_assignment" "kv_reader" {
  for_each             = toset(data.azurerm_user_assigned_identity.service_msi[*].principal_id)
  scope                = data.azurerm_key_vault.this.id
  role_definition_name = "Reader"
  principal_id         = each.value
}

@dlamotte

dlamotte commented Dec 5, 2020

I have submitted a fix for this here: #9698

@gandelman-a

@dlamotte Tested the fix locally and it solves the issue for me. Thanks much.

@ghost

ghost commented Jan 14, 2021

This has been released in version 2.43.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:

provider "azurerm" {
    version = "~> 2.43.0"
}
# ... other configuration ...
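
On Terraform 0.13 and later (the original report uses v0.13.3), the same constraint can also be declared in a required_providers block, which is the form Terraform recommends from 0.13 onward. This is a minimal sketch, assuming the hashicorp/azurerm source address listed in the report. The empty features block is included because the 2.x provider requires it.

```hcl
terraform {
  required_providers {
    azurerm = {
      # Source address as listed in the report's provider versions.
      source  = "hashicorp/azurerm"
      # Allows 2.43.x patch releases only, matching the constraint above.
      version = "~> 2.43.0"
    }
  }
}

provider "azurerm" {
  # Required by the 2.x provider, even when empty.
  features {}
}
```

After changing the constraint, running terraform init -upgrade lets Terraform select a provider release that satisfies it.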

@ghost

ghost commented Feb 11, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost ghost locked as resolved and limited conversation to collaborators Feb 11, 2021