Building an Azure AI Infrastructure POC with Bicep and Azure DevOps

I have recently started spending more time on Azure AI, AI-103, AI-300, and the broader topic of agentic AI. One thing became clear very quickly: learning AI only through portal demos and playgrounds does not really work for me.

As someone with an Azure infrastructure and DevOps background, I wanted to understand what it would look like to deploy Azure AI services properly. Not just creating an AI Services account manually, but building the surrounding deployment pattern: resource groups, naming, Bicep modules, parameter files, validation, What-If, environment promotion, and regional considerations.

So I decided to build a small Azure AI infrastructure POC that feels closer to a real platform deployment than a quick lab exercise. The goal was not to create a perfect production architecture from day one, but to build something realistic enough to learn from: a reusable Azure DevOps pipeline, a modular Bicep structure, tokenised parameter files, and a multi-environment deployment flow for dev, sit, and uat.

This blog post walks through the POC, the structure I used, the pipeline design decisions, and a few lessons I learned along the way, including Azure AI regional availability issues and why globally unique service names can break idempotent reruns if the resource group naming changes.

Before going into the pipeline and Bicep structure, the diagram below shows the high-level shape of the POC. Azure DevOps acts as the deployment engine, Bicep defines the infrastructure, and the deployed Azure resources provide the basic building blocks for a future AI-enabled application or agentic workflow.

The architecture is intentionally simple at this stage. The focus is not on building the final AI application yet, but on creating a repeatable platform foundation that can later be extended with application code, RAG logic, or agent orchestration.

High-level architecture diagram showing Azure DevOps pipeline deploying Azure AI Services, Azure AI Search, Application Insights, Storage Account, App Service Plan and Function App into Azure resource groups.

Why I Built This POC

I wanted to learn Azure AI in a way that made sense for an Azure engineer. Reading documentation and watching videos is useful, but I know from experience that real learning happens when you build something, break it, fix it, and then improve the design.

My goal was not to create a fully production-ready AI platform immediately. The goal was to create a small but realistic deployment pattern that could answer questions like:

  • How do I structure Azure AI infrastructure using Bicep?
  • How do I deploy the same workload into multiple environments?
  • How do I separate preview and deploy stages?
  • How do I validate Bicep before deploying?
  • How do I use What-If as a safety mechanism?
  • How do I handle Azure resource group bootstrapping?
  • What happens when a service is not available in the secondary region?

This is also the type of practical experience that helps when preparing for Microsoft AI exams. Certifications are useful, but the concepts become much clearer when you have a real pipeline, real Bicep modules, and real Azure deployment errors to work through.

The Main Idea

The POC deploys a small Azure AI-oriented workload using Azure DevOps and Bicep. The current stack includes:

  • Azure Resource Group
  • Azure AI Services
  • Azure AI Search
  • Application Insights
  • Storage Account
  • App Service Plan
  • Azure Function App

The Function App is not yet the full agent runtime. At this stage, it is a placeholder runtime component that can later be used for orchestration, API logic, retrieval-augmented generation, or agentic workflow integration.

Its role in the current POC is to provide a realistic compute layer around the AI services. In a real application, something needs to receive requests, call Azure AI Services, query Azure AI Search, apply business logic, and return a response. The Function App gives me a simple place to add that logic later without changing the overall infrastructure structure.

This also keeps the design closer to an application platform rather than a collection of disconnected Azure resources.

The important part is the deployment structure. I wanted the infrastructure delivery process to look closer to what I would expect in a proper Azure engineering environment.

Repository Structure

The repository is organised around reusable pipeline building blocks. Instead of putting everything into one large YAML file, the deployment is split into pipelines, stages, jobs, steps, variables, parameters, and Bicep modules.

Infra
├── Bicep
│   ├── main.bicep
│   ├── resource_group.bicep
│   └── Modules
│       ├── ai-search.bicep
│       ├── ai-services.bicep
│       ├── app-insights.bicep
│       ├── app-service-plan.bicep
│       ├── function-app.bicep
│       └── storage-account.bicep
├── Jobs
│   └── job-ai103-infra.yml
├── Parameters
│   ├── main.bicepparam
│   └── resource_group.bicepparam
├── Pipelines
│   └── azure-pipelines.yml
├── Stages
│   └── stage-deploy-ai103-infra-to-all-environments.yml
├── Steps
│   ├── step-create-resource-group.yml
│   ├── step-deploy-bicep.yml
│   ├── step-replace-bicep-tokens.yml
│   ├── step-validate-bicep-deployment.yml
│   ├── step-validate-bicep-tokens.yml
│   └── step-whatif-bicep-deployment.yml
└── Variables
    ├── infra-ai103-dev.yml
    ├── infra-ai103-sit.yml
    ├── infra-ai103-uat.yml
    ├── infra-shared-common.yml
    └── job-infra-variables.yml

This structure is probably more than you need for a simple lab, but that is the point. The POC is not just about deploying one Azure AI resource. It is about learning how a reusable Azure infrastructure pipeline can be designed.

Repository structure diagram showing Infra folder with Bicep, Modules, Jobs, Parameters, Pipelines, Stages, Steps and Variables folders for the Azure AI POC.

Environment and Region Strategy

The pipeline supports three environments: dev, sit, and uat. The current regional pattern uses auea for Australia East and ause for Australia Southeast.

The idea is that the same pipeline can loop through environments and regions. That gives me a good way to practise multi-environment infrastructure delivery without manually duplicating YAML blocks for each target.

Naming Convention

The naming convention I wanted to follow is:

envcode-zone-service-regioncode-instance

For example:

dev-poc-aisvc-auea-01
dev-poc-search-auea-01
dev-poc-appi-auea-01
dev-poc-asp-auea-01
dev-poc-func-auea-01

For resource groups, I decided not to use the instance number. The resource group name is cleaner this way:

dev-poc-agenticai-auea-rg
dev-poc-agenticai-ause-rg

Storage accounts are the exception to this pattern because Azure Storage account names must be globally unique, between 3 and 24 characters long, and can contain only lowercase letters and numbers, so hyphens are not allowed.
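To make the convention concrete, here is a minimal Python sketch of the naming rules. It is not the pipeline's implementation, which builds names in YAML variable templates. The function names are hypothetical, and the storage pattern (the standard parts with hyphens removed and an assumed `st` service code) is purely illustrative, since the post does not show the real storage name.

```python
import re


def resource_name(env: str, zone: str, service: str, region: str, instance: str = "01") -> str:
    """Standard pattern: envcode-zone-service-regioncode-instance."""
    return f"{env}-{zone}-{service}-{region}-{instance}"


def resource_group_name(env: str, zone: str, workload: str, region: str) -> str:
    """Resource groups drop the instance number and end in -rg."""
    return f"{env}-{zone}-{workload}-{region}-rg"


# Storage account rule: 3-24 characters, lowercase letters and digits only.
STORAGE_NAME_RE = re.compile(r"[a-z0-9]{3,24}")


def storage_account_name(env: str, zone: str, region: str, instance: str = "01") -> str:
    """Hypothetical storage pattern: same parts, no hyphens, assumed 'st' service code."""
    name = f"{env}{zone}st{region}{instance}".lower()
    if not STORAGE_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid storage account name: {name!r}")
    return name


print(resource_name("dev", "poc", "aisvc", "auea"))
print(resource_group_name("dev", "poc", "agenticai", "auea"))
print(storage_account_name("dev", "poc", "auea"))
```

The format check only catches local rule violations. Global uniqueness can only be confirmed by Azure itself, which is why the later lesson about global names still applies.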

Pipeline Design

The pipeline uses a two-stage pattern for each environment: a preview stage and a deploy stage. The preview stage performs the safety checks before any real deployment happens, while the deploy stage performs the actual Bicep deployment after validation and What-If.

For each environment, the same job template is reused across the configured regions. In my case, the pipeline currently loops through dev, sit, and uat, and then through the regional codes auea and ause.

The end result in Azure DevOps looks similar to this:

Preview Infrastructure [dev]
  Preview Infrastructure [dev-auea]
  Preview Infrastructure [dev-ause]

Deploy Infrastructure [dev]
  Deploy Infrastructure [dev-auea]
  Deploy Infrastructure [dev-ause]

This makes the run easier to read. Instead of one large deployment job, each environment and region combination is visible in the pipeline run, which makes troubleshooting much simpler.
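In the YAML templates, this kind of expansion is usually done at compile time with `${{ each }}` template expressions. The following Python sketch only models the resulting stage and job names for the loop described above. The ordering (preview then deploy for each environment) follows the post, while the function itself is hypothetical.

```python
ENVIRONMENTS = ["dev", "sit", "uat"]
REGIONS = ["auea", "ause"]


def expand_pipeline(environments, regions):
    """Return (stage_name, [job_names]) pairs in the order the stage template emits them."""
    stages = []
    for env in environments:
        # Each environment gets a preview stage followed by a deploy stage.
        for mode in ("Preview", "Deploy"):
            stage = f"{mode} Infrastructure [{env}]"
            # The same job template is reused once per configured region.
            jobs = [f"{mode} Infrastructure [{env}-{region}]" for region in regions]
            stages.append((stage, jobs))
    return stages


for stage, jobs in expand_pipeline(ENVIRONMENTS, REGIONS):
    print(stage)
    for job in jobs:
        print("  " + job)
```

Adding a region or an environment only changes the input lists, which is the same property the YAML loop gives the real pipeline.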

The diagram below shows how the YAML template structure expands into actual Azure DevOps stages and jobs. The root pipeline calls the stage template, the stage template creates preview and deploy stages for each environment, and each stage then creates region-specific jobs.

In this example, the pipeline creates separate preview and deploy jobs for dev, sit, and uat. For the dev environment, the diagram also shows how the jobs split into auea and ause regional deployments.

Azure DevOps pipeline stage diagram showing preview and deploy stages for dev, sit and uat environments with regional jobs for Australia East and Australia Southeast.

Pipeline Control Flow

The diagram below shows the control flow behind the pipeline. Azure DevOps loads the shared and environment-specific variables, loops through each environment, then loops through the configured regions for that environment.

The same job template is used for both preview and deploy modes. The difference is controlled by the DeploymentMode parameter. In preview mode, the pipeline replaces tokens, validates the parameter files, creates the resource group if it is missing, validates the Bicep deployment, and runs What-If. In deploy mode, it skips the resource group bootstrap and performs the final deployment after validation and What-If.

Pipeline control flow diagram showing Azure DevOps loading variables, looping through environments and regions, then running different preview and deploy paths with token replacement, validation, resource group creation, What-If and deployment.

Preview Stage Behaviour

The preview stage performs the following actions:

checkout repository
replace tokens
validate unresolved tokens
create resource group if missing
validate Bicep deployment
run What-If

The important point is that the resource group is created only in the preview stage, and only if it does not already exist. This is useful because resource group creation is a bootstrap activity. Once the resource group exists, the deploy stage should not need to recreate it.

Preview stage flow showing token replacement, token validation, resource group existence check, Bicep validation and What-If operation before deployment.

Deploy Stage Behaviour

The deploy stage performs the real resource deployment:

checkout repository
replace tokens
validate unresolved tokens
validate Bicep deployment
deploy Bicep resources

The deploy stage does not create the resource group. That is intentional. It keeps the bootstrap logic separate from the main deployment logic.
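The preview and deploy lists differ only in a couple of places, which is why one job template can serve both modes. In Azure DevOps YAML, this is typically expressed with conditional insertion such as `${{ if eq(parameters.DeploymentMode, 'preview') }}:`. The Python sketch below models that selection using the two step lists above. The function name is hypothetical.

```python
def steps_for(deployment_mode: str) -> list:
    """Return the step sequence for a DeploymentMode value, mirroring the lists above."""
    mode = deployment_mode.lower()
    if mode not in ("preview", "deploy"):
        raise ValueError(f"unknown DeploymentMode: {deployment_mode!r}")

    # Shared preparation steps for both modes.
    steps = ["checkout repository", "replace tokens", "validate unresolved tokens"]

    if mode == "preview":
        # Bootstrap happens only in preview, and only when the group is missing.
        steps.append("create resource group if missing")

    steps.append("validate Bicep deployment")

    # Preview ends with What-If; deploy ends with the real deployment.
    steps.append("run What-If" if mode == "preview" else "deploy Bicep resources")
    return steps


print(steps_for("preview"))
print(steps_for("deploy"))
```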

Why I Use Tokenised Bicep Parameter Files

The parameter files use token placeholders instead of hardcoded values. This allows the same .bicepparam files to be reused across environments and regions, while the actual values are supplied by Azure DevOps variable templates.

For example, the Bicep parameter file contains placeholders like this:

using '../Bicep/main.bicep' // relative path assumed from the repository layout above

param location = '#{DeployLocation}#'
param resourceGroupName = '#{ResourceGroupName}#'
param aiServicesName = '#{AiServicesName}#'
param searchServiceName = '#{SearchServiceName}#'
param functionAppName = '#{FunctionAppName}#'

The matching values are built in the pipeline variable template. For example, the job-level variables define the environment code, region code, deployment location, and resource names:

variables:
  EnvCode: ${{ parameters.EnvCode }}
  RegionCode: ${{ parameters.RegionCode }}
  DeployLocation: ${{ parameters.Location }}

  ZoneCode: poc
  InstanceNumber: '01'

  ResourceGroupName: $(EnvCode)-$(ZoneCode)-agenticai-$(RegionCode)-rg
  AiServicesName: $(EnvCode)-$(ZoneCode)-aisvc-$(RegionCode)-$(InstanceNumber)
  SearchServiceName: $(EnvCode)-$(ZoneCode)-search-$(RegionCode)-$(InstanceNumber)
  FunctionAppName: $(EnvCode)-$(ZoneCode)-func-$(RegionCode)-$(InstanceNumber)

When the pipeline runs for dev in auea, those variables resolve to values similar to this:

DeployLocation: australiaeast
ResourceGroupName: dev-poc-agenticai-auea-rg
AiServicesName: dev-poc-aisvc-auea-01
SearchServiceName: dev-poc-search-auea-01
FunctionAppName: dev-poc-func-auea-01

That means the single Bicep parameter file stays generic, while the pipeline controls the environment-specific and region-specific values. The same main.bicepparam file can be used for dev-auea, dev-ause, sit-auea, uat-auea, and so on.

After token replacement, the pipeline runs a validation step to make sure no unresolved tokens remain. This is a small but important guardrail. If a value such as #{ResourceGroupName}# is still present after token replacement, the pipeline fails early before the deployment reaches Azure Resource Manager.
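The replace-then-verify guardrail is easy to model outside the pipeline. The post does not say which replacement task it uses, so the following is a minimal Python sketch of the same idea, assuming the `#{Name}#` token format shown above. The function names are hypothetical.

```python
import re

# Matches placeholders such as #{ResourceGroupName}#.
TOKEN_RE = re.compile(r"#\{([A-Za-z0-9_.]+)\}#")


def replace_tokens(text, values):
    """Swap each known token for its value; unknown tokens are left untouched."""
    return TOKEN_RE.sub(lambda m: values.get(m.group(1), m.group(0)), text)


def find_unresolved(text):
    """Return the sorted, de-duplicated names of any tokens still present."""
    return sorted(set(TOKEN_RE.findall(text)))


def assert_no_tokens(text, source):
    """Fail early, before anything is sent to Azure Resource Manager."""
    unresolved = find_unresolved(text)
    if unresolved:
        raise RuntimeError(f"{source}: unresolved tokens: {', '.join(unresolved)}")


params = "param location = '#{DeployLocation}#'\nparam aiServicesName = '#{AiServicesName}#'\n"
rendered = replace_tokens(params, {"DeployLocation": "australiaeast"})
print(find_unresolved(rendered))  # AiServicesName was never supplied
```

Leaving unknown tokens in place, instead of replacing them with empty strings, is what lets the second check catch a missing variable. An empty value would otherwise slip through until Azure Resource Manager rejected it.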

Bicep Scope Design

The POC uses two deployment scopes:

  • Subscription scope for resource group creation
  • Resource group scope for the actual workload resources

The resource group Bicep file uses subscription scope and is deployed using New-AzSubscriptionDeployment. The main workload Bicep file uses resource group scope and is deployed using New-AzResourceGroupDeployment.

Bicep scope diagram showing subscription-scope deployment creating the resource group and resource-group scoped deployment creating Azure AI workload resources.

Why Validate Before What-If and Deploy?

The pipeline has a dedicated Bicep validation step before What-If and deployment. It runs:

Test-AzResourceGroupDeployment

The purpose of this step is to catch deployment issues before the pipeline reaches What-If or the real deployment stage. It helps identify problems such as invalid syntax, missing parameters, wrong parameter types, invalid resource names, unsupported locations, invalid SKU values, and Azure Resource Manager validation errors.

One of the useful examples from this POC was a validation run for the dev-ause job. The Bicep file compiled successfully, the parameter file was generated correctly, and the pipeline authenticated to the right Azure subscription. However, Azure returned a platform-level validation error:

Code    : LocationNotAvailableForResourceType
Message : The provided location 'australiasoutheast' is not available for resource type
          'Microsoft.CognitiveServices/accounts'.

This was useful because it showed that the issue was not the YAML structure, token replacement, or Bicep syntax. The pipeline was correctly trying to deploy the configured resources into australiasoutheast. The actual problem was that the Azure AI Services resource type was not available in that region.

Azure DevOps validation log showing LocationNotAvailableForResourceType error for Microsoft.CognitiveServices accounts in australiasoutheast during Bicep validation.

One important lesson here is that validation must be treated as a gate, not just as log output. In my first version, the validation result was printed in the pipeline logs, but the task still completed successfully. That meant the pipeline could continue to What-If or deployment even though Azure had already reported a validation issue.

As expected, the same issue appeared again later during the actual deployment step. This time the Deploy Bicep resources [dev-ause] task failed because Azure Resource Manager rejected the deployment for the same regional availability reason:

Code=LocationNotAvailableForResourceType
Message=The provided location 'australiasoutheast' is not available for resource type
'Microsoft.CognitiveServices/accounts'.

Azure DevOps deployment log showing failed Deploy Bicep resources step for dev-ause due to LocationNotAvailableForResourceType for Microsoft.CognitiveServices accounts in australiasoutheast.

This made the issue much clearer. The validation step had already shown the problem, but because the script did not hard-fail, the pipeline was still able to continue. For a production-style pipeline, that is not the behaviour I want. If validation finds a real Azure Resource Manager error, the job should stop there and never reach What-If or the deployment step.

To make the validation step behave properly, the script should capture the output from Test-AzResourceGroupDeployment and throw an error if any validation errors are returned:

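# Stop on any terminating error instead of carrying on.
$ErrorActionPreference = 'Stop'

# $TemplateFile and $ParamsFile are set earlier in the step;
# $(ResourceGroupName) is an Azure DevOps macro expanded before the script runs.
$validationResult = Test-AzResourceGroupDeployment `
  -ResourceGroupName "$(ResourceGroupName)" `
  -TemplateFile $TemplateFile `
  -TemplateParameterFile $ParamsFile `
  -Verbose

# Any returned errors mean validation failed. A truthiness check treats
# both $null and an empty collection as success.
if ($validationResult) {
  $validationJson = $validationResult | ConvertTo-Json -Depth 20
  throw "Bicep validation failed. Validation result: $validationJson"
}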

With that change, the validation step becomes a hard gate. If Azure reports an issue such as LocationNotAvailableForResourceType, the pipeline fails during validation and never reaches the What-If or deployment steps for that job.

This is exactly why I like having a separate validation step. It gives the pipeline a clear checkpoint before any real deployment happens, and it makes Azure platform constraints visible early.

After validation, the pipeline runs What-If:

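New-AzResourceGroupDeployment `
  -ResourceGroupName "$(ResourceGroupName)" `
  -TemplateFile $TemplateFile `
  -TemplateParameterFile $ParamsFile `
  -WhatIf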

What-If shows what Azure expects to create, modify, or remove. This is especially useful when learning, because it helps you understand the impact before making the actual change.
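What-If reports each resource with a change type such as Create, Delete, Modify, NoChange, Ignore, Deploy, or Unsupported. The following Python sketch shows how those results could be summarised or used to flag changes for review. It assumes the results have already been exported as simple `(resource_id, change_type)` pairs. That shape, the function names, and the example data are all hypothetical.

```python
from collections import Counter

# Change types reported by ARM What-If.
CHANGE_TYPES = {"Create", "Delete", "Modify", "NoChange", "Ignore", "Deploy", "Unsupported"}


def summarise(changes):
    """Count What-If changes by type; `changes` is an iterable of (resource_id, change_type)."""
    counts = Counter()
    for resource_id, change_type in changes:
        if change_type not in CHANGE_TYPES:
            raise ValueError(f"unknown change type {change_type!r} for {resource_id}")
        counts[change_type] += 1
    return counts


def needs_review(changes, flagged=("Delete",)):
    """Return resource ids whose change type deserves a human look before deploying."""
    return [resource_id for resource_id, change_type in changes if change_type in flagged]


# Example data only, using the post's naming convention.
example = [
    ("Microsoft.Search/searchServices/dev-poc-search-auea-01", "NoChange"),
    ("Microsoft.Web/sites/dev-poc-func-auea-01", "Modify"),
    ("Microsoft.Insights/components/dev-poc-appi-auea-01", "Create"),
]
print(dict(summarise(example)))
print(needs_review(example, flagged=("Delete", "Create")))
```

On a rerun against an existing environment, flagging Create is also useful. An unexpected Create often means that names or the target resource group changed, which is exactly the situation described in the second lesson below.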

The First Real Lesson: Not Every Azure Service Exists in Every Region

One of the useful lessons from this POC was that multi-region deployment does not mean blindly deploying every resource into every configured region.

In my pipeline, I had configured both Australia East and Australia Southeast:

auea = australiaeast
ause = australiasoutheast

That worked as a general regional deployment pattern, but Azure AI Services introduced a constraint. The application stack could be deployed regionally, but the AI Services account could not simply be deployed into australiasoutheast in the same way.

The fix was to separate the general deployment location from the Azure AI Services location. In the job variables, I introduced a dedicated variable:

AiServicesLocation: australiaeast

Then, in the Bicep parameter file, I passed it separately from the normal deployment location:

param location = '#{DeployLocation}#'
param aiServicesLocation = '#{AiServicesLocation}#'

And in the main Bicep file, the Azure AI Services module used the AI-specific location instead of the regional deployment location:

module aiServices 'Modules/ai-services.bicep' = {
  name: '${resourcePrefix}-ai-services'
  params: {
    name: aiServicesName
    location: aiServicesLocation
    skuName: aiServicesSku
    tags: commonTags
  }
}

This allowed the regional stack to continue deploying into auea and ause, while Azure AI Services was deployed into a supported region.
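The location split can be modelled in a few lines. The following Python sketch maps each region code to the two location values a job would feed into the parameter tokens. It also adds an optional pre-check against a list of supported locations. The helper names are hypothetical, and the supported list is an input, not real availability data. One possible source is the provider metadata returned by `az provider show --namespace Microsoft.CognitiveServices`, which lists locations as display names such as "Australia East". Treat this as a coarse early warning, not a replacement for ARM validation.

```python
# Region codes from the post mapped to Azure location names.
REGION_LOCATIONS = {"auea": "australiaeast", "ause": "australiasoutheast"}

# Azure AI Services is pinned to one supported region for every stack.
AI_SERVICES_LOCATION = "australiaeast"


def normalise_location(name):
    """Turn a display name such as 'Australia East' into 'australiaeast'."""
    return name.replace(" ", "").lower()


def job_locations(region_code):
    """Location values a regional job would feed into the parameter file tokens."""
    return {
        "DeployLocation": REGION_LOCATIONS[region_code],
        "AiServicesLocation": AI_SERVICES_LOCATION,
    }


def check_supported(resource_type, location, supported_locations):
    """Raise if `location` is missing from the supplied list of supported locations."""
    supported = {normalise_location(s) for s in supported_locations}
    if normalise_location(location) not in supported:
        raise ValueError(f"'{location}' is not available for {resource_type}")


for code in REGION_LOCATIONS:
    print(code, job_locations(code))
```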

Disaster Recovery Service Placement Pattern

Architecturally, this became an important disaster recovery and platform design lesson. A secondary region does not always mean an identical copy of every resource. Some services are regional, some are global, some have limited regional availability, and some require a different failover or consumption pattern.

For this POC, the practical design became:

dev-auea stack:
  deploy regional resources into australiaeast
  deploy Azure AI Services into australiaeast

dev-ause stack:
  deploy regional resources into australiasoutheast
  deploy or reference Azure AI Services in australiaeast

To make this clearer, I created a more detailed disaster recovery service placement diagram. The point of the diagram is not to present a complete production DR architecture, but to show the principle I wanted to capture in the POC: regional runtime components can be deployed into both regions, while services with limited regional availability may need to be deployed in a supported region and consumed by both stacks.

In a more complete design, a global entry point such as Azure Front Door could sit in front of the regional Function Apps and route users to the appropriate healthy regional endpoint. I did not build that part in this POC yet, but it is useful to include in the diagram because it shows how the regional stacks could eventually be exposed and failed over.

In this pattern, the Function App, Storage Account, Azure AI Search, Application Insights, and App Service Plan can follow the regional stack. Azure AI Services is treated separately because it cannot be assumed to exist in every target region. The secondary stack can still run in australiasoutheast, but it may need to call an AI Services endpoint hosted in australiaeast.

That is not a full production disaster recovery design yet, but it is a realistic learning point. The platform needs to understand which services can be deployed regionally, which services need special handling, and which application components need configuration that points to the correct regional or shared endpoint.

Disaster recovery service placement diagram showing Azure Front Door routing to Australia East and Australia Southeast Function Apps, with regional Storage Accounts, Azure AI Search, Application Insights and App Service Plans, while Azure AI Services is deployed in Australia East as a supported region and consumed by both stacks.

Successful Dev Pipeline Run

After updating the Bicep template to separate the general deployment location from the Azure AI Services location, I reran the pipeline for the dev environment. This time the Preview [dev] Infra and Deploy [dev] Infra stages completed successfully.

This was an important milestone because it confirmed that the pipeline flow was working end to end for the first environment. The preview stage completed the checks, and the deploy stage then deployed the infrastructure for both configured regions.

The screenshot below shows the successful pipeline run. The dev preview and deploy stages completed, while the later sit and uat stages were skipped because I only needed to validate the pattern in dev first.

Azure DevOps pipeline run showing successful Preview dev and Deploy dev infrastructure stages, with sit and uat stages skipped.

At this point, the POC moved from pipeline theory into a working deployment baseline. I had a reusable YAML structure, token replacement, validation, preview/deploy separation, and regional deployment logic all working together for the dev environment.

Successful Dev Deployment Across Both Regions

The same run also created the expected dev resources across both configured regions.

This was an important checkpoint for the POC. It confirmed that the environment loop, regional loop, token replacement, resource group bootstrap, naming convention, and Bicep modules were all working together as expected.

The screenshot below shows the deployed dev resources after the successful run. At this point, the POC was no longer just validating templates or showing pipeline structure. It had created a working Azure AI infrastructure baseline that I could continue to build on.

Azure portal screenshot showing successfully deployed Azure AI infrastructure resources for the dev environment across Australia East and Australia Southeast regions.

It is still only a learning environment, but this successful deployment gave me a practical foundation for the next stage of the POC: adding Key Vault, managed identity, private networking, RBAC, Azure AI Foundry, and eventually some application or agent orchestration logic.

The Second Lesson: Global Names Can Break Idempotency If the Resource Group Changes

Another useful lesson came from rerunning the pipeline after changing the resource group naming convention.

Some Azure resource names are globally unique. This includes Azure AI Services custom subdomain names, Azure AI Search service names, Function App hostnames, and Storage account names.

If those resources already exist in one resource group, and the pipeline starts targeting a different resource group while using the same global names, Azure treats it as a request to create new resources with names that are already taken.

That leads to errors like:

CustomDomainInUse
ServiceNameUnavailable

The lesson is simple: if you want idempotent reruns, keep the target resource group and resource names stable. If you intentionally change the resource group naming convention, you may also need to delete the old resources, purge soft-deleted resources, or introduce a unique suffix for globally unique resources.
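In Bicep, a unique suffix is usually generated with `uniqueString()`, often seeded with `resourceGroup().id`. The following Python sketch only illustrates the idea of a deterministic suffix. It is not the `uniqueString()` algorithm, and the function names and the renamed resource group are hypothetical.

```python
import hashlib


def unique_suffix(*seed_parts, length=6):
    """Deterministic short hex suffix built from the seed parts.

    Illustrative only: this is not Bicep's uniqueString() algorithm.
    """
    seed = "|".join(part.lower() for part in seed_parts)
    return hashlib.sha256(seed.encode("utf-8")).hexdigest()[:length]


def suffixed_name(base, *seed_parts):
    """Append the suffix so a globally unique name stays stable for the same seed."""
    return f"{base}-{unique_suffix(*seed_parts)}"


subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder value
old = suffixed_name("dev-poc-search-auea-01", subscription_id, "dev-poc-agenticai-auea-rg")
# Hypothetical renamed resource group, to show the effect of a naming change.
new = suffixed_name("dev-poc-search-auea-01", subscription_id, "dev-poc-ai-auea-rg")
print(old)
print(new)
```

The choice of seed decides the trade-off. If the suffix is derived from the resource group, a renamed group gets fresh names and avoids the collision, but it also creates a parallel set of resources instead of updating the old ones. If the seed excludes the resource group, names stay stable across the rename and the collision returns.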

How the Function App Fits In

The Function App is currently a runtime placeholder. It receives configuration values such as the Azure AI Services endpoint, Azure AI Search service name, Application Insights connection string, and Storage connection string.

In a later version of the POC, this Function App could become the place where orchestration logic lives. It could accept API requests, call Azure AI Services, query Azure AI Search, implement a simple RAG workflow, or act as a lightweight agentic orchestration layer.
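Function App settings surface to code as environment variables, so a future handler could start by validating its configuration. The following Python sketch shows one way to do that. `APPLICATIONINSIGHTS_CONNECTION_STRING` is the standard setting name, but `AI_SERVICES_ENDPOINT` and `AI_SEARCH_SERVICE_NAME` are hypothetical names, because the post does not list the real ones. The storage connection is left out because the Functions runtime consumes it, not application code.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RuntimeSettings:
    ai_services_endpoint: str
    search_endpoint: str
    app_insights_connection_string: str


# Hypothetical setting names, apart from the standard Application Insights one.
REQUIRED = ("AI_SERVICES_ENDPOINT", "AI_SEARCH_SERVICE_NAME", "APPLICATIONINSIGHTS_CONNECTION_STRING")


def load_settings(env: Mapping[str, str] = os.environ) -> RuntimeSettings:
    """Fail fast if any required app setting is missing or empty."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing app settings: {', '.join(missing)}")
    return RuntimeSettings(
        ai_services_endpoint=env["AI_SERVICES_ENDPOINT"].rstrip("/"),
        # Azure AI Search endpoints follow https://<service-name>.search.windows.net
        search_endpoint=f"https://{env['AI_SEARCH_SERVICE_NAME']}.search.windows.net",
        app_insights_connection_string=env["APPLICATIONINSIGHTS_CONNECTION_STRING"],
    )
```

Because the AI Services endpoint is injected rather than derived from the Function App's own region, the ause stack could point at an endpoint hosted in australiaeast without any code change.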

What This POC Does Not Do Yet

This is still a learning POC. It is not a complete production AI platform.

  • No private endpoints yet
  • No Key Vault integration yet
  • Function App currently uses a storage account key connection string
  • No managed identity based storage connection yet
  • No Azure AI Foundry project deployment yet
  • No APIM or Azure Front Door routing
  • No automated failover
  • No RAG index creation yet
  • No Function App code deployment yet

Future Improvements

  • Add Key Vault for secrets and configuration
  • Use managed identity for Function App storage access
  • Add RBAC modules
  • Add private endpoints for Storage, AI Search, and AI Services where supported
  • Add diagnostic settings to Log Analytics
  • Add Azure AI Foundry resources
  • Add AI Search index and data source provisioning
  • Add Function App code deployment
  • Add smoke tests after deployment
  • Add cost-control tags and budgets
  • Add approval gates before SIT and UAT deployments

Why This Was Useful for AI-103 and AI-300 Learning

This exercise helped connect several areas that are often learned separately: Azure AI services, Azure AI Search, serverless runtime design, Bicep modules, Azure DevOps templates, multi-environment deployment, What-If and validation, and regional availability thinking.

For exam preparation, this is useful because the concepts become practical. You are not just memorising that Azure AI Search exists. You are deploying it, naming it, validating it, handling regional constraints, and thinking about how an application would consume it.

For real-world engineering, this is even more useful. AI workloads still need proper platform foundations. They still need governance, repeatable deployment, monitoring, secure configuration, and lifecycle management.

Final Thoughts

The biggest lesson from this POC is that learning Azure AI properly is not just about models and prompts. For an Azure engineer, it is also about how the platform is built around those models.

A small AI demo can be created quickly in the portal. But a repeatable AI platform requires more thought. Where do resources live? How are they named? How are they deployed? How are changes validated? How do environments progress? How do we handle unsupported regions? How do we keep the design simple enough to learn but structured enough to grow?

This POC is not the final answer. It is a starting point. But it gives me a practical foundation for learning Azure AI, AI-103, AI-300, and agentic AI infrastructure patterns in a way that fits my Azure DevOps and infrastructure background.

And that is exactly what I wanted from it.