Ability to easily provide custom error messages/suggestions/docs for errors.#6700
wbreza wants to merge 3 commits into Azure:main
Pull request overview
This PR adds a pattern-based error suggestion system that transforms cryptic Azure errors into user-friendly, actionable guidance. When users encounter well-known errors like quota limits or authentication failures, azd now displays a clear message, actionable suggestion, documentation link, and the raw error (de-emphasized in grey).
Changes:
- Adds YAML-based error pattern configuration in resources/error_suggestions.yaml with 279 lines covering quota, authentication, deployment, network, container, and tool-related errors
- Implements pattern matching engine with regex support and caching in pkg/errorhandler/
- Enhances ErrorWithSuggestion type with optional Message and DocUrl fields (backward compatible)
- Integrates error suggestion service into error middleware to automatically wrap matching errors
- Updates UX middleware to display enhanced error format with user-friendly messaging
- Adds comprehensive test coverage and documentation
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| cli/azd/resources/error_suggestions.yaml | New YAML configuration defining 20+ error patterns with user-friendly messages and actionable suggestions |
| cli/azd/resources/resources.go | Embeds error_suggestions.yaml file into the binary |
| cli/azd/pkg/errorhandler/types.go | Defines types for error suggestion rules and matching results |
| cli/azd/pkg/errorhandler/matcher.go | Implements pattern matching engine with regex caching |
| cli/azd/pkg/errorhandler/service.go | Service that loads YAML config and matches errors against patterns |
| cli/azd/pkg/errorhandler/matcher_test.go | Comprehensive tests for pattern matching and service |
| cli/azd/pkg/output/ux/error_with_suggestion.go | UX component for displaying enhanced error format |
| cli/azd/pkg/output/ux/error_with_suggestion_test.go | Tests for error display formatting |
| cli/azd/internal/errors.go | Enhanced ErrorWithSuggestion type with Message and DocUrl fields |
| cli/azd/cmd/middleware/error.go | Integrates error suggestion service into error middleware |
| cli/azd/cmd/middleware/error_test.go | Tests for pattern matching integration and backward compatibility |
| cli/azd/cmd/middleware/ux.go | Updates UX middleware to use enhanced error display |
| cli/azd/cmd/container.go | Registers ErrorSuggestionService in DI container |
| cli/azd/docs/error-suggestions.md | Comprehensive documentation on using and extending the error suggestion system |
    result, err := middleware.Run(*mockContext.Context, nextFn)

    // Should return error without AI intervention in no-prompt mode
Misleading comment: The comment states "Should return error without AI intervention in no-prompt mode" but this test is actually checking the scenario where the LLM alpha feature is disabled (NoPrompt is false in line 64). The comment should clarify that this tests the case where the LLM feature is disabled, not no-prompt mode.
Suggested change:
    - // Should return error without AI intervention in no-prompt mode
    + // Should return error without AI intervention when the LLM alpha feature is disabled
I don't think the change Copilot is suggesting is correct. I think it's conflating no-prompt mode for agent interaction with the alpha agentic mode. @wbreza please confirm.
Fixed — updated the comment to: Should return error without AI intervention when LLM alpha feature is not enabled
    type PatternMatcher struct {
        // compiledPatterns caches compiled regex patterns for performance
        compiledPatterns map[string]*regexp.Regexp
Potential concurrency issue: The compiledPatterns map is not thread-safe. If the PatternMatcher is shared across goroutines (which might happen since ErrorSuggestionService is registered as a singleton in the DI container), concurrent access to the map could lead to race conditions. Consider using sync.RWMutex to protect map access, or use sync.Map for concurrent access patterns.
Good catch — added sync.RWMutex to protect the compiled regex cache. Read lock for cache hits, write lock only when compiling a new pattern.
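The locking scheme described in this reply can be sketched as follows. This is a minimal illustration, not the code from the PR: only the `PatternMatcher` and `compiledPatterns` names come from the diff above, while the `mu` field, the `compile` and `Match` methods, and the constructor are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// PatternMatcher caches compiled regular expressions so that each pattern is
// compiled at most once, even when the matcher is shared across goroutines.
type PatternMatcher struct {
	mu               sync.RWMutex // assumed field name
	compiledPatterns map[string]*regexp.Regexp
}

// NewPatternMatcher is a hypothetical constructor for this sketch.
func NewPatternMatcher() *PatternMatcher {
	return &PatternMatcher{compiledPatterns: make(map[string]*regexp.Regexp)}
}

// compile returns the cached regex for pattern, compiling it on first use.
func (m *PatternMatcher) compile(pattern string) (*regexp.Regexp, error) {
	// Fast path: a read lock is enough for cache hits.
	m.mu.RLock()
	re, ok := m.compiledPatterns[pattern]
	m.mu.RUnlock()
	if ok {
		return re, nil
	}

	// Slow path: take the write lock and re-check, because another goroutine
	// may have compiled the same pattern while this one was waiting.
	m.mu.Lock()
	defer m.mu.Unlock()
	if re, ok := m.compiledPatterns[pattern]; ok {
		return re, nil
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	m.compiledPatterns[pattern] = re
	return re, nil
}

// Match reports whether text matches the regex pattern.
func (m *PatternMatcher) Match(pattern, text string) (bool, error) {
	re, err := m.compile(pattern)
	if err != nil {
		return false, err
	}
	return re.MatchString(text), nil
}

func main() {
	m := NewPatternMatcher()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			ok, _ := m.Match(`(?i)quota\s+exceeded`, "Quota exceeded for region eastus")
			fmt.Println(ok)
		}()
	}
	wg.Wait()
}
```

The re-check under the write lock keeps two goroutines from compiling the same pattern twice. `sync.Map` would also work, as the review suggests, at the cost of losing the typed map.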
    func TestErrorSuggestionService_FirstMatchWins(t *testing.T) {
        service := NewErrorSuggestionService()

        // An error that could match multiple patterns should return the first match
How can we make sure that, in the future, we don't add an entry for an error that overrules other entries and unintentionally changes all of them?
Do you mean a new error pattern?
I think we just need to be conscious and not add error patterns that are too generic.
Yes, a new pattern.
The current implementation resolves multiple patterns with a match by taking the first match.
This means that the order will decide the importance of each pattern. This could lead to a future scenario where it is hard to pick the right place for a new pattern if it could re-wire the existing pattern after it.
I guess we can cross that bridge when we get there 😜
The YAML file now has a header comment explaining the specificity-first ordering convention, and the docs cover it as well. For now the ordering is manageable — we can add validation tooling if the rule set grows significantly.
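To make the ordering concern in this thread concrete, here is a small sketch of first-match-wins evaluation. The `Rule` type, `FirstMatch`, `SampleRules`, and the rule names are hypothetical and simplified to case-insensitive substring patterns; they are not taken from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// Rule is a hypothetical, simplified suggestion rule.
type Rule struct {
	Name     string
	Patterns []string // case-insensitive substrings; any match selects the rule
}

// FirstMatch returns the first rule, in slice order, that has a pattern
// contained in msg. Later rules are never consulted once one matches.
func FirstMatch(rules []Rule, msg string) (Rule, bool) {
	lower := strings.ToLower(msg)
	for _, r := range rules {
		for _, p := range r.Patterns {
			if strings.Contains(lower, strings.ToLower(p)) {
				return r, true
			}
		}
	}
	return Rule{}, false
}

// SampleRules lists the specific rule before the generic one, following the
// specificity-first convention.
func SampleRules() []Rule {
	return []Rule{
		{Name: "ai-model-quota", Patterns: []string{"model quota exceeded"}},
		{Name: "generic-quota", Patterns: []string{"quota exceeded"}},
	}
}

func main() {
	rules := SampleRules()
	r, _ := FirstMatch(rules, "Model quota exceeded for the requested deployment")
	fmt.Println(r.Name)

	// Reversing the order lets the generic rule shadow the specific one.
	reversed := []Rule{rules[1], rules[0]}
	r, _ = FirstMatch(reversed, "Model quota exceeded for the requested deployment")
	fmt.Println(r.Name)
}
```

With the specific rule first, the model-quota error gets its own suggestion. With the order reversed, the generic rule matches first and the specific rule is never reached, which is the shadowing risk raised above.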
Could we consider making ErrorWithSuggestion usable from extensions as well? Currently it's in internal which makes it hard to reuse outside of core. I saw the azd x extension uses its own UserFriendlyError but would be great to have something shared
We're facing similar challenges in the AI Agents extension (see #6683)
So I think the ask is that we want extensions themselves to be able to participate in the error handling flow and maybe have their own associated file for looking for error patterns?
Yes, but depending on the complexity maybe we ship this and then evolve/add that.
The file-based error suggestion flow would be nice to have but I was thinking more straightforward/immediate: extensions being able to use the ErrorWithSuggestion type and the UX rendering
Agreed — making ErrorWithSuggestion and the UX rendering available to extensions is a great next step. For this PR we'll ship the core pipeline and iterate from there.
Good news — ErrorWithSuggestion now lives in pkg/errorhandler (the internal package just has a type alias). Extensions can import and use it directly. The UX rendering component is in pkg/output/ux/error_with_suggestion.go which is also importable.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
I'm very excited about this change
    ### Rule Evaluation

    - **First match wins**: Rules are evaluated in order from top to bottom
    - **Order matters**: Place more specific patterns before general ones
    - **Multiple patterns per rule**: If any pattern in a rule matches, that rule wins
Regular expressions are not rich enough to solve this problem.
- We're completely skipping HTTP status codes and x-ms-error-codes, which are the two most important details for any Azure error.
- We're completely ignoring the structure. Being able to discern between the top-level error response and nested details matters a lot in some situations.
- Simple regexes aren't enough to distinguish context in many situations (i.e., you'd handle "quota exceeded" subscription limits very differently than you would AI model quota)
- This doesn't compose/scale well. "First rule wins" text matching isn't going to let us easily customize the error experience for many services each loaded in their own extensions.
- I can't run arbitrary code to compute my suggestion (i.e., I can't determine regions where your active subscription still has quota).
I think we need to add a layer underneath what you have here. We should create an interface for matching errors and computing suggested responses that mirrors an HTTP middleware pipeline. We could implement this regex matcher with it, the agentic matcher with it, and eventually allow other extensions to register their own error handlers. We could also explore allowing all matches to flow through and either let users flip between them or asking an agent which one makes the most sense.
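One way to picture the middleware-style layer proposed here is the sketch below. It is an illustration under assumptions, not the interface that ended up in the PR: `Suggestion`, `NextFunc`, `HandlerFunc`, `Chain`, `SubstringHandler`, and `Suggest` are all hypothetical names, and a substring check stands in for the regex matcher.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// Suggestion is a hypothetical result type for this sketch.
type Suggestion struct {
	Message string
	Source  string // which handler produced the suggestion
}

// NextFunc invokes the rest of the pipeline.
type NextFunc func(ctx context.Context, err error) (*Suggestion, error)

// ErrorHandler either produces a suggestion or defers to the next handler,
// in the same way an HTTP middleware decides whether to call next.
type ErrorHandler interface {
	Handle(ctx context.Context, err error, next NextFunc) (*Suggestion, error)
}

// HandlerFunc adapts a plain function to the ErrorHandler interface.
type HandlerFunc func(ctx context.Context, err error, next NextFunc) (*Suggestion, error)

func (f HandlerFunc) Handle(ctx context.Context, err error, next NextFunc) (*Suggestion, error) {
	return f(ctx, err, next)
}

// Chain composes handlers in order. The terminal step returns no suggestion.
func Chain(handlers ...ErrorHandler) NextFunc {
	next := NextFunc(func(context.Context, error) (*Suggestion, error) { return nil, nil })
	for i := len(handlers) - 1; i >= 0; i-- {
		h, n := handlers[i], next
		next = func(ctx context.Context, err error) (*Suggestion, error) {
			return h.Handle(ctx, err, n)
		}
	}
	return next
}

// SubstringHandler is a stand-in for a text matcher registered by source.
func SubstringHandler(source, substr, message string) ErrorHandler {
	return HandlerFunc(func(ctx context.Context, err error, next NextFunc) (*Suggestion, error) {
		if strings.Contains(strings.ToLower(err.Error()), strings.ToLower(substr)) {
			return &Suggestion{Message: message, Source: source}, nil
		}
		return next(ctx, err)
	})
}

// Suggest runs the pipeline against a plain error message (demo helper).
func Suggest(pipeline NextFunc, msg string) *Suggestion {
	s, _ := pipeline(context.Background(), errors.New(msg))
	return s
}

func main() {
	pipeline := Chain(
		SubstringHandler("ai-extension", "model quota", "Request more model quota or pick another model."),
		SubstringHandler("core", "quota", "Request a subscription quota increase."),
	)
	for _, msg := range []string{"Model quota exceeded", "Core quota exceeded", "timeout"} {
		if s := Suggest(pipeline, msg); s != nil {
			fmt.Printf("%s -> [%s] %s\n", msg, s.Source, s.Message)
		} else {
			fmt.Printf("%s -> no suggestion\n", msg)
		}
	}
}
```

Because each handler receives `next`, a handler could also call it first and then refine or replace the result, which would be one route toward letting several matches flow through.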
Great feedback. This PR now includes typed error matching via reflection (errorType + properties), deep ARM error tree traversal via multi-unwrap, and named handlers (ResourceNotAvailableHandler queries ARM for available regions). The pipeline architecture with the ErrorHandler interface is designed to be extended — adding HTTP status code matching, extension-registered handlers, and richer composition are natural next steps on top of this foundation.
Updated with a more detailed response now that the implementation has evolved significantly:
> We're completely skipping HTTP status codes and x-ms-error-code
We now match azcore.ResponseError by type via reflection, with ErrorCode property matching. For example, LocationNotAvailableForResourceType is caught both as a ResponseError (pre-polling validation failure) and as a DeploymentErrorLine (nested in deployment errors). Adding StatusCode matching would be straightforward since resolvePropertyPath already handles any exported field.
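A rough sketch of type-plus-property matching via reflection follows. It is not the PR's `resolvePropertyPath`; the `resolveProperty`, `TypeName`, and `MatchProperties` functions are hypothetical, and `ResponseError` is a local stand-in for `azcore.ResponseError` with only two fields so the example stays self-contained.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// ResponseError is a local stand-in for azcore.ResponseError.
type ResponseError struct {
	StatusCode int
	ErrorCode  string
}

func (e *ResponseError) Error() string {
	return fmt.Sprintf("%d %s", e.StatusCode, e.ErrorCode)
}

// TypeName returns the unqualified type name of v, looking through pointers.
func TypeName(v any) string {
	t := reflect.TypeOf(v)
	if t == nil {
		return ""
	}
	for t.Kind() == reflect.Pointer {
		t = t.Elem()
	}
	return t.Name()
}

// resolveProperty walks a dot-separated path of exported struct fields and
// returns the final value formatted as a string.
func resolveProperty(v any, path string) (string, bool) {
	rv := reflect.ValueOf(v)
	for _, name := range strings.Split(path, ".") {
		for rv.Kind() == reflect.Pointer || rv.Kind() == reflect.Interface {
			if rv.IsNil() {
				return "", false
			}
			rv = rv.Elem()
		}
		if rv.Kind() != reflect.Struct {
			return "", false
		}
		rv = rv.FieldByName(name)
		// Unknown fields are invalid; unexported fields cannot be read.
		if !rv.IsValid() || !rv.CanInterface() {
			return "", false
		}
	}
	return fmt.Sprint(rv.Interface()), true
}

// MatchProperties reports whether every property (path -> expected string)
// resolves on v and equals the expected value (AND semantics).
func MatchProperties(v any, props map[string]string) bool {
	for path, want := range props {
		got, ok := resolveProperty(v, path)
		if !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	err := &ResponseError{StatusCode: 400, ErrorCode: "LocationNotAvailableForResourceType"}
	fmt.Println(TypeName(err))
	fmt.Println(MatchProperties(err, map[string]string{
		"ErrorCode":  "LocationNotAvailableForResourceType",
		"StatusCode": "400",
	}))
}
```

Since values are compared as formatted strings, an integer field such as `StatusCode` can be matched with the same mechanism as `ErrorCode`, which is in the spirit of the remark that status-code matching would be straightforward.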
> We're completely ignoring the structure
The pipeline now does DFS traversal through Unwrap() []error on ARM error trees, matching DeploymentErrorLine nodes by their Code property. This finds error codes buried 3-4 levels deep (e.g., DeploymentFailed > ResourceDeploymentFailure > FlagMustBeSetForRestore). Type + property matching happens together during traversal so we match the right node, not just the first type match.
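The traversal described here might look roughly like the sketch below. `DeploymentErrorLine` is reduced to a simplified stand-in with assumed fields (`Code`, `Message`, `Inner`), and `FindFirst`, `HasCode`, and `SampleTree` are hypothetical helpers; the real type and pipeline in the PR may differ.

```go
package main

import "fmt"

// DeploymentErrorLine is a simplified stand-in for the ARM error node type.
type DeploymentErrorLine struct {
	Code    string
	Message string
	Inner   []*DeploymentErrorLine
}

func (e *DeploymentErrorLine) Error() string { return e.Code + ": " + e.Message }

// Unwrap exposes child errors so tree-aware traversal can reach them.
func (e *DeploymentErrorLine) Unwrap() []error {
	errs := make([]error, 0, len(e.Inner))
	for _, in := range e.Inner {
		if in != nil {
			errs = append(errs, in)
		}
	}
	return errs
}

// FindFirst performs a pre-order depth-first search over the error tree,
// following both Unwrap() []error and Unwrap() error, and returns the first
// node accepted by pred.
func FindFirst(err error, pred func(error) bool) error {
	if err == nil {
		return nil
	}
	if pred(err) {
		return err
	}
	switch u := err.(type) {
	case interface{ Unwrap() []error }:
		for _, child := range u.Unwrap() {
			if found := FindFirst(child, pred); found != nil {
				return found
			}
		}
	case interface{ Unwrap() error }:
		return FindFirst(u.Unwrap(), pred)
	}
	return nil
}

// HasCode matches on type and property together, so the search returns the
// node with the right code rather than the first node of the right type.
func HasCode(code string) func(error) bool {
	return func(err error) bool {
		line, ok := err.(*DeploymentErrorLine)
		return ok && line.Code == code
	}
}

// SampleTree builds the nesting used as an example in the reply above.
func SampleTree() error {
	tree := &DeploymentErrorLine{Code: "DeploymentFailed", Message: "deployment failed", Inner: []*DeploymentErrorLine{
		{Code: "ResourceDeploymentFailure", Message: "resource failed", Inner: []*DeploymentErrorLine{
			{Code: "FlagMustBeSetForRestore", Message: "restore flag required"},
		}},
	}}
	return fmt.Errorf("provisioning: %w", tree)
}

func main() {
	found := FindFirst(SampleTree(), HasCode("FlagMustBeSetForRestore"))
	fmt.Println(found)
}
```

Since Go 1.20, `errors.Is` and `errors.As` also walk `Unwrap() []error`, but `errors.As` stops at the first value of the target type, so a custom walk is one way to apply the property check at every node.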
> Simple regexes aren't enough to distinguish context
Agreed — that's why we have typed error matching + properties as the primary strategy for structured errors. Text patterns are a fallback for unstructured errors. Rules can combine both: errorType: "DeploymentErrorLine" + properties: {Code: "Conflict"} + patterns: ["(?i)soft.?delete"] to distinguish soft-delete conflicts from generic ones.
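The combination rule described here (type must match, all properties must match, any one pattern must match) can be sketched as below. The `Rule` and `Candidate` types and the `Matches` method are hypothetical; `Candidate` is a flattened view of an error node that stands in for the reflection step, and the substring-by-default, regex-when-flagged behavior follows the PR description.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Rule is a hypothetical in-memory form of one suggestion rule.
type Rule struct {
	ErrorType  string            // required type name; empty matches any type
	Properties map[string]string // AND: every property must match
	Patterns   []string          // OR: any pattern may match
	Regex      bool              // treat Patterns as regular expressions
}

// Candidate is a flattened view of one error node.
type Candidate struct {
	TypeName   string
	Properties map[string]string
	Text       string
}

// Matches applies type, then properties (AND), then patterns (OR).
// A rule with no patterns matches on type and properties alone.
func (r Rule) Matches(c Candidate) bool {
	if r.ErrorType != "" && r.ErrorType != c.TypeName {
		return false
	}
	for key, want := range r.Properties {
		if got, ok := c.Properties[key]; !ok || got != want {
			return false
		}
	}
	if len(r.Patterns) == 0 {
		return true
	}
	for _, p := range r.Patterns {
		if r.Regex {
			// Invalid expressions are treated as non-matching in this sketch.
			if ok, err := regexp.MatchString(p, c.Text); err == nil && ok {
				return true
			}
		} else if strings.Contains(strings.ToLower(c.Text), strings.ToLower(p)) {
			return true
		}
	}
	return false
}

// SoftDeleteRule mirrors the example given in the reply above.
func SoftDeleteRule() Rule {
	return Rule{
		ErrorType:  "DeploymentErrorLine",
		Properties: map[string]string{"Code": "Conflict"},
		Patterns:   []string{"(?i)soft.?delete"},
		Regex:      true,
	}
}

// ConflictCandidate builds a Conflict node with the given message text.
func ConflictCandidate(text string) Candidate {
	return Candidate{
		TypeName:   "DeploymentErrorLine",
		Properties: map[string]string{"Code": "Conflict"},
		Text:       text,
	}
}

func main() {
	rule := SoftDeleteRule()
	fmt.Println(rule.Matches(ConflictCandidate("A vault with this name exists in a soft-deleted state")))
	fmt.Println(rule.Matches(ConflictCandidate("The resource name is already taken")))
}
```

Both candidates are `Conflict` nodes of the same type, so only the text pattern separates the soft-delete case from the generic one.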
> This doesn't compose/scale well
The ErrorHandler interface is the extension point here. Named handlers are resolved from the IoC container and receive the full matching rule. This is designed so extensions can register their own handlers. Evolving to let extensions also register their own rules is a natural next step.
> I can't run arbitrary code to compute my suggestion
Named handlers solve this. The built-in ResourceNotAvailableHandler queries the ARM Providers API for available regions, reads from the azd environment, and builds a dynamic suggestion. Any handler can run arbitrary code — it receives the context, error, and matching rule.
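As an illustration of the named-handler idea, the sketch below resolves a handler by the name given in the matching rule and lets it compute a suggestion at run time. Everything here is hypothetical: a plain map stands in for the IoC container, `RegionHintHandler` is only loosely modeled on the behavior described for ResourceNotAvailableHandler, the `ListRegions` callback replaces the ARM query, and the link URL is a placeholder.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// Link and Rule carry the static data from the matching rule.
type Link struct {
	URL   string
	Title string
}

type Rule struct {
	Handler string // name of the handler to resolve
	Links   []Link
}

// Suggestion is the dynamic result built by a handler.
type Suggestion struct {
	Suggestion string
	Links      []Link
}

// ErrorHandler receives the context, the error, and the matching rule.
type ErrorHandler interface {
	Handle(ctx context.Context, err error, rule Rule) (*Suggestion, error)
}

// Registry is a stand-in for the IoC container: handlers keyed by name.
type Registry map[string]ErrorHandler

// RegionHintHandler builds a suggestion from a region lookup. ListRegions is
// an assumed callback that replaces a real call to Azure.
type RegionHintHandler struct {
	ListRegions func(ctx context.Context) ([]string, error)
}

func (h RegionHintHandler) Handle(ctx context.Context, err error, rule Rule) (*Suggestion, error) {
	regions, lookupErr := h.ListRegions(ctx)
	if lookupErr != nil || len(regions) == 0 {
		// Fall back to static advice when the lookup is unavailable.
		return &Suggestion{Suggestion: "Try a different region.", Links: rule.Links}, nil
	}
	return &Suggestion{
		Suggestion: "Try one of these regions: " + strings.Join(regions, ", "),
		Links:      rule.Links, // static data still comes from the rule
	}, nil
}

// Dispatch resolves the handler named by the rule and invokes it.
func Dispatch(ctx context.Context, reg Registry, rule Rule, err error) (*Suggestion, error) {
	h, ok := reg[rule.Handler]
	if !ok {
		return nil, fmt.Errorf("no handler registered as %q", rule.Handler)
	}
	return h.Handle(ctx, err, rule)
}

// Demo wires a registry with a fixed region list and runs one dispatch.
func Demo(regions []string) *Suggestion {
	reg := Registry{"regionHint": RegionHintHandler{
		ListRegions: func(context.Context) ([]string, error) { return regions, nil },
	}}
	rule := Rule{
		Handler: "regionHint",
		Links:   []Link{{URL: "https://example.com/regions", Title: "Placeholder link"}},
	}
	s, _ := Dispatch(context.Background(), reg, rule, errors.New("LocationNotAvailableForResourceType"))
	return s
}

func main() {
	fmt.Println(Demo([]string{"westus2", "eastus"}).Suggestion)
	fmt.Println(Demo(nil).Suggestion)
}
```

Passing the rule into the handler is what lets the dynamic suggestion reuse the static links, as the reply describes.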
Hi @wbreza. Thank you for your interest in helping to improve the Azure Developer CLI experience and for your contribution. We've noticed that there hasn't been recent engagement on this pull request. If this is still an active work stream, please let us know by pushing some changes or leaving a comment. Otherwise, we'll close this out in 7 days.
Hi @wbreza. Thank you for your contribution. Since there hasn't been recent engagement, we're going to close this out. Feel free to respond with a comment containing "/reopen" if you'd like to continue working on these changes. Please be sure to use the command to reopen or remove the "no-recent-activity" label; otherwise, this is likely to be closed again with the next cleanup pass.
Summary
Adds a YAML-driven error handling pipeline that matches raw Azure errors against well-known patterns and wraps them with user-friendly messages, actionable suggestions, and reference links. The goal: anyone can improve the error experience by editing a single YAML file.
For errors that need runtime context (like querying Azure for available regions), the pipeline supports named handlers registered in the IoC container that compute suggestions dynamically while still pulling static data (links, etc.) from the YAML rule that matched.
This PR also removes soft_delete_hint.go (from #6810) since those scenarios are now expressed declaratively as YAML rules.
How It Works
Error middleware flow:
YAML Rule Format
Rules live in resources/error_suggestions.yaml (embedded at build time). Each rule can use text patterns, typed error matching, or both.
Matching logic:
- patterns - OR (any pattern matches); case-insensitive substring by default, regex when regex: true
- properties - AND (all must match); resolved via reflection on the matched error type
Response fields:
- message - user-friendly explanation
- suggestion - actionable next steps
- links - list of {url, title?} rendered as hyperlinked bullets
- handler - name of a registered ErrorHandler for dynamic suggestions (receives the matching rule)
Custom Handlers
When a rule specifies handler, the pipeline resolves the named ErrorHandler from the IoC container and invokes it with the error and the matching rule. The handler can use rule data (e.g., links) or ignore it.
Built-in: ResourceNotAvailableHandler handles LocationNotAvailableForResourceType:
ARM Error Refactoring
DeploymentErrorLine now implements the Go error interface and Unwrap() []error, enabling:
Covered Scenarios
Files Changed
Related PRs
Subsumes error handling from: