[Common] Fix linker error for to_string(DType) in distributed tests#2757
[Common] Fix linker error for to_string(DType) in distributed tests#2757vcherepanov-nv wants to merge 2 commits intoNVIDIA:mainfrom
Conversation
Make transformer_engine::to_string(DType) inline in common.h so that translation units outside libtransformer_engine.so can resolve it without requiring the symbol to be exported. Regression introduced by 61f9594 which added to_string(DType) calls into TRANSFORMER_ENGINE_TYPE_SWITCH_* macros, causing test object files to reference the symbol that the linker version script hides. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
for more information, see https://pre-commit.ci
Greptile SummaryThis PR fixes a linker error in distributed tests by making Changes:
Minor note: The PR description marks the type of change as "New feature"; this is a bug fix. Confidence Score: 4/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["Test TU includes common.h"] --> B["TRANSFORMER_ENGINE_TYPE_SWITCH_*\nmacro expanded in test"]
B --> C{"Unsupported dtype\nerror path hit?"}
C -- "Yes (before fix)" --> D["Reference to to_string(DType)\nnon-inline, hidden by .version script"]
D --> E["❌ Linker error:\nsymbol not found in .so"]
C -- "Yes (after fix)" --> F["to_string(DType) is inline\ncode generated in test TU directly"]
F --> G["✅ Links successfully"]
C -- "No" --> G
H["Tensor::shape() inline method\nin common.h line ~283"] --> I{"default: branch\nreached?"}
I -- "Yes" --> J["Reference to to_string(NVTEScalingMode)\nnon-inline, hidden by .version script"]
J --> K["⚠️ Potential linker error\nnot addressed by this PR"]
Last reviewed commit: b5055bb |
| return std::string("Invalid type ") + std::to_string(static_cast<int>(type)); | ||
| } | ||
| } | ||
| std::string to_string(const NVTEScalingMode &mode); |
There was a problem hiding this comment.
Potential same linker issue for to_string(NVTEScalingMode)
to_string(const NVTEScalingMode &mode) is also a non-inline, non-exported function (it is not listed in libtransformer_engine.version's global: section). It is called from the inline Tensor::shape() method defined in this very header (line ~283):
default:
NVTE_ERROR("Cannot parse tensor shape with scaling mode \"", to_string(scaling_mode), "\"");Any translation unit that includes common.h (e.g., test object files) and references Tensor::shape() will generate a dynamic reference to to_string(NVTEScalingMode). Because this symbol is hidden by the linker version script's local: * catch-all, linking such a TU against libtransformer_engine.so will fail the same way the to_string(DType) issue did.
Consider making to_string(const NVTEScalingMode &mode) inline here as well, similarly to the fix applied to to_string(DType) in this PR.
Make transformer_engine::to_string(DType) inline in common.h so that translation units outside libtransformer_engine.so can resolve it without requiring the symbol to be exported.
Regression introduced by 61f9594 which added to_string(DType) calls into TRANSFORMER_ENGINE_TYPE_SWITCH_* macros, causing test object files to reference the symbol that the linker version script hides.
Description
Please include a brief summary of the changes, relevant motivation and context.
Fixes # (issue)
Type of change
Changes
Please list the changes introduced in this PR:
Checklist: