Skip to content

[Common] Fix linker error for to_string(DType) in distributed tests#2757

Open
vcherepanov-nv wants to merge 2 commits intoNVIDIA:mainfrom
vcherepanov-nv:fix-to-string-linker-error
Open

[Common] Fix linker error for to_string(DType) in distributed tests#2757
vcherepanov-nv wants to merge 2 commits intoNVIDIA:mainfrom
vcherepanov-nv:fix-to-string-linker-error

Conversation

@vcherepanov-nv
Copy link
Collaborator

Make transformer_engine::to_string(DType) inline in common.h so that translation units outside libtransformer_engine.so can resolve it without requiring the symbol to be exported.

Regression introduced by 61f9594 which added to_string(DType) calls into TRANSFORMER_ENGINE_TYPE_SWITCH_* macros, causing test object files to reference the symbol that the linker version script hides.

Description

Please include a brief summary of the changes, relevant motivation and context.

Fixes # (issue)

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Make to_string(DType) inline

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

vcherepanov-nv and others added 2 commits March 12, 2026 23:37
Make transformer_engine::to_string(DType) inline in common.h so that
translation units outside libtransformer_engine.so can resolve it
without requiring the symbol to be exported.

Regression introduced by 61f9594 which added to_string(DType) calls
into TRANSFORMER_ENGINE_TYPE_SWITCH_* macros, causing test object files
to reference the symbol that the linker version script hides.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 12, 2026

Greptile Summary

This PR fixes a linker error in distributed tests by making transformer_engine::to_string(DType) an inline function in common.h rather than a non-inline function defined in transformer_engine.cpp. The TRANSFORMER_ENGINE_TYPE_SWITCH_* macros call to_string(DType) in their error-handling paths; because those macros are expanded in test TUs that link against libtransformer_engine.so, and because the linker version script hides everything not explicitly listed in global:, the test executables could not resolve the symbol. Moving the definition to the header and marking it inline eliminates the external reference entirely.

Changes:

  • common.h: The to_string(DType) forward declaration is replaced with an inline definition containing the full switch statement. All 11 DType enum values are covered, with a safe default fallback using std::string("Invalid type ") + std::to_string(static_cast<int>(type)).
  • transformer_engine.cpp: The now-redundant to_string(DType) implementation is removed.
  • The equivalent to_string(NVTEScalingMode) still lives as a non-inline, non-exported function in transformer_engine.cpp. It is called from the inline Tensor::shape() method defined in common.h, which could trigger the same class of linker error for tests that exercise Tensor::shape().

Minor note: The PR description marks the type of change as "New feature"; this is a bug fix.

Confidence Score: 4/5

  • Safe to merge — the fix is correct and minimal; the only concern is a potentially analogous but pre-existing issue with to_string(NVTEScalingMode).
  • The change is a straightforward and well-understood C++ technique (moving a function to a header as inline to avoid hidden-symbol linker errors). The new implementation is functionally identical to the removed one. The minor deduction is for the related to_string(NVTEScalingMode) symbol, which is also non-exported and called from an inline header method, and could surface as a similar linker failure.
  • transformer_engine/common/common.h — the non-inline to_string(NVTEScalingMode) declaration at line 72 may be susceptible to the same linker issue for tests that exercise Tensor::shape().

Important Files Changed

Filename Overview
transformer_engine/common/common.h Moves to_string(DType) from transformer_engine.cpp to this header as an inline function, fixing the linker error for translation units outside libtransformer_engine.so. The implementation is complete and correct — all 11 DType enum values (kByte through kFloat4E2M1) are covered with a safe default fallback.
transformer_engine/common/transformer_engine.cpp Removes the now-inlined to_string(DType) implementation. The remaining to_string(NVTEScalingMode) definition is untouched and still lives here as a non-inline function.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Test TU includes common.h"] --> B["TRANSFORMER_ENGINE_TYPE_SWITCH_*\nmacro expanded in test"]
    B --> C{"Unsupported dtype\nerror path hit?"}
    C -- "Yes (before fix)" --> D["Reference to to_string(DType)\nnon-inline, hidden by .version script"]
    D --> E["❌ Linker error:\nsymbol not found in .so"]
    C -- "Yes (after fix)" --> F["to_string(DType) is inline\ncode generated in test TU directly"]
    F --> G["✅ Links successfully"]
    C -- "No" --> G

    H["Tensor::shape() inline method\nin common.h line ~283"] --> I{"default: branch\nreached?"}
    I -- "Yes" --> J["Reference to to_string(NVTEScalingMode)\nnon-inline, hidden by .version script"]
    J --> K["⚠️ Potential linker error\nnot addressed by this PR"]
Loading

Last reviewed commit: b5055bb

return std::string("Invalid type ") + std::to_string(static_cast<int>(type));
}
}
std::string to_string(const NVTEScalingMode &mode);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential same linker issue for to_string(NVTEScalingMode)

to_string(const NVTEScalingMode &mode) is also a non-inline, non-exported function (it is not listed in libtransformer_engine.version's global: section). It is called from the inline Tensor::shape() method defined in this very header (line ~283):

default:
  NVTE_ERROR("Cannot parse tensor shape with scaling mode \"", to_string(scaling_mode), "\"");

Any translation unit that includes common.h (e.g., test object files) and references Tensor::shape() will generate a dynamic reference to to_string(NVTEScalingMode). Because this symbol is hidden by the linker version script's local: * catch-all, linking such a TU against libtransformer_engine.so will fail the same way the to_string(DType) issue did.

Consider making to_string(const NVTEScalingMode &mode) inline here as well, similarly to the fix applied to to_string(DType) in this PR.

@vcherepanov-nv vcherepanov-nv requested a review from ptrendx March 13, 2026 06:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant