Skip to content

perf: Optimized out shared_ptr copies#3079

Open
jmachowinski wants to merge 1 commit intoros2:rollingfrom
cellumation:shared_ptr_optimization
Open

perf: Optimized out shared_ptr copies#3079
jmachowinski wants to merge 1 commit intoros2:rollingfrom
cellumation:shared_ptr_optimization

Conversation

@jmachowinski
Copy link
Collaborator

Description

This is a PR that aims to eliminate overhead by unneeded copies of shared ptrs all over the code.
ATM this is a draft as it might touch external interfaces and beak user code down the line.

Fixes # (issue)

Is this user-facing behavior change?

Yes, as in theory it is an API change, but in most cases the end user will not notice it.

Did you use Generative AI?

I wish I had...

@jmachowinski
Copy link
Collaborator Author

Pulls: #3079
Gist: https://gist.githubusercontent.com/jmachowinski/e58a334761918338ebbc077250242629/raw/62bd212223c6dd843d2dd883af7a299a2a5cdc33/ros2.repos
BUILD args:
TEST args:
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/18279

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

@jmachowinski jmachowinski marked this pull request as ready for review February 24, 2026 16:06
Copy link
Member

@wjwwood wjwwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general a big 👍 from me.

I did have a question about one class of change, but I think it probably has a reason.

The whole "not using const ref to shared ptr" is due to Dirk not wanting to use them due to some class of dangling ownership bug he ran into once, but I always thought it was fine and then later the cpp core guidelines clarified it and we started doing it the "right" way in new code, but we never cleaned up the existing interfaces so thanks for doing that.

can_be_taken_from_(true),
automatically_add_to_executor_with_node_(automatically_add_to_executor_with_node),
context_(context)
context_(std::move(context))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain this one? Seems like it's not needed as context is a temporary but could be confusing if it is referenced in the constructor body instead of context_.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two way of approaching the pass by value & pass by ref issue. And to make matters worse, the 'better' approach is dependent on if its a header only method or in a separate compile unit.

It is faster to pass an object as an value and and assign it to a member using std::move, if this happens in a header / same compile unit. If you play around in compiler explorer you can see that the generated code for this version is shorter, Microbenchmarks also show this.

If you already have a value as it is dictated by the interface etc, using std::move during assignment to a member variable is preferred and faster.

If you have a call chain, of objects passing on an argument, pass by reference always wins.

If you wanted to do this really clean etc you would need to generate methods and constructors for move semantics, but this is a looooot of work...

Comment on lines -63 to +64
: pre_callback(pre_callback),
post_callback(post_callback),
: pre_callback(std::move(pre_callback)),
post_callback(std::move(post_callback)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar question here. Is there a c++ core guideline that covers this case or something?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this special case

JumpHandler::JumpHandler(
  pre_callback_t &&pre_callback,
  post_callback_t &&post_callback,
  const rcl_jump_threshold_t & threshold)

would be the clean solution. See F.18 of the core guideline for reference.

See also https://clang.llvm.org/extra/clang-tidy/checks/modernize/pass-by-value.html
Or https://www.youtube.com/watch?v=56DMwqKffi0


if (enable_param.get<bool>()) {
auto * rcl_node = node_base->get_rcl_node_handle();
auto * rcl_node = node_base_->get_rcl_node_handle();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last comment about this class of change, but this is a good example of a change that is required due to the new moves in the constructor initializer statements which I don't think is necessarily a good (or bad) thing. It just seems unnecessary.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case is clearly faster, as the compiler can see through it and perform optimizations. I must say, that in general I am not a big fan of move semantics, as compilers (at least on ubuntu 22.04) are not warning you if you use an object that was moved away and therefore it is easy to introduce errors.

Copy link
Collaborator

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the approach looks good to avoid extra copies. DCO is missing though.

@jmachowinski jmachowinski force-pushed the shared_ptr_optimization branch 2 times, most recently from eda27e5 to bf274bf Compare February 27, 2026 15:37
@jmachowinski
Copy link
Collaborator Author

Pulls: #3079
Gist: https://gist.githubusercontent.com/jmachowinski/63e31b10285ccef61bd5b808c1ec4ea3/raw/62bd212223c6dd843d2dd883af7a299a2a5cdc33/ros2.repos
BUILD args:
TEST args:
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/18334

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

@jmachowinski jmachowinski force-pushed the shared_ptr_optimization branch from bf274bf to 571f6bc Compare February 27, 2026 17:41
@jmachowinski
Copy link
Collaborator Author

Pulls: #3079
Gist: https://gist.githubusercontent.com/jmachowinski/a1ea80fd50703642d315bd27ba95654e/raw/62bd212223c6dd843d2dd883af7a299a2a5cdc33/ros2.repos
BUILD args:
TEST args:
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/18336

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

Signed-off-by: Janosch Machowinski <J.Machowinski@cellumation.com>
@jmachowinski jmachowinski force-pushed the shared_ptr_optimization branch from 571f6bc to 9eecbd7 Compare February 27, 2026 18:17
@jmachowinski
Copy link
Collaborator Author

Pulls: #3079
Gist: https://gist.githubusercontent.com/jmachowinski/7d833c3ac44a3c145ff07fe16a45e2cc/raw/62bd212223c6dd843d2dd883af7a299a2a5cdc33/ros2.repos
BUILD args:
TEST args:
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/18337

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants