FastDDS: subscriber fails to reconnect under certain conditions

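I ran into this problem when I first started using FastDDS. I initially assumed it was a shared-memory deadlock, which is what prompted my earlier post on Boost robust locks. The problem turned out not to be that simple >_<
It is documented in detail in GitHub issue eProsima/Fast-DDS#2811.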

While debugging a FastDDS program, if you pause the subscriber, wait a while, and then restart it, the subscriber can no longer receive data. The Publisher / Subscriber logs look like this:

Publisher
2022-09-06 14:10:47.049 [RTPS_TRANSPORT_SHM Warning] SHM Port 7413 failure: the port is marked as not ok! -> Function try_push
2022-09-06 14:10:47.049 [RTPS_TRANSPORT_SHM Warning] (ID:140091162883648) Existing Port 7413 (5f4eeb5613a33705) NOT Healthy. -> Function open_port_internal
2022-09-06 14:10:47.049 [RTPS_TRANSPORT_SHM Warning] (ID:140091162883648) Port 7413 (5f4eeb5613a33705) Removed. -> Function open_port_internal
Message: HelloWorld with index: 157 SENT
2022-09-06 14:10:47.246 [RTPS_TRANSPORT_SHM Warning] SHM Port 7412 failure: the port is marked as not ok! -> Function try_push
2022-09-06 14:10:47.248 [RTPS_TRANSPORT_SHM Warning] (ID:140091402036800) Existing Port 7412 (3e1fa9f9eb2b0ade) NOT Healthy. -> Function open_port_internal
2022-09-06 14:10:47.248 [RTPS_TRANSPORT_SHM Warning] (ID:140091402036800) Port 7412 (3e1fa9f9eb2b0ade) Removed. -> Function open_port_internal
Subscriber
2022-09-07 14:50:27.441 [RTPS_TRANSPORT_SHM Warning] (ID:140737325877056) Port 7412 Zombie. Reset the port -> Function open_port_internal
2022-09-07 14:50:27.442 [RTPS_TRANSPORT_SHM Warning] (ID:140737325877056) Port 7413 Zombie. Reset the port -> Function open_port_internal

Tracing the log messages back to the code that emits them, I narrowed the likely culprit down to commit eProsima/Fast-DDS@e58dcb1.

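That commit is the result of squashing several dozen commits together when the Pull Request was merged [link]. The pre-squash commits can still be viewed on GitHub, but I could not check them out with git, so I manually downloaded a pile of source snapshots, built each one, and bisected by hand. The search ended at the following commit: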

Refs #8250. Do not reuse zombie ports structures.
@adolfomarver authored and @MiguelCompany committed on May 12, 2020
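
The manual bisection described above can be sketched roughly as follows. This is only an illustration of the search strategy, not the exact procedure used: the snapshot labels and the `is_bad` predicate are hypothetical stand-ins for "download this snapshot, build it, pause the subscriber, and check whether it reconnects". It also assumes the snapshots are ordered oldest to newest and that the bug, once introduced, is present in every later snapshot.

```python
from typing import Callable, Sequence


def first_bad(snapshots: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest snapshot for which is_bad() is true.

    Assumes snapshots are ordered oldest to newest, the last one is known
    to be bad, and once the bug appears it stays in every later snapshot.
    """
    lo, hi = 0, len(snapshots) - 1  # invariant: snapshots[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(snapshots[mid]):
            hi = mid        # bug already present: look at or before mid
        else:
            lo = mid + 1    # still good: the bug arrived after mid
    return snapshots[lo]


# Hypothetical snapshot labels. In practice is_bad() would mean
# "build this snapshot and see whether the subscriber fails to reconnect".
snapshots = [f"snapshot-{i:02d}" for i in range(40)]
culprit = first_bad(snapshots, lambda s: s >= "snapshot-27")
print(culprit)
```

With n snapshots this needs on the order of log2(n) builds rather than n, which is what makes a hand-run search over a few dozen commits tolerable.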

It looks as though the Subscriber refuses to reuse zombie ports but never requests a new port either.
Commenting out a few lines of code avoids the problem for now; see duchengyao@Fast-DDS/commit/63ad668.