SD3403/SS928 NPU operator ResizeBilinearV2 produces abnormal results (Q&A)

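Problem description:
I am using a BiSeNet model built in PyTorch. After converting it to ONNX and then to an om model, the inference results are abnormal. I traced the problem to the final upsampling operator in the model, ResizeBilinearV2.
In PyTorch, both F.interpolate and nn.Upsample become the CANN ResizeBilinearV2 operator after ATC conversion:
F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
For testing, I used an ONNX model that contains only F.interpolate: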

```python
import torch
import torch.nn.functional as F


class ResizeModel(torch.nn.Module):
    def __init__(self, scale_factor=2.0):
        super().__init__()
        self.scale_factor = scale_factor

    def forward(self, x):
        # Perform only the Resize operation (bilinear, align_corners=False)
        return F.interpolate(x, scale_factor=self.scale_factor, mode='bilinear', align_corners=False)
```
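
To have a reference to compare the board output against, the half-pixel sampling implied by `align_corners=False` can be sketched in pure Python. This is a minimal illustration written for this post, not the CANN or PyTorch implementation; the function name `bilinear_upsample` is hypothetical, and it works on a single 2-D channel stored as nested lists.

```python
import math


def bilinear_upsample(img, scale):
    """Upscale a 2-D list of numbers using half-pixel (align_corners=False) bilinear sampling."""
    in_h, in_w = len(img), len(img[0])
    out_h = int(math.floor(in_h * scale))
    out_w = int(math.floor(in_w * scale))

    def source_index(dst, in_size):
        # Map an output index to a fractional input coordinate, clamped at 0.
        src = max((dst + 0.5) / scale - 0.5, 0.0)
        i0 = min(int(src), in_size - 1)
        i1 = min(i0 + 1, in_size - 1)
        return i0, i1, src - i0

    out = []
    for y in range(out_h):
        y0, y1, wy = source_index(y, in_h)
        row = []
        for x in range(out_w):
            x0, x1, wx = source_index(x, in_w)
            # Interpolate horizontally on the two neighbouring rows, then vertically.
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out
```

For a one-row input `[[0.0, 10.0]]` and `scale=2`, each output row is `[0.0, 2.5, 7.5, 10.0]`: border pixels are clamped rather than extrapolated.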

The ONNX model is shown in the figure:

ATC model conversion with MindStudio:

aipp_op { 
related_input_rank : 0
src_image_size_w : 320
src_image_size_h : 160
crop : false
resize : false
padding : false
input_format : RGB888_U8
aipp_mode: static
csc_switch : true
rbuv_swap_switch : true
matrix_r0c0 : 76
matrix_r0c1 : 150
matrix_r0c2 : 30
matrix_r1c0 : 0
matrix_r1c1 : 0
matrix_r1c2 : 0
matrix_r2c0 : 0
matrix_r2c1 : 0
matrix_r2c2 : 0
input_bias_0 : 0
input_bias_1 : 0
input_bias_2 : 0

mean_chn_0 : 0
min_chn_0 : 0.0
var_reci_chn_0 : 1.0
}
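
As a rough check of what this configuration feeds into the resize operator, the grayscale conversion can be sketched as below. It assumes the AIPP CSC step is an integer dot product followed by a right shift of 8 bits, which is consistent with the first matrix row summing to 256 (76 + 150 + 30); that assumption, the function name `aipp_rgb_to_gray`, and the omission of `rbuv_swap_switch` are simplifications of mine, so treat it as an approximation rather than the hardware behavior.

```python
def aipp_rgb_to_gray(r, g, b, coeffs=(76, 150, 30)):
    """Approximate channel 0 of the CSC output for one RGB888 pixel.

    Assumption: the hardware computes (c0*R + c1*G + c2*B) >> 8 with zero bias,
    matching matrix_r0c0..r0c2 and input_bias_0 in the config above.
    The rbuv_swap_switch setting is NOT modeled here.
    """
    acc = coeffs[0] * r + coeffs[1] * g + coeffs[2] * b
    # Clamp to the valid U8 range.
    return min(255, max(0, acc >> 8))
```

With `mean_chn_0 : 0`, `min_chn_0 : 0.0` and `var_reci_chn_0 : 1.0`, the normalization step that follows should leave this value unchanged.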


2024-12-06 09:35:10  Start to convert model
2024-12-06 09:35:10  export PATH=$PATH:/home/hzh/Ascend_nnn/ascend-toolkit/5.13.t5.0.b050/atc/ccec_compiler/bin:/home/hzh/Ascend_nnn/ascend-toolkit/5.13.t5.0.b050/atc/bin && export PYTHONPATH=$PYTHONPATH:/home/hzh/Ascend_nnn/ascend-toolkit/5.13.t5.0.b050/atc/python/site-packages:/home/hzh/Ascend_nnn/ascend-toolkit/5.13.t5.0.b050/atc/python/site-packages/auto_tune.egg/auto_tune:/home/hzh/Ascend_nnn/ascend-toolkit/5.13.t5.0.b050/atc/python/site-packages/schedule_search.egg && export LD_LIBRARY_PATH=/home/hzh/Ascend_nnn/ascend-toolkit/5.13.t5.0.b050/atc/lib64:/home/hzh/Ascend_nnn/ascend-toolkit/5.13.t5.0.b050/toolkit/lib64:/home/hzh/Ascend_nnn/ascend-toolkit/5.13.t5.0.b050/add-ons:$LD_LIBRARY_PATH:/home/hzh/Ascend_nnn/ascend-toolkit/5.13.t5.0.b050/atc/lib64/stub && export SLOG_PRINT_TO_STDOUT=1 && export ASCEND_OPP_PATH=/home/hzh/Ascend_nnn/ascend-toolkit/5.13.t5.0.b050/opp && /home/hzh/Ascend_nnn/ascend-toolkit/5.13.t5.0.b050/atc/bin/atc  --input_shape="input_image:1,1,160,320" --check_report=/home/hzh/modelzoo/resize_model_2_2factor/OPTG/network_analysis.report --input_format=NCHW --output="/home/hzh/modelzoo/resize_model_2_2factor/OPTG/resize_model_2_2factor" --soc_version=OPTG --insert_op_conf=/home/hzh/modelzoo/resize_model_2_2factor/OPTG/insert_op.cfg --framework=5 --model="/home/hzh/ss928/bisenet2/tools/resize_model_2_2factor.onnx" --output_type=UINT8
2024-12-06 09:35:10  ATC start working now, please wait for a moment.
2024-12-06 09:35:38  ATC run success, welcome to the next use.
2024-12-06 09:35:38  W11001: Op [trans_Cast_2] does not hit the high-priority operator information library, which might result in compromised performance.
2024-12-06 09:35:38  Convert model environment variables: 
2024-12-06 09:35:38  export PATH=$PATH:/home/hzh/Ascend_nnn/ascend-toolkit/5.13.t5.0.b050/atc/ccec_compiler/bin:/home/hzh/Ascend_nnn/ascend-toolkit/5.13.t5.0.b050/atc/bin && export PYTHONPATH=$PYTHONPATH:/home/hzh/Ascend_nnn/ascend-toolkit/5.13.t5.0.b050/atc/python/site-packages:/home/hzh/Ascend_nnn/ascend-toolkit/5.13.t5.0.b050/atc/python/site-packages/auto_tune.egg/auto_tune:/home/hzh/Ascend_nnn/ascend-toolkit/5.13.t5.0.b050/atc/python/site-packages/schedule_search.egg && export LD_LIBRARY_PATH=/home/hzh/Ascend_nnn/ascend-toolkit/5.13.t5.0.b050/atc/lib64:/home/hzh/Ascend_nnn/ascend-toolkit/5.13.t5.0.b050/toolkit/lib64:/home/hzh/Ascend_nnn/ascend-toolkit/5.13.t5.0.b050/add-ons:$LD_LIBRARY_PATH:/home/hzh/Ascend_nnn/ascend-toolkit/5.13.t5.0.b050/atc/lib64/stub && export SLOG_PRINT_TO_STDOUT=1 && export ASCEND_OPP_PATH=/home/hzh/Ascend_nnn/ascend-toolkit/5.13.t5.0.b050/opp
2024-12-06 09:35:38  Convert model command: 
2024-12-06 09:35:38  /home/hzh/Ascend_nnn/ascend-toolkit/5.13.t5.0.b050/atc/bin/atc  --input_shape="input_image:1,1,160,320" --check_report=/home/hzh/modelzoo/resize_model_2_2factor/OPTG/network_analysis.report --input_format=NCHW --output="/home/hzh/modelzoo/resize_model_2_2factor/OPTG/resize_model_2_2factor" --soc_version=OPTG --insert_op_conf=/home/hzh/modelzoo/resize_model_2_2factor/OPTG/insert_op.cfg --framework=5 --model="/home/hzh/ss928/bisenet2/tools/resize_model_2_2factor.onnx" --output_type=UINT8
2024-12-06 09:35:38  Model converted successfully.
2024-12-06 09:35:38  Model input path:/home/hzh/ss928/bisenet2/tools/resize_model_2_2factor.onnx
2024-12-06 09:35:38  Model output path:/home/hzh/modelzoo/resize_model_2_2factor/OPTG
2024-12-06 09:35:38  Aipp config file path:/home/hzh/modelzoo/resize_model_2_2factor/OPTG/insert_op.cfg
2024-12-06 09:35:38  Model conversion log file path:/home/hzh/modelzoo/resize_model_2_2factor/OPTG/ModelConvert.txt
2024-12-06 09:35:38  Model conversion config file path:/home/hzh/modelzoo/resize_model_2_2factor/OPTG/resize_model_2_2factor_config.json

The conversion reported no errors; the [W11001] entry is a warning related to the FP32-to-UINT8 conversion.
The converted om model is as follows:

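The model input is a binary image in RGB_U8_Packed format. AIPP converts it to a grayscale image, ResizeBilinearV2 then upscales it by interpolation, and the output is the upscaled U8 grayscale image.
The test input image is shown below: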

Inference results of the om model on the 3403 board:
scale_factor=2

scale_factor=8

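What causes this? Is some post-processing required, or is the ResizeBilinearV2 operator not supported? I did find a description of ResizeBilinearV2 in the 《CANN算子规格说明》 (CANN Operator Specifications).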
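
To put a number on "abnormal", the raw U8 output dumped from the board could be compared byte by byte with a reference result. The helper below is a hypothetical sketch (the name `compare_u8` and the default tolerance of 1 are my own choices); it assumes both buffers are single-channel U8 images of the same size.

```python
def compare_u8(actual: bytes, expected: bytes, tolerance: int = 1):
    """Return (max absolute difference, number of pixels differing by more than tolerance)."""
    if len(actual) != len(expected):
        raise ValueError("buffers differ in length: %d vs %d" % (len(actual), len(expected)))
    # Iterating over bytes yields integers in the range 0..255.
    diffs = [abs(a - e) for a, e in zip(actual, expected)]
    max_diff = max(diffs, default=0)
    mismatches = sum(1 for d in diffs if d > tolerance)
    return max_diff, mismatches
```

A small maximum difference would point to rounding, while large differences across most pixels would suggest a layout or operator problem.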

ATC version: 5.13.t5.0.B050
SDK version: V2.0.2.1

~~~~
I have one more question. When I tried nearest interpolation instead of bilinear, I found that upscaling by 8x directly with nearest,

nn.Upsample(scale_factor=8),
# nn.Upsample(scale_factor=8, mode='bilinear', align_corners=False)

makes ATC conversion report an error. An om model is still generated, but it fails at run time:

2024-12-06 10:14:18  ModuleNotFoundError: No module named 'impl.resize_nearest_neighbor_v2'
2024-12-06 10:14:18  ModuleNotFoundError: No module named 'impl.resize_nearest_neighbor_v2'
2024-12-06 10:14:18  ATC run success, welcome to the next use.
2024-12-06 10:14:18  W11001: Op [PartitionedCall_ResizeNearestNeighborV2_81_31] does not hit the high-priority operator information library, which might result in compromised performance.

With 2x upscaling, the conversion succeeds and the model runs correctly. The operator used is ResizeNearestNeighborV2, and its input and output are in NC1HWC0 format:

nn.Upsample(scale_factor=2),

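However, if I apply three consecutive 2x nearest upsamples to reach 8x, the conversion reports an error again. An om model is still generated, but it fails at run time: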

nn.Upsample(scale_factor=2),
nn.Upsample(scale_factor=2),
nn.Upsample(scale_factor=2)

2024-12-06 10:35:47  ModuleNotFoundError: No module named 'impl.resize_nearest_neighbor_v2'
2024-12-06 10:35:47  ModuleNotFoundError: No module named 'impl.resize_nearest_neighbor_v2'
2024-12-06 10:35:47  ATC run success, welcome to the next use.
2024-12-06 10:35:47  W11001: Op [PartitionedCall_ResizeNearestNeighborV2_106_62] does not hit the high-priority operator information library, which might result in compromised performance.

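In addition, the three consecutive resizes are split into two parts in the om model. The first two use the NC1HWC0 format, while the third resize is preceded by a format conversion from NC1HWC0 to NCHW. The operator that ATC reports the error for is exactly the one using the NCHW format, while the two using NC1HWC0 appear normal. What is the reason for this? Is it related to the size of the data being resized?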

In the 《CANN算子规格说明》 (CANN Operator Specifications), ResizeNearestNeighborV2 is listed as supporting the NCHW format; NC1HWC0 is not mentioned.
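
For readers unfamiliar with the two formats, the relationship can be sketched as follows: NC1HWC0 splits the channel axis into blocks of C0 channels and pads the last block with zeros. The sketch assumes C0 = 16, the value usually quoted for FP16 data on Ascend-style NPUs; I have not confirmed the value used on this chip, and the function name `nchw_to_nc1hwc0` is hypothetical.

```python
def nchw_to_nc1hwc0(tensor, c0=16):
    """Repack a nested-list NCHW tensor into NC1HWC0, zero-padding the channel axis.

    Assumption: c0=16; the real block size depends on the data type and hardware.
    """
    n, c = len(tensor), len(tensor[0])
    h, w = len(tensor[0][0]), len(tensor[0][0][0])
    c1 = (c + c0 - 1) // c0  # number of channel blocks, rounded up
    out = [[[[[0] * c0 for _ in range(w)] for _ in range(h)]
            for _ in range(c1)] for _ in range(n)]
    for ni in range(n):
        for ci in range(c):
            for hi in range(h):
                for wi in range(w):
                    # Channel ci lands in block ci // c0 at offset ci % c0.
                    out[ni][ci // c0][hi][wi][ci % c0] = tensor[ni][ci][hi][wi]
    return out
```

For the single-channel tensors in this test, each pixel would occupy a block of C0 values in NC1HWC0, of which only the first carries data.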
