TensorRT inference of ONNX models with custom layers in Python

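This post shows how to use TensorRT to optimize inference for PackNet, a depth-estimation network used in autonomous driving. The PyTorch model is first converted to ONNX. The graph is then modified with the ONNX GraphSurgeon API, which folds the Pad nodes and rewrites the Upsample and GroupNormalization layers. Registering a custom plugin lets TensorRT execute the result efficiently. The post ends with the code that processes the ONNX model.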

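This example uses TensorRT to run inference on the PackNet network, a self-supervised monocular depth-estimation network used in autonomous driving.
A PyTorch graph is converted to ONNX, and TensorRT's ONNX parser then parses the ONNX graph.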

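  • Use custom layers (plugins) in the ONNX graph. These plugins are registered automatically in TensorRT through REGISTER_TENSORRT_PLUGIN.
  • Use the ONNX-GraphSurgeon API to change layers and subgraphs in the ONNX graph. In this example, it modifies or replaces the Group Normalization, upsample, and pad layers so that TensorRT can run the model.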
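To see why Group Normalization needs graph surgery at all, it helps to look at how the layer appears after export. The traversals in post_processing.py further down (a Reshape feeding an InstanceNormalization node, followed by a scale and a bias) suggest that the exporter decomposes it into a Reshape, InstanceNormalization, Reshape, Mul, Add chain. The pure-Python sketch below mirrors that decomposition for a single sample. The function names and the flat-list tensor layout are illustrative assumptions, not part of the sample.

```python
import math
from statistics import fmean, pvariance


def instance_norm(channels, eps):
    """Normalize each channel (a flat list of floats) to zero mean and unit variance."""
    normalized = []
    for channel in channels:
        mean = fmean(channel)
        var = pvariance(channel, mean)
        normalized.append([(v - mean) / math.sqrt(var + eps) for v in channel])
    return normalized


def group_norm_via_instance_norm(channels, num_groups, gamma, beta, eps=1e-5):
    """GroupNorm for one sample as Reshape -> InstanceNorm -> Reshape -> Mul -> Add.

    channels: list of C flat lists (hypothetical layout, one list per channel).
    gamma, beta: per-channel scale and bias, each of length C.
    """
    num_channels = len(channels)
    if num_channels % num_groups:
        raise ValueError("num_groups must divide the number of channels")
    per_group = num_channels // num_groups
    size = len(channels[0])

    # Reshape: merge the channels of each group into one long pseudo-channel.
    grouped = [
        [v for ch in channels[g * per_group:(g + 1) * per_group] for v in ch]
        for g in range(num_groups)
    ]

    # InstanceNorm over the pseudo-channels is exactly per-group normalization.
    normed = instance_norm(grouped, eps)

    # Reshape back to the original per-channel layout.
    restored = [
        normed[c // per_group][(c % per_group) * size:(c % per_group + 1) * size]
        for c in range(num_channels)
    ]

    # Mul and Add: apply the per-channel scale and bias.
    return [[gamma[c] * v + beta[c] for v in ch] for c, ch in enumerate(restored)]


# Four channels with two values each, split into two groups (made-up data).
x = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
y = group_norm_via_instance_norm(x, num_groups=2, gamma=[1.0] * 4, beta=[0.0] * 4)
```

Collapsing this five-node chain into a single plugin node is what the Group Normalization post-processing does; the number of groups is read from the Reshape shape and epsilon from the InstanceNormalization attributes.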

1 Run

  1. Clone the PackNet repository
git clone https://github.com/TRI-ML/packnet-sfm.git packnet-sfm
pushd packnet-sfm && git checkout tags/v0.1.2 && popd
export PYTHONPATH=$PWD/packnet-sfm
  2. Convert the PyTorch model to ONNX. This step also handles Group Normalization and uses ONNX-GraphSurgeon (ONNX-GS) to modify the upsample and pad layers
python3 convert_to_onnx.py --output model.onnx
  3. Run inference with TensorRT
trtexec --onnx=model.onnx --explicitBatch

2 Code

pip install -r requirements.txt

post_processing.py

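#!/usr/bin/env python3
import onnx_graphsurgeon as gs
import argparse
import onnx
import numpy as np


def process_pad_nodes(graph):
    """Fold the computed pads input of every Pad node into a constant."""
    pad_nodes = [node for node in graph.nodes if node.op == "Pad"]
    for node in pad_nodes:
        fold_pad_inputs(node, graph)
    return graph


def fold_pad_inputs(node, graph):
    # Walk up the exported subgraph to the Constant that holds the PyTorch pad amounts.
    pad_values_pyt = node.i(1).i(0).i(0).i(0).i(0).i(0).i(0).i(0).attrs['value'].values

    # Assume a 4-D input tensor: one begin and one end value per dimension.
    onnx_pad_values = [0] * 4 * 2
    # PyTorch lists (begin, end) pairs starting from the last dimension;
    # ONNX lists all begin values, then all end values, in dimension order.
    j = 3
    for i in range(0, len(pad_values_pyt), 2):
        onnx_pad_values[j] = pad_values_pyt[i]
        onnx_pad_values[j + 4] = pad_values_pyt[i + 1]
        j -= 1

    # Replace the computed pads input with the folded constant tensor.
    pads_folded_tensor = gs.Constant(name=node.inputs[1].name, values=np.array(onnx_pad_values))
    node.inputs[1] = pads_folded_tensor


def process_upsample_nodes(graph, opset=11):
    """Collapse the input subgraph of every upsample node into a constant scales tensor."""
    # With opset 11 the upsample layer is exported as Resize.
    if opset == 11:
        upsample_layer_name = "Resize"
    else:
        upsample_layer_name = "Upsample"
    upsample_nodes = [node for node in graph.nodes if node.op == upsample_layer_name]
    for node in upsample_nodes:
        fold_upsample_inputs(node, graph, opset)
    return graph


def fold_upsample_inputs(upsample, graph, opset=11):
    """Modify the graph in place so the node takes only its input and constant scales."""
    if opset == 9:
        # Read the scale factor from the Mul constant in the input subgraph.
        scale_factor = upsample.i(1).i(1).i(0).i(0).i(0).i(0).i(0).i(0).i(1).attrs['value'].values

        # Scale only the two spatial dimensions; batch and channel stay at 1.0.
        scales = np.array([1.0, 1.0, scale_factor, scale_factor], dtype=np.float32)
        scale_tensor = gs.Constant(name=upsample.inputs[-1].name, values=scales)

        # Point the node at the new constant scales tensor.
        upsample.inputs[-1] = scale_tensor
    else:
        # Opset 11: reuse the name of the Resize sizes input for the new scales tensor.
        size_tensor_name = upsample.inputs[3].name

        scale_factor = upsample.i(3).i(1).i().i().i().i().i(0).i(1).attrs['value'].values
        scales = np.array([1.0, 1.0, scale_factor, scale_factor], dtype=np.float32)
        scale_tensor = gs.Constant(name=size_tensor_name, values=scales)

        # Turn the Resize node into an Upsample node with two inputs.
        input_tensor = upsample.inputs[0]
        upsample.inputs = [input_tensor, scale_tensor]
        upsample.op = 'Upsample'


def process_groupnorm_nodes(graph):
    """Replace every exported Group Normalization subgraph with a single plugin node."""
    instancenorms = [node for node in graph.nodes if node.op == "InstanceNormalization"]
    for node in instancenorms:
        convert_to_groupnorm(node, graph)
    return graph


def retrieve_attrs(instancenorm):
    """Collect the plugin attributes from the subgraph around an InstanceNormalization node."""
    attrs = {}
    # The second entry of the Reshape shape constant is the number of groups.
    attrs["num_groups"] = instancenorm.i().i(1).attrs["value"].values[1]
    attrs["eps"] = instancenorm.attrs["epsilon"]
    # Version and namespace under which the parser looks up the plugin.
    attrs["plugin_version"] = "1"
    attrs["plugin_namespace"] = ""
    return attrs


def convert_to_groupnorm(instancenorm, graph):
    """Rewire the graph to Conv -> GroupNormalizationPlugin -> ReLU."""
    # Create the replacement plugin node.
    attrs = retrieve_attrs(instancenorm)
    groupnorm = gs.Node(op="GroupNormalizationPlugin", attrs=attrs)
    graph.nodes.append(groupnorm)

    # The plugin takes the Conv output and produces the tensor that feeds the ReLU.
    conv_output_tensor = instancenorm.i().inputs[0]
    relu_input_tensor = instancenorm.o().o().o().outputs[0]

    # Reconnect both tensors to the plugin node.
    conv_output_tensor.outputs[0] = groupnorm
    relu_input_tensor.inputs[0] = groupnorm

    # Pass the scale (from Mul) and bias (from Add) constants to the plugin.
    groupnorm.inputs.append(instancenorm.o().o().i(1).inputs[0])
    groupnorm.inputs.append(instancenorm.o().o().o().i(1).inputs[0])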
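
The index shuffle in fold_pad_inputs converts between two pad layouts: PyTorch lists (begin, end) pairs starting from the last dimension, while the ONNX Pad operator expects all begin values followed by all end values, in dimension order. Below is a standalone sketch of the same reordering. The helper name and the pad amounts are made up for illustration, and it assumes a 4-D NCHW tensor, as the original code does.

```python
def pytorch_pads_to_onnx(pad_values_pyt, rank=4):
    """Reorder PyTorch-style pads into the ONNX Pad layout.

    pad_values_pyt: (begin, end) pairs, last dimension first,
                    e.g. [w_begin, w_end, h_begin, h_end].
    Returns [d0_begin, ..., dN_begin, d0_end, ..., dN_end].
    """
    if len(pad_values_pyt) % 2 or len(pad_values_pyt) > 2 * rank:
        raise ValueError("expected at most `rank` (begin, end) pairs")

    onnx_pads = [0] * (2 * rank)
    dim = rank - 1  # PyTorch starts with the last dimension
    for i in range(0, len(pad_values_pyt), 2):
        onnx_pads[dim] = pad_values_pyt[i]             # begin value of this dimension
        onnx_pads[dim + rank] = pad_values_pyt[i + 1]  # end value of this dimension
        dim -= 1
    return onnx_pads


# Pad width by (1, 2) and height by (3, 4); batch and channel stay unpadded.
print(pytorch_pads_to_onnx([1, 2, 3, 4]))  # [0, 0, 3, 1, 0, 0, 4, 2]
```

Dimensions that PyTorch does not mention (batch and channel here) keep a pad of zero, which is why the original code starts from a list of eight zeros.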

convert_to_onnx.py

#!/usr/bin/env python3
import torch
import onnx
import numpy as np
import argparse
import onnx_graphsurgeon as gs
from post_processing import process_pad_nodes, process_upsample_nodes, process_groupnorm_nodes
from packnet_sfm.networks.depth.PackNet01 import PackNet01


def post_process_packnet(model_file, opset=11):
    """Post-process the exported ONNX file in place with ONNX-GraphSurgeon."""
    # Load the exported PackNet graph.
    graph = gs.import_onnx(onnx.load(model_file))

    # Pad inputs only need folding for opset 11.
    if opset == 11:
        graph = process_pad_nodes(graph)

    # Collapse each upsample subgraph into a node with an input and constant scales.
    graph = process_upsample_nodes(graph, opset)

    # Replace each Group Normalization subgraph with a single plugin node.
    graph = process_groupnorm_nodes(graph)

    # Remove the now-unused nodes and sort the graph topologically.
    graph.cleanup().toposort()

    # Overwrite the ONNX file with the modified graph.
    onnx.save_model(gs.export_onnx(graph), model_file)
    print("Saving the ONNX model to {}".format(model_file))


def build_packnet(model_file, args):
    """Build the PackNet network and export it to ONNX."""
    # Dummy input that fixes the exported shape to 1x3x192x640.
    input_pyt = torch.randn((1, 3, 192, 640), requires_grad=False)
    model_pyt = PackNet01(version='1A')
    torch.onnx.export(model_pyt, input_pyt, model_file, verbose=args.verbose, opset_version=args.opset)


def main():
    parser = argparse.ArgumentParser(
        description="Exports PackNet01 to ONNX, and post-processes it to insert TensorRT plugins")
    parser.add_argument("-o", "--output", help="Path to save the generated ONNX model", default="model.onnx")
    parser.add_argument("-op", "--opset", type=int, help="ONNX opset to use", default=11)
    parser.add_argument("-v", "--verbose", action='store_true',
                        help="Flag to enable verbose logging for torch.onnx.export")
    args = parser.parse_args()

    # Export first, then rewrite the exported file in place.
    build_packnet(args.output, args)
    post_process_packnet(args.output, args.opset)


if __name__ == '__main__':
    main()
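
fold_upsample_inputs in post_processing.py replaces the subgraph that computes each resize's target size with a constant scales tensor [1, 1, s, s]. As a rough illustration of what such a tensor means, the ONNX Upsample operator computes each output dimension as floor(input_dim * scale). The helper name, the scale factor of 2, and the feature-map shape below are hypothetical, chosen only for the example; they are not read from the PackNet graph.

```python
import math


def upsampled_shape(input_shape, scales):
    """Output shape of an Upsample node: floor(dim * scale) for each dimension."""
    if len(input_shape) != len(scales):
        raise ValueError("need exactly one scale per input dimension")
    return tuple(math.floor(dim * scale) for dim, scale in zip(input_shape, scales))


# Hypothetical NCHW feature map, upsampled 2x in height and width only.
print(upsampled_shape((1, 64, 96, 320), [1.0, 1.0, 2.0, 2.0]))  # (1, 64, 192, 640)
```

Keeping the first two scales at 1.0 leaves the batch and channel dimensions untouched, which matches the way the post-processing builds its scales array.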
