Abstract:
Combining object detection with mixed reality (MR) technology shows broad promise for mining equipment maintenance. MR devices require lightweight, efficient detection models, yet target recognition in current MR-assisted maintenance struggles to balance accuracy with real-time performance. To address this, we propose YOLOv8-CLRS, a lightweight recognition model based on YOLOv8. First, we introduce a heterogeneous kernel-based convolution (HetConv) module to replace the native C2f structure. This redesign reduces the number of floating-point operations and therefore speeds up inference, while preserving rich feature representations. Second, we replace the standard convolutions in the backbone network with linearly deformable convolution modules. This improves the model’s ability to adapt to objects with diverse geometries and changing spatial layouts, making it more robust in cluttered and variable industrial settings. Third, we reconstruct the neck of the network using a reparameterized generalized feature pyramid network, which fuses multi-scale features more efficiently and strengthens semantic interaction across feature levels. Finally, we refine bounding box regression with Shape-IoU, a loss function that emphasizes geometric shape alignment between predicted and ground-truth boxes. This improves localization, particularly for non-rectangular or intricately shaped mechanical components. The proposed model was evaluated on a custom-built dataset of images of key tunneling machine components, captured under varying illumination, occlusion, and viewpoints.
The experimental results show a 0.4% improvement in precision, a 30% reduction in parameter count, and a 13.9% increase in inference speed compared to the baseline YOLOv8 model. These results indicate that the model is well suited to real-time MR applications that require both reliability and efficiency. The study presents a practical solution for MR-assisted industrial maintenance systems and offers insights into lightweight model design that may benefit other industrial edge-computing applications. By balancing computational load against detection accuracy, the proposed architecture addresses a major barrier to the adoption of AI-enhanced MR systems in real-world maintenance environments.
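To illustrate why heterogeneous kernels reduce computation, the following minimal sketch compares the multiply-accumulate count of a standard convolution layer with that of a heterogeneous layer in which each filter applies k x k kernels to a fraction 1/p of the input channels and 1 x 1 kernels to the rest. This follows the general HetConv formulation as we understand it; the function names, the layer sizes, and the value p = 4 are illustrative assumptions and are not taken from the YOLOv8-CLRS configuration, which the abstract does not specify.

```python
def conv_cost(m, n, k, out_size):
    """Multiply-accumulates of a standard k x k convolution.

    m: input channels, n: output channels (filters),
    k: kernel size, out_size: output feature map height/width (square).
    Bias terms are ignored.
    """
    return out_size * out_size * m * n * k * k


def hetconv_cost(m, n, k, out_size, p):
    """Multiply-accumulates when each filter uses k x k kernels on m/p
    input channels and 1 x 1 kernels on the remaining m - m/p channels.

    p is a hypothetical 'part' hyperparameter; p = 1 recovers the
    standard convolution.
    """
    if m % p != 0:
        raise ValueError("m must be divisible by p")
    large = out_size * out_size * (m // p) * n * k * k  # k x k kernels
    small = out_size * out_size * (m - m // p) * n      # 1 x 1 kernels
    return large + small


def cost_ratio(m, n, k, out_size, p):
    """Heterogeneous cost divided by standard cost: 1/p + (1 - 1/p) / k^2."""
    return hetconv_cost(m, n, k, out_size, p) / conv_cost(m, n, k, out_size)


if __name__ == "__main__":
    # Illustrative layer: 64 -> 128 channels, 3 x 3 kernels, 40 x 40 output.
    print(f"cost ratio for p=4: {cost_ratio(64, 128, 3, 40, 4):.3f}")
```

Under these assumptions the ratio depends only on p and k, so a 3 x 3 layer with p = 4 needs about one third of the multiply-accumulates of its standard counterpart. The overall savings of a full model depend on how many layers are replaced and are not derivable from this sketch.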