Abstract:
In recent years, advances in artificial intelligence (AI) have driven innovation in mineral analysis methods. Traditional mineral analysis relies heavily on specialized geological expertise and a range of sophisticated instruments. Multimodal Large Language Models (MLLMs), with their strengths in multi-source information fusion and complex scene understanding, offer a new research path for this field. However, general-purpose MLLMs perform poorly in specialized domains such as mining. To address this limitation, this study developed a mineral analysis system based on MLLMs. Using a mineral "image-text" dataset and Qwen2.5-VL as the multimodal base model, the system compared two parameter-efficient fine-tuning strategies, IA3 and LoRA, and adopted the better-performing one. The system also incorporates retrieval-augmented generation (RAG) over a domain knowledge base and provides interactive visualization through a lightweight web-based front-end and back-end architecture. Experimental results demonstrate that the system effectively improves the accuracy of mineral analysis, offering a viable solution for the intelligent development of the field.