Fine-Tuning Generative LLMs for Geographic Analysis: Comparing GPT-4o, Gemma-2B, and DeepSeek-R1
Abstract
The integration of Large Language Models (LLMs) into geospatial analysis presents significant opportunities for advancing spatial intelligence. However, questions remain regarding the optimal balance between predictive performance and computational efficiency. This study provides a comprehensive investigation into modern LLMs for geospatial reasoning tasks. We compare the state-of-the-art proprietary model GPT-4o against two fine-tuned lightweight open-weight models: Gemma-2B and DeepSeek-R1. To ensure robust evaluation, we conduct experiments on the complete 100,000-prompt GeoLLM benchmark, using the original fine-tuned LLaMA-7B as our primary baseline. Our experimental results demonstrate that GPT-4o achieves a new performance record with an average Pearson's r² of 0.862. More significantly, the fine-tuned Gemma-2B model surpasses the larger LLaMA-7B baseline while …