This fine-tuned LLM geocodes complex location references and accompanies Coordinates from Context: Using LLMs to Ground Complex Location References (Masis & O'Connor, EACL 2026), where it is referred to as "Geoparser-augmented FT Qwen 14B".
Model description
The base model is a quantized Qwen3-14B (unsloth/Qwen3-14B-unsloth-bnb-4bit), fine-tuned for geocoding, i.e., linking a location reference to an actual geographic location.
The model was trained using parameter-efficient fine-tuning via low-rank adaptation (LoRA).
It was trained for our 'Geoparser-augmented' approach: a separate geoparsing tool augments the input with the center coordinates of each mentioned location, and the fine-tuned model then uses both the original location reference and those coordinates to generate the described location's bounding box.
For more details, please see the accompanying paper.
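As a rough illustration of the input-augmentation step (the actual geoparsing tool, prompt template, and coordinate format are described in the paper; everything below, including the helper name augment, is hypothetical):

# Hypothetical sketch of geoparser-based input augmentation; the real
# tool and prompt template are described in the accompanying paper.
def augment(reference, mentions):
    # mentions maps each mentioned location name to its (lat, lon) center
    coords = "; ".join(f"{name}: ({lat:.2f}, {lon:.2f})"
                       for name, (lat, lon) in mentions.items())
    return f"{reference}\nMentioned locations: {coords}"

print(augment("the valley between Amherst and Northampton",
              {"Amherst": (42.37, -72.52), "Northampton": (42.32, -72.63)}))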
Training data
The model was trained on 13k examples from the training subset of the GeoCoDe dataset. Each input is a complex location reference together with the center coordinates of each mentioned location; the output is the described location's bounding box.
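Schematically, a training pair might look like the following; the coordinates, values, and bounding-box layout are made up for illustration, and the real templates are given in the paper appendices.

# Hypothetical training pair; values and format are illustrative only,
# not taken from the GeoCoDe dataset.
example = {
    "input": "the valley between Amherst (42.37, -72.52) "
             "and Northampton (42.32, -72.63)",
    "output": (42.30, -72.65, 42.40, -72.50),  # e.g. (min_lat, min_lon, max_lat, max_lon)
}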
Limitations
Due to data limitations, this model has been trained and evaluated for our task only in Mainstream American English.
Usage (unsloth)
The following code snippet illustrates how to use the model. For the system prompt we used and for example prompts, please see the appendices in the accompanying paper.
from unsloth import FastLanguageModel
import torch
model_name = "tmasis/geocoding-complex-location-references"
# Load model and tokenizer from Huggingface Hub
model, tokenizer = FastLanguageModel.from_pretrained(
model_name = model_name,
max_seq_length = 2048,
load_in_4bit = True,
)
FastLanguageModel.for_inference(model)
# Prepare model input; the system prompt and example prompts are given
# in the appendices of the accompanying paper
system_prompt = "..."  # fill in from the paper appendix
prompt = "..."         # complex location reference plus geoparsed coordinates
messages = [{"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False
)
# Conduct text generation (do_sample=True makes temperature/top_p/top_k take effect)
model_inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**model_inputs, max_new_tokens=1024,
                         do_sample=True, temperature=0.7, top_p=0.8, top_k=20)
# Decode only the newly generated tokens, not the echoed prompt
response = tokenizer.batch_decode(outputs[:, model_inputs["input_ids"].shape[1]:],
                                  skip_special_tokens=True)[0]
print(response)
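The response contains the predicted bounding box as text. If, for instance, the box is emitted as four decimal numbers, a minimal (purely hypothetical) post-processing step could look like this; the model's actual output format is documented in the paper, and parse_bounding_box is an illustrative helper, not part of the released code.

import re

# Hypothetical post-processing: grab the first four decimal numbers in the
# generated text and treat them as a bounding box.
def parse_bounding_box(text):
    numbers = re.findall(r"-?\d+\.\d+", text)
    return tuple(float(n) for n in numbers[:4]) if len(numbers) >= 4 else None

print(parse_bounding_box(response))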
Usage (HuggingFace transformers)
Alternatively, you can use the HuggingFace transformers library; you may also need the bitsandbytes package if the checkpoint keeps the base model's 4-bit quantization.
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "tmasis/geocoding-complex-location-references"
# Load model and tokenizer from Huggingface Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
# Prepare model input; the system prompt and example prompts are given
# in the appendices of the accompanying paper
system_prompt = "..."  # fill in from the paper appendix
prompt = "..."         # complex location reference plus geoparsed coordinates
messages = [{"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False
)
# Conduct text generation (do_sample=True makes temperature/top_p/top_k take effect)
model_inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**model_inputs, max_new_tokens=1024,
                         do_sample=True, temperature=0.7, top_p=0.8, top_k=20)
# Decode only the newly generated tokens, not the echoed prompt
response = tokenizer.batch_decode(outputs[:, model_inputs["input_ids"].shape[1]:],
                                  skip_special_tokens=True)[0]
print(response)
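The resulting response can be post-processed the same way as in the unsloth example above, e.g., with the hypothetical parse_bounding_box sketch.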