arXiv

Vision Transformers and Convolutional Neural Networks for Land Use Scene Classification

Title: Comparative Analysis of Vision Transformers and Convolutional Neural Networks in Land Use Scene Classification

Remote sensing-based Land Use Scene Classification (LUSC) is a pivotal component of sustainable resource management, urban development, and environmental monitoring. Convolutional Neural Networks (CNNs) have long led the field because they extract local spatial features well, but the advent of Vision Transformers (ViTs) has shifted the paradigm. ViTs use self-attention mechanisms to model long-range dependencies, which offers the potential for a deeper understanding of global scene context.
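The contrast between local convolution and global self-attention can be illustrated with a toy, pure-Python sketch. It is not the study's implementation. It assumes a single-channel image, non-overlapping patches used directly as tokens with no learned patch embedding, and identity query/key/value projections instead of learned ones. The helper names (`patchify`, `self_attention`, `conv3x3`) are hypothetical.

```python
import math

def patchify(image, patch):
    """Split a 2-D image (list of rows) into flattened, non-overlapping patches.

    Each patch becomes one token. Assumes height and width are divisible by `patch`.
    """
    h, w = len(image), len(image[0])
    tokens = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            tokens.append([image[r + i][c + j] for i in range(patch) for j in range(patch)])
    return tokens

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections (a simplification)."""
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        row = softmax(scores)
        weights.append(row)
        # Each output token is a weighted mix of ALL tokens, however far apart.
        outputs.append([sum(wt * v[j] for wt, v in zip(row, tokens)) for j in range(d)])
    return outputs, weights

def conv3x3(image, kernel):
    """Zero-padded 'same' 3x3 convolution (cross-correlation, as CNN layers compute it)."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(-1, 2):
                for j in range(-1, 2):
                    rr, cc = r + i, c + j
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += kernel[i + 1][j + 1] * image[rr][cc]
            out[r][c] = acc
    return out

img = [[1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]
far = [row[:] for row in img]
far[3][3] = 5  # perturb the pixel farthest from the top-left corner

box = [[1, 1, 1]] * 3  # unnormalized 3x3 box kernel (read-only, so shared rows are fine)
conv_before = conv3x3(img, box)[0][0]
conv_after = conv3x3(far, box)[0][0]
attn_before, weights = self_attention(patchify(img, 2))
attn_after, _ = self_attention(patchify(far, 2))
```

Changing the far corner pixel leaves the convolution output at (0, 0) unchanged, because that output only sees its 3×3 neighborhood. The attention output for the top-left token does change, because every token attends to every other token in a single layer. Stacking convolutions widens the receptive field gradually. Attention provides global context immediately, at the cost discussed for computational complexity.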

This study presents a comparative evaluation of CNN-based architectures against Vision Transformers for LUSC tasks. Using benchmark datasets such as the UC Merced Land Use and EuroSAT Land Use collections, we assessed representative models, including AlexNet and the Vision Transformer (ViT). The analysis focused on five key performance metrics: classification accuracy, precision, recall, F1-score, and computational complexity.
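The abstract does not say how precision, recall, and F1-score were averaged across scene classes. The sketch below assumes macro averaging, which is common for multi-class scene benchmarks: each metric is computed per class and then averaged with equal weight. A class that is never predicted (or never present) is assigned 0.0 for the undefined ratio. The function name `classification_metrics` and the class labels are hypothetical. In practice, a library routine such as scikit-learn's `precision_recall_fscore_support` would typically be used instead.

```python
from collections import Counter

def classification_metrics(y_true, y_pred):
    """Overall accuracy plus macro-averaged precision, recall and F1 (hypothetical helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the prediction was wrong
            fn[t] += 1  # true class t was missed
    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,  # unweighted mean over classes
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,                # mean of per-class F1, not F1 of the means
    }

# Toy predictions over three hypothetical scene classes.
y_true = ["forest", "forest", "river", "river", "residential"]
y_pred = ["forest", "river", "river", "river", "forest"]
scores = classification_metrics(y_true, y_pred)
```

Here accuracy is 3/5 = 0.6, but macro recall is 0.5 because the "residential" class is never recognized. Macro averaging exposes this kind of per-class failure, which overall accuracy hides on imbalanced scene collections.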

The experimental results reveal distinct advantages for each architecture, depending on the data context. CNNs performed robustly on datasets characterized by strong local textures and limited training samples. In contrast, Vision Transformers excelled at capturing global spatial relationships within complex scenes when ample training data was available. However, ViTs generally demand greater computational resources and larger datasets to reach their full potential. These findings clarify the respective strengths and constraints of both approaches and offer practical guidance for selecting the most suitable model for remote sensing applications.
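One reason for the higher computational demand of ViTs can be seen in a rough multiply-accumulate (MAC) count. A stride-1 convolution scales linearly with the number of pixels. Self-attention over N patch tokens of width D costs about 4·N·D² for the Q, K, V and output projections plus 2·N²·D for the score matrix and the weighted sum. That last term is quadratic in N. This is an analytical estimate, not the study's measured complexity: it ignores the MLP block, normalization, softmax, and biases. The settings are illustrative assumptions: patch size 16 and width 768, as in the ViT-Base/16 configuration, and 256 × 256 inputs, the UC Merced image size. The function names are hypothetical.

```python
def conv_macs(h, w, c_in, c_out, k):
    """MACs for one stride-1, 'same'-padded k x k convolution layer (bias ignored)."""
    return h * w * c_in * c_out * k * k

def attention_macs(h, w, patch, dim):
    """Approximate MACs for one self-attention layer over (h/patch)*(w/patch) tokens.

    4*N*D^2 covers the Q, K, V and output projections; 2*N^2*D covers QK^T and the
    attention-weighted sum of V. Splitting D across heads does not change this count.
    """
    n = (h // patch) * (w // patch)
    return 4 * n * dim * dim + 2 * n * n * dim

base = attention_macs(256, 256, 16, 768)     # N = 256 tokens
doubled = attention_macs(512, 512, 16, 768)  # N = 1024 tokens
conv_ratio = conv_macs(512, 512, 64, 64, 3) / conv_macs(256, 256, 64, 64, 3)
attn_ratio = doubled / base
```

Doubling the image side length multiplies the convolution cost by exactly 4. The attention cost grows by about 5.7, because its N² term grows 16-fold, and the gap widens further at higher resolutions or smaller patch sizes.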


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
