NASA

Earthdata

Similarity Search

BETA

About Similarity Search

How Similarity Search Works

What is Similarity Search?

Similarity Search is a tool that uses a machine learning technique called self-supervised learning to locate similar images and data in image archives. It is intended to augment data search and collection for scientific studies and make that process more efficient.

One of the first and most crucial steps in a scientific study of climate change or of events such as wildfires, oil spills, hurricanes, and dust storms is gathering many relevant examples. Locating these examples requires painstakingly inspecting millions of square miles of satellite imagery for each day across several years. While such an effort can produce a valuable trove of data, manual searching is laborious, expensive, and often impractical, grounding many scientific studies before they can take off.

Similarity Search augments the search process: users input a sample image, and the tool returns similar-looking images from the entire archive, minimizing the data downloads and visual comparisons otherwise needed to judge similarity. This process also allows users to explore the spatio-temporal patterns of the features within the sample image.

Explore View

The user selects an image section on the base map and the number of results to return, and receives similar images from across the entire multiyear archive.

Results View

The tool returns a set of images similar to the input image based on the similarity threshold scale selected by the user. The smaller the range selected on the threshold scale, the more similar the image results are to the input image.
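The effect of narrowing the threshold range can be sketched as follows. This is a minimal illustration, not the tool's implementation: it assumes each result carries a distance to the input image where smaller means more similar, and the names `Result` and `filter_by_threshold` are hypothetical.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    image_id: str    # hypothetical identifier for an archive image
    distance: float  # assumed: distance to the input image, smaller = more similar


def filter_by_threshold(results: List[Result], low: float, high: float) -> List[Result]:
    """Keep results whose distance lies within [low, high], most similar first."""
    kept = [r for r in results if low <= r.distance <= high]
    return sorted(kept, key=lambda r: r.distance)


results = [
    Result("a", 0.05),
    Result("b", 0.40),
    Result("c", 0.12),
    Result("d", 0.75),
]

# A narrow range keeps only the closest matches; a wider one admits looser matches.
narrow = filter_by_threshold(results, 0.0, 0.15)
wide = filter_by_threshold(results, 0.0, 0.50)
```

Under these assumptions, `narrow` contains only images "a" and "c", while `wide` also admits "b".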

The histogram at the bottom of the page provides an overview of the distribution of the results across time. The heat map over the world map on the left side of the screen shows the coverage across space.
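One way to picture these two summaries is as simple binning of the results by date and by location. The sketch below is illustrative only; it assumes each result has an acquisition date and a latitude/longitude, and the function names and the 10-degree cell size are hypothetical choices, not details of the tool.

```python
import math
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple


def histogram_by_year(dates: Iterable[date]) -> Dict[int, int]:
    """Count results per calendar year, ordered by year."""
    counts = Counter(d.year for d in dates)
    return dict(sorted(counts.items()))


def heatmap_cells(points: Iterable[Tuple[float, float]], cell_deg: float = 10.0) -> Counter:
    """Count results per (lat, lon) grid cell of size cell_deg degrees."""
    return Counter(
        (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        for lat, lon in points
    )


dates = [date(2003, 6, 1), date(2003, 8, 15), date(2019, 1, 2)]
points = [(34.7, -86.6), (35.1, -85.2), (-12.0, 130.9)]

by_year = histogram_by_year(dates)
by_cell = heatmap_cells(points)
```

Here `by_year` summarizes the distribution across time, and `by_cell` gives the per-cell counts a heat map could color by.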

About the Data

Similarity Search currently accesses Moderate Resolution Imaging Spectroradiometer (MODIS) data through NASA’s Global Imagery Browse Services (GIBS). The entire archive of MODIS Terra Corrected Reflectance True Color, including data from 2000-2022, has been ingested, indexed, and made available at native resolution to Similarity Search.
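For readers who want to look at the same imagery directly, GIBS serves tiles over WMTS. The sketch below only builds a tile URL and makes no network request. It assumes the public GIBS REST pattern for the geographic (EPSG:4326) "best" endpoint, the layer identifier `MODIS_Terra_CorrectedReflectance_TrueColor`, and the `250m` tile matrix set; how Similarity Search itself ingests the data is not described here, and `build_gibs_tile_url` is a hypothetical helper.

```python
from datetime import date

# Assumed values based on the public GIBS WMTS REST pattern; verify against
# the GIBS documentation before relying on them.
GIBS_BASE = "https://gibs.earthdata.nasa.gov/wmts/epsg4326/best"
LAYER = "MODIS_Terra_CorrectedReflectance_TrueColor"
TILE_MATRIX_SET = "250m"


def build_gibs_tile_url(day: date, zoom: int, row: int, col: int) -> str:
    """Build a WMTS REST URL for one tile of the layer on the given day."""
    if min(zoom, row, col) < 0:
        raise ValueError("zoom, row, and col must be non-negative")
    return (
        f"{GIBS_BASE}/{LAYER}/default/{day.isoformat()}/"
        f"{TILE_MATRIX_SET}/{zoom}/{row}/{col}.jpg"
    )


url = build_gibs_tile_url(date(2012, 7, 9), zoom=6, row=13, col=36)
```

The resulting `url` can be opened in a browser or fetched with any HTTP client to view a single tile for that day.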

Methodology

ML Models

The Similarity Search tool uses the self-supervised learning algorithms SimSiam and SimCLR to create embeddings that represent the data. It uses approximate nearest-neighbor methods deployed in the cloud to search at scale.
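The idea of searching by embedding can be sketched with an exact, brute-force version of the lookup. This is not the tool's implementation: it assumes cosine similarity as the comparison, uses made-up three-dimensional embeddings and hypothetical image names, and scans every archive entry. Approximate nearest-neighbor methods exist to avoid that full scan at archive scale, trading a small amount of accuracy for speed.

```python
import math
from typing import Dict, List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_top_k(
    query: Sequence[float], archive: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k archive entries most similar to the query (exact search)."""
    q = normalize(query)
    scored = []
    for image_id, embedding in archive.items():
        e = normalize(embedding)
        scored.append((image_id, sum(a * b for a, b in zip(q, e))))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy embeddings; real ones would come from a trained encoder.
archive = {
    "smoke_1": [0.9, 0.1, 0.0],
    "ocean_1": [0.0, 0.2, 0.9],
    "smoke_2": [0.8, 0.3, 0.1],
}
top = cosine_top_k([1.0, 0.2, 0.0], archive, k=2)
```

With these toy vectors, the two "smoke" entries rank above the "ocean" entry because their embeddings point in nearly the same direction as the query.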

Disclaimer About the Model

  1. The self-supervised learning (SSL) model is intended to be used only as a data discovery tool that makes it easier for scientists to comb through millions of images. Results used in any downstream task must be manually validated for correctness.
  2. The similarity scores are relative: they are derived from the embeddings and cannot be interpreted directly as absolute measures.

Current Limitations

  • This platform currently supports only MODIS datasets.
  • Users can access the data, but at present images can only be downloaded manually.
  • Users can only choose a single input image from the map on the explore page.
  • Users cannot use the images from the Results view to filter.
  • The histogram is a static representation at the moment and cannot be used interactively.

Team

Moving forward together

This project is the result of close collaboration between organizations working to develop a deeper understanding of Earth science.

  • NASA
  • NASA IMPACT
  • University of Alabama in Huntsville
  • Development Seed

2023

NASA Official: Manil Maskey