Intuitively Organizing a Large Volume of Site Images Utilizing an LLM and Maps
Technology which interactively assesses disaster situations

Featured Technologies

August 25, 2023

Increased attention is being paid to strengthening preparedness for natural disasters in response to a rise in extreme rainfall events and large-scale earthquakes. In particular, accurately assessing disaster situations and quickly mounting the proper rescue response is an important issue at the national and municipal level. Amidst such circumstances, NEC developed a technology which efficiently assesses disaster situations utilizing AI. We spoke with several researchers about the details of this technology which utilizes a large language model (LLM) and map data.

Visual Intelligence Research Laboratories
Director
Masahiro Tani

Visual Intelligence Research Laboratories
Director
Makoto Terao

Visual Intelligence Research Laboratories
Senior Principal Researcher
Takashi Shibata

Visual Intelligence Research Laboratories
Researcher
Naoya Sogi

Accurately assessing a disaster situation from images to accelerate evacuation guidance and rescue efforts

Visual Intelligence Research Laboratories
Director
Masahiro Tani

― Please describe this technology for interactively assessing disaster situations.

Tani: When a natural disaster occurs, this technology enables the government and local municipalities to assess the disaster situation quickly and accurately and provide a rapid initial response. It quickly finds the images needed to assess the disaster situation from among the large volume of on-the-scene images collected minute by minute during a disaster, and it can plot those images on a map with accuracy down to the street address level.


Terao: When finding the necessary images from among the large volume of on-the-scene images, it is important to be able to handle various surveys according to the disaster situation. Therefore, we made it possible to find any object or situation using free keywords without prior training by utilizing the ability of an LLM to interpret the meaning of words. In addition, we also made it possible to refine the results according to the user's intent by letting the user indicate, within the obtained results, which images do or do not match what they were searching for.


Tani: For our technology, which estimates the location where images were taken, we further developed the technology announced in the February 2022 press release "NEC Develops Technology Capable of Estimating the Locations of Landscape Images From Satellite Imagery and Aerial Photographs" to increase the estimation accuracy in disaster scenarios. In addition to satellite imagery and aerial photographs taken from above, we can now estimate with even higher accuracy the location where disaster site images provided by residents were taken by newly utilizing map data. Because on-the-scene images taken during an emergency such as a disaster do not necessarily contain location information, accurately estimating the location from images is extremely meaningful for proceeding with rescue activities. Moreover, the images can be plotted on a map by linking them with map data, so it is possible to intuitively assess the disaster situation.


Tani: In recent years, severe rainstorms and flooding, etc. have occurred in locations across the country, and it is said that natural disasters are becoming more intense. Amid fears that an earthquake may occur on the Nankai Trough, it is extremely important to strengthen disaster measures to accommodate this situation. We have been developing this technology in response to consultations and requests from various customers, including local governments.


Sogi: I think it is rather difficult to imagine how this technology is used, so please see this actual demo. The screen on the left provides an interactive interface for users to filter and refine their image search, while the screen on the right shows the filtered images plotted on a map. Because this demo has already loaded the images into the system database and estimated their location information, it is ready to plot the images on the map.

If, for example, you enter "search for images of City A" in the chat here, you can refine the results to include only images of City A. Furthermore, if you then enter "find collapsed buildings" from this point, the AI will filter the images to only those it considers relevant. However, "collapsed" can mean many different things, so it is rather difficult to fully express the user's request in words alone. By selecting images which match the request from among the refined images here, the system executes a further refinement which reflects the user's intent. For example, let us focus on images which show buildings with significant damage and tell the system that "images one, two, and three are good." When we do that, the system searches only for images with similar depictions of collapse and refines the results. These can also be displayed on the map on the right side, so it enables you to intuitively assess the disaster situation across a large area.
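The two-stage flow in the demo, a free-text query followed by refinement from user-approved examples, can be sketched as below. Everything here is an illustrative assumption, not NEC's actual implementation: it presumes images and queries are embedded into a shared vector space (in the style of CLIP-like models) and uses simple cosine ranking, with hand-made vectors standing in for real embeddings.

```python
import numpy as np

def cosine(a, b):
    # cosine similarity between two embedding vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def text_search(query_vec, image_vecs, top_k=5):
    # coarse stage: rank all images by similarity to the text query embedding
    scores = [cosine(query_vec, v) for v in image_vecs]
    return sorted(range(len(image_vecs)), key=lambda i: -scores[i])[:top_k]

def refine_by_examples(selected, image_vecs, candidates, top_k=3):
    # fine stage: average the embeddings of the images the user marked as
    # "good" and re-rank the remaining candidates toward that centroid
    centroid = np.mean([image_vecs[i] for i in selected], axis=0)
    scores = {i: cosine(centroid, image_vecs[i]) for i in candidates}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

In a real system the embeddings would come from a pretrained vision-language model and the index would be an approximate nearest-neighbor store, but the interaction pattern (text query first, example-based re-ranking second) is the same.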

Image source: Okatani Labs. of Tohoku University

Tani: At a disaster site where traffic may be disrupted, rescue teams cannot simply rush in. We believe that quickly and accurately assessing the disaster situation and its location remotely from the outset is important for providing a rapid and appropriate initial response.

Capable of searching even without prior training

Visual Intelligence Research Laboratories
Director
Makoto Terao

― What are the mechanisms used to realize this technology?

Terao: Two important points are the combination with an LLM, as mentioned a moment ago, and the linkage with map data. The fact that it can find any object or scene without prior training by utilizing an LLM is a significant advantage. Conventional technologies were only able to search for people, cars, and other anticipated targets that the AI had been trained on in advance. However, this new technology utilizing an LLM can even search for targets that the AI has not been trained on in advance. For example, it is now able to flexibly search based on the words entered by a user, such as searching for a traffic light that has fallen down, finding flooded locations, or discovering fallen trees.


Shibata: Simply put, when it comes to the damage caused by disasters, various situations can occur. If you take earthquakes, for example, a wide variety of situations can arise such as buildings collapsing, roads sinking, and flooding. It is difficult to train the AI on all of these situations in advance. Enabling an AI to handle unknown situations which do not exist in its training data is called "zero-shot learning," and we utilized an LLM this time to achieve this zero-shot approach. The LLM is packed with the kind of common sense that we humans possess in words, such as "a traffic light is this type of object" or "flooding is this kind of event," so it is effective at coarsely refining the results for any subject within a massive number of images.

Nonetheless, performing a detailed refinement of the results which reflects the user's intent using only natural language is challenging. For example, when we humans try to search for someone's face among a large number of people, we might say, "they have raised eyebrows and short hair," but it is difficult for us to express everything in elaborate detail with just words. If we instead say, "they look like so and so," then we should be able to immediately find them. Images thus contain a wealth of information that natural language cannot easily convey. Therefore, we were able to achieve an even higher level of refinement this time by incorporating a technology which searches for similar images. By specifying in the chat which of the images picked up by the LLM do or do not match the conditions, it has become possible to reflect the user's intent more easily.
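One way to sketch Shibata's point about positive and negative feedback is a Rocchio-style relevance-feedback heuristic: pull the search direction toward embeddings of images the user approved and push it away from those the user rejected. This is a standard information-retrieval technique used here for illustration; NEC has not disclosed the actual mechanism, and the weights and vectors below are assumptions.

```python
import numpy as np

def feedback_rank(image_vecs, positives, negatives, alpha=1.0, beta=0.5):
    """Re-rank images toward user-approved examples and away from rejected
    ones (Rocchio-style feedback). image_vecs is an (n, d) embedding matrix;
    positives/negatives are lists of row indices the user marked."""
    pos = np.mean(image_vecs[positives], axis=0)
    neg = np.mean(image_vecs[negatives], axis=0) if len(negatives) else 0.0
    target = alpha * pos - beta * neg      # combined feedback direction
    scores = image_vecs @ target           # dot-product similarity to it
    return [int(i) for i in np.argsort(-scores)]  # best match first
```

With each round of chat feedback the target vector is recomputed, so the ranking converges on the user's intent without that intent ever being spelled out in words.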

Visual Intelligence Research Laboratories
Senior Principal Researcher
Takashi Shibata

Achieving the world's highest accuracy (*1) utilizing map data

Tani: Regarding the second point concerning the link with the map data, it has been useful for improving the accuracy of the photograph location estimation. In our previous technology, we matched photos taken on the ground with satellite imagery and aerial photographs, but by newly utilizing the map data layout information this time, we can now clearly see where the roads and buildings are and whether there are traffic lights, etc. As a result, we achieved the world's highest (*1) matching accuracy of 94.4% (*2).

In addition, there is one more advantage to using the map data layout information. For example, when an earthquake occurs, roads are thought to have a lower risk of collapsing than buildings, so actively using the road information for estimation in an earthquake would likely increase the accuracy. Conversely, there is a high probability that roads will be flooded in the event of flood damage, so actively utilizing building information should increase the accuracy. By prioritizing and matching the map layers in this way according to the type of disaster, the system can robustly handle a variety of natural disasters.
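The idea of weighting map layers by disaster type can be sketched as a weighted similarity score between a ground photo's features and each candidate map cell. The layer names, weight values, and feature vectors below are all illustrative assumptions; the actual matching model is not described in the article.

```python
import numpy as np

# Illustrative weights only: an earthquake leaves roads relatively intact,
# so road features are trusted more; a flood submerges roads, so building
# features are trusted more instead.
WEIGHTS = {
    "earthquake": {"road": 0.7, "building": 0.3},
    "flood": {"road": 0.3, "building": 0.7},
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_score(photo_feats, cell_feats, disaster):
    # weighted sum of per-layer similarities between photo and map cell
    w = WEIGHTS[disaster]
    return sum(w[k] * cosine(photo_feats[k], cell_feats[k]) for k in w)

def locate(photo_feats, map_cells, disaster):
    # return the map cell whose layout best matches the ground photo
    return max(map_cells, key=lambda c: match_score(photo_feats, map_cells[c], disaster))
```

Under this scheme the same photo can be localized against the same map with different layer priorities depending on the declared disaster type, which matches the robustness argument above.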

The details of this technology were summarized in a research paper which was accepted (*3) and orally presented at ICASSP 2023, a major international conference in the signal processing field.


Shibata: Technologies which combine different forms of media such as images and texts or texts and audio to process them in a multi-modal manner are not uncommon, but I think that combining images and map data is a globally unique approach. Perhaps it can be said that we arrived at this novel idea for the very reason that NEC aims to solve social problems.

Targeting practical application at national and municipal levels

Visual Intelligence Research Laboratories
Researcher
Naoya Sogi

― Please tell us about the future prospects for this technology.

Tani: First, we plan to run a series of demonstration experiments with the cooperation of government and municipal customers. We will start demonstrations this fiscal year, continuing into the next fiscal year, to clarify the performance required on site, with the goal of reaching practical application about two years later. In time, we expect to be able to deploy services for expediting the assessment of disaster situations not only to public institutions but also to non-life insurance companies.


Terao: On a technical level, we are also thinking about an even tighter form of linking with the map information. There is a lot of information on the maps. If we can link this information with the LLM, it could further increase the accuracy and expand the potential of this technology. We plan to continue researching and developing this as a technology that adapts to intensifying disasters and contributes to situation assessment and rescue activities.

With this technology, which interactively assesses disaster situations, it is possible to rapidly narrow down the essential images from a vast collection of disaster-related images using a chat interface. Furthermore, it can estimate the locations where images were taken with the world's highest accuracy (*1) by matching ground-level images with satellite imagery and map data, and plot those locations on a map.

By combining an LLM to enable search without prior training with a unique approach which links images and map data, NEC has realized a globally unprecedented disaster countermeasure technology.

  • The information posted on this page is current as of the time of publication.