
FireDucks significantly reduces table data processing time
Application example: Spicy MINT at Toyota Technical Development Corporation

Featured Technologies

October 19, 2023

Data utilization is spreading across the globe. At the sites where data scientists actually work, however, computer processing time has remained a heavy burden. In particular, preparation such as data cleansing is said to take up almost 45% of data scientists’ work time*1, and greater efficiency has been in high demand.

To address this need, NEC developed FireDucks, data analysis software that automatically speeds up data preparation. It is infused with the knowledge of parallel processing and acceleration that NEC is known for as a company that has long developed vector supercomputers and compilers. FireDucks is API-compatible with pandas, an open-source library that many data scientists use, and its beta version is planned for release free of charge as a Python library.

For a technical PoC prior to release, we had the cooperation of Toyota Technical Development Corporation, which reports a significant effect in the three months since introduction. We interviewed members of its AI & Data Science Technology Department and the NEC developers who took part in the demonstration about the effects and applications of FireDucks.

Toyota Technical Development Corporation
Platform Development Division
AI & Data Science Technology Department
Shotaro Noguchi

Toyota Technical Development Corporation
Platform Development Division
AI & Data Science Technology Department
Manager
Shunsuke Yamamoto

NEC
Digital Technology Development Laboratories
Kazuhisa Ishizaka

NEC
Digital Technology Development Laboratories
Takuya Araki

Introducing the use of FireDucks with a proprietary data analysis tool

― What operations did you use FireDucks for?

Yamamoto: As a part of the Toyota Group, we work on the "IP business" and the "measurement and simulation business". In the measurement and simulation segment, Mr. Noguchi and I belong to a team that handles data science, and we use AI to optimize vehicle development, streamline tests, and detect equipment anomalies. In recent years, we have also been taking on AI-driven analysis of human behavior and athletes’ motions, not only vehicles.


Noguchi: Our team is developing software called Spicy MINT (an English version of its website is to be released at the end of 2023). This is an engineering tool that uses AI to point out what to focus on in the data. We also use this tool for our own analyses and keep refining it. It brings "new insights" to data scientists and supports people struggling with similar cases.

While we do various analyses using this tool, we faced a big challenge: data preprocessing took too long. It was just then that we saw the presentation on FireDucks at a data engineering conference held in March 2023. As it was exactly the technology we needed, we asked NEC if we could use FireDucks for Spicy MINT.


Ishizaka: I was the one who handled their initial inquiry. I immediately set up our first meeting, where we agreed on their cooperation in a technical PoC in June.


Noguchi: We met at the conference in early March, so things went very fast.

Shortening preparation time by 60%

― How was it when you actually applied FireDucks?

Noguchi: It considerably changed the way we do data analysis. There were three specific benefits: a remarkable reduction in processing time; the ability to run on low-spec computers; and smooth reuse of existing scripts.

As for processing time, we were able to cut it by almost 60%. This had a great impact and completely changed the way we work. Previously, we had to schedule any time-consuming processing to run at night after work or over weekends. Since introducing FireDucks, we have been able to concentrate on data analysis without such workarounds. Now we can turn off the computers every night and on weekends.

Data analysis becomes possible with home gaming computers

Noguchi: As for running FireDucks on low-spec computers, we have confirmed that analyses can be completed with practical performance even on gaming computers sold at electronics retail stores (less than 200,000 yen). This means that FireDucks not only reduces processing time but also saves power, which helps reduce CO2 emissions. Furthermore, because high-spec computers are not necessary, it will be a driving force in broadening the use of data analysis.


Ishizaka: I agree. The emergence of LLMs is rapidly advancing the trend toward generating pandas programs from natural language. The bar for creating programs will become lower, and the base of data analysis users will keep expanding. This means that FireDucks, which is compatible with pandas, will also see more use. In step with such trends, we would like to contribute firmly to the democratization of data science.

Takes as little as 30 minutes for installation

Noguchi: The compatibility with pandas is very helpful. This is the third benefit mentioned earlier. Spicy MINT was mostly completed two years ago, so we expected that modifying it to apply FireDucks would take significant time and effort. However, applying FireDucks took less than 30 minutes, including all of the testing.


Yamamoto: While there are many software tools that contribute to higher speed, I think only a handful can be used with Python. Some may work when run on a supercomputer, but even that is too much hassle because the programming language has to be changed. By contrast, it was a great advantage that we were able to achieve higher speed with hardly any changes to our pandas code.


Araki: Having compatibility was a point that Ishizaka was adamant about.


Ishizaka: Yes. I knew well that compatibility is a factor that makes many customers happy, so I tried hard to make this happen. Once you start, you just can't help paying meticulous attention to detail. I made numerous adjustments in order to improve compatibility.

Aiming for application to cloud and edge computing

― What do you want from FireDucks in the future?

Noguchi: Currently, it is offered on-premises, so it would be nice if it became available in the cloud. Cloud usage is billed on a metered basis, so any faster processing will directly reduce costs.


Ishizaka: Yes. We also acknowledge that as an important factor, so we want to clear the constraints to realize it as soon as possible.


Noguchi: As another application, supporting use in edge environments would be appreciated. We frequently visit the actual sites of vehicle development. Currently, when we measure and analyze vehicle data, we have no choice but to bring back the data to the office since data processing takes too much time. By using an edge environment, we can run analysis on site and have discussions and implement countermeasures on site as well.

Moreover, having an analysis environment provided as one ready-built package would lower the bar for adoption. I see many cases where people try to analyze data in the course of their work but stumble when building a working environment. As a result, they may develop a dislike of data analysis, which is a pity.

Also, I would assume there is demand for laptop computers packaged with such an environment, which NEC may be able to realize.


Ishizaka: I see. Laptop computers can also be equipped with multicore CPUs, so there is enough potential for higher speed. Please allow us to consider this while we hear more details about the environment.


Araki: Yes. It will be a great pleasure to have more opportunities for a variety of users to actually use FireDucks. While we take pride in having deep insights into accelerated processing, the needs people have on site are varied and we do not have a grip on everything. By listening to our customers’ voices, we would like to continue developing technologies and services fitted to the actual sites.

Technical Description: FireDucks
Automatically accelerates Python library pandas programs

Python is a programming language used in AI research and development as well as data analysis. The library “pandas,” used for analyzing table data, is a major library with approximately 1.2 billion downloads each year. However, pandas runs on a single thread, which prevents multicore CPUs from delivering their expected performance. As a result, the time spent on data preparation, including data cleansing, weighs on data scientists’ work hours and is currently a bottleneck for improving productivity.

Meanwhile, techniques for making programs faster, such as parallel processing, element processing, and optimized processing procedures, are being studied worldwide, but applying them is costly because it requires expert knowledge, which keeps them from spreading further.
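To give a sense of the manual effort involved, the following is a minimal sketch of hand-written chunked aggregation: split the rows, total each chunk independently, then merge the partial results. It is an illustration only and not how FireDucks works internally; the function names and sample rows are hypothetical. A thread pool is used to keep the sketch portable, although for CPU-bound pure-Python work a process pool or a native multithreaded engine is what would actually occupy several cores.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_totals(chunk):
    """Sum values per key within one chunk of (key, value) rows."""
    totals = Counter()
    for key, value in chunk:
        totals[key] += value
    return totals


def split(rows, n_chunks):
    """Split rows into at most n_chunks contiguous slices."""
    size = max(1, -(-len(rows) // n_chunks))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def grouped_totals(rows, n_workers=4):
    """Aggregate chunks concurrently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(partial_totals, split(rows, n_workers)))
    merged = Counter()
    for part in parts:
        merged.update(part)  # Counter.update adds values key by key
    return dict(merged)


# Hypothetical sample rows, invented for this illustration.
rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
print(grouped_totals(rows))
```

Even for this tiny group-and-sum, the author has to decide how to split the data, how to merge partial results, and which kind of worker pool to use. This is the sort of expert decision the paragraph above refers to.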

FireDucks, developed by NEC, is software that addresses this issue and can automatically speed up pandas programs.

For many years, NEC has engaged in the development of vector supercomputers and compilers that optimize processing, and has maintained a presence at international conferences on high-speed processing. FireDucks is infused with an abundance of NEC experts’ cutting-edge insights.

Figure: How FireDucks accelerates (optimizations by the compiler)

Feature 1: An average 5-fold increase in speed

FireDucks automatically converts programs written by data scientists into programs comparable to those written by performance-optimization specialists. On average, it delivers a 5-fold speedup in computing performance, which is estimated to reduce data scientists’ work hours to approximately 66%.

Figure: 5x speedup vs. pandas on average
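The article does not state how the work-hour estimate was derived. One rough way to relate a speedup to overall work time is an Amdahl-style calculation, sketched below under the assumption that only the data preparation share of the work (quoted as almost 45% in the introduction) is accelerated. The function name is hypothetical, and this is not presented as NEC's own calculation.

```python
def remaining_time_fraction(accelerated_share, speedup):
    """Fraction of the original time left when only part of the work is sped up.

    accelerated_share: portion of total time that benefits (0.0 to 1.0).
    speedup: how many times faster that portion becomes.
    """
    untouched = 1.0 - accelerated_share
    accelerated = accelerated_share / speedup
    return untouched + accelerated


# Assumption: 45% of work time is data preparation, accelerated 5-fold.
print(round(remaining_time_fraction(0.45, 5.0), 2))  # 0.64
```

Under these assumptions the remaining work time comes to about 64%, which is in the same range as the figure quoted above; the small difference would depend on the exact share and speedup used.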

Feature 2: Contributes to reductions in power consumption and CO2 emissions

The shorter processing time significantly reduces the computer's power consumption, contributing to lower CO2 emissions. In addition, the optimized processing enables computing on lower-spec computers, without resorting to supercomputers, which tend to consume a lot of power.

Feature 3: API compatibility with pandas

When applying FireDucks, there is no need to rewrite programs in another language, rework them, or undergo additional training. Thanks to its compatibility with pandas, it can run existing pandas programs. This feature makes for a smooth introduction.
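As an illustration of what this compatibility could look like in practice, the sketch below changes only the import line of a small pandas script. It assumes the library exposes a pandas-compatible module importable as `fireducks.pandas`, which is not stated in this article; if that module is unavailable, the script falls back to plain pandas. The sample data is invented.

```python
# Assumption: FireDucks provides a pandas-compatible module named
# `fireducks.pandas`. Fall back to plain pandas if it is not installed,
# so the rest of the script runs unchanged either way.
try:
    import fireducks.pandas as pd
except ImportError:
    import pandas as pd

# Hypothetical measurement data, invented for this illustration.
df = pd.DataFrame({
    "sensor": ["a", "b", "a", "b", "a"],
    "value": [1.0, 2.0, None, 6.0, 5.0],
})

# Typical preparation steps: drop missing readings, then average per sensor.
cleaned = df.dropna(subset=["value"])
summary = cleaned.groupby("sensor")["value"].mean().reset_index()
print(summary)
```

Everything after the import uses ordinary pandas calls, which is the point: the existing script body stays as it is.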

  • * Internal test results based on the TPCx-BB benchmark
  • The information posted on this page is the information at the time of publication.