The Sparse Grid technology is one of the most ambitious approaches to solving classification and regression problems.
It is the first universal multivariate method that scales linearly with the number of data points and can therefore be applied to huge data sets. The main idea is to solve classification and regression problems directly via their operator equations, usually in the form of differential equations, by discretizing the feature space. This approach has proved itself for physical problems over decades, usually in the form of the Finite Element Method, but until now it has failed in data mining because its computational complexity increases exponentially with the number of dimensions (the "curse of dimensionality").
Sparse Grids allow, for the first time, a discretization of high-dimensional function spaces, and have been used to solve high-dimensional integral and differential equations since the late 1990s. Mathematically, Sparse Grid functions are high-dimensional wavelets over a hierarchy of anisotropic grids. Adapting Sparse Grids to classification and regression makes it possible, for the first time, to apply highly nonlinear classification and regression methods to large data sets. It thus represents a qualitative improvement over conventional methods such as neural networks, Bayesian networks, and SVMs.
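The point of the hierarchical construction is how few grid points a sparse grid needs compared to a full grid of the same level. A minimal sketch (the function names and the convention of levels starting at 1 with boundary points omitted are assumptions for illustration, not the vendor's implementation):

```python
from itertools import product

def sparse_grid_points(n, d):
    """Count points of a regular sparse grid of level n in d dimensions.

    The grid is the union of hierarchical increment grids indexed by
    multi-indices l = (l_1, ..., l_d) with |l|_1 <= n + d - 1; the
    increment for 1-D level l_i contributes 2**(l_i - 1) points.
    """
    total = 0
    for l in product(range(1, n + 1), repeat=d):
        if sum(l) <= n + d - 1:
            pts = 1
            for li in l:
                pts *= 2 ** (li - 1)  # points in this anisotropic subgrid
            total += pts
    return total

def full_grid_points(n, d):
    """Count interior points of the corresponding full grid: (2**n - 1)**d."""
    return (2 ** n - 1) ** d
```

For level 5 in two dimensions this gives 129 sparse-grid points versus 961 full-grid points; as the dimension grows, the full grid explodes exponentially while the sparse grid grows far more slowly, which is exactly what makes the discretization feasible in high dimensions.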
In recent years, Sparse Grids have been used successfully in the prudsys DISCOVERER classification tool. So far, however, the maximum number of dimensions was bounded at about 20. After years of fundamental research, a convergent combination of the Sparse Grid technique with dimension-based adaptivity has been achieved for regression problems. Here, adaptive error estimators drive automatic grid refinement along the required dimensions.
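The combination technique underlying such approaches has a classical, non-adaptive form in which the sparse grid solution is assembled from anisotropic full-grid solutions with fixed coefficients. The sketch below computes those coefficients; the function name is hypothetical, and it shows only the standard formula, not the dimension-adaptive refinement with error estimators described above:

```python
from itertools import product
from math import comb

def combination_coefficients(n, d):
    """Coefficients of the classical combination technique at level n in d dims.

    Each anisotropic full grid with level multi-index l, |l|_1 = n + d - 1 - q
    for q = 0, ..., d-1, enters with coefficient (-1)**q * C(d-1, q).
    Returns a dict mapping each multi-index l to its coefficient.
    """
    coeffs = {}
    for q in range(d):
        s = n + d - 1 - q          # target |l|_1 on this diagonal
        c = (-1) ** q * comb(d - 1, q)
        for l in product(range(1, s + 1), repeat=d):
            if sum(l) == s:
                coeffs[l] = c
    return coeffs
```

A quick sanity check: the coefficients always sum to 1, so the combined solution reproduces constants; for n = 3, d = 2 the diagonal |l|_1 = 4 enters with +1 and the diagonal |l|_1 = 3 with -1.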
Sparse Grids now reach 30 to 50 dimensions for millions of data points.
The adaptive Sparse Grid method will be included in the new version of the prudsys DISCOVERER next year.