In a typical graph learning problem, we are given N data points and the goal is to learn an efficient or sparse graph representation of the data. The key word here is efficient: an efficient graph can be defined as one whose number of edges is of the same order as the number of nodes. If not for efficiency, the problem could be trivially solved by connecting each data point to every other data point with edge weights proportional to the distance/similarity between them. A sparse graph construction leads to faster downstream processing and allows for a better understanding of the local neighborhood structure of the data.
The most popular graph construction methods for this problem are the k-nearest neighbor (KNN) and ε-neighborhood graphs (ε-graph). In these methods, one connects each data point, based on a predefined similarity metric, to its k most similar neighbors or to all neighbors that are at least ε-similar to it. However, the parameters k and ε, which explicitly determine the sparsity and connectivity of the graph, are unknown and are often assigned values in an ad hoc manner or via cross-validation, leading to graphs that are not robust. Moreover, in many cases it is not clear geometrically what a particular choice of these parameters amounts to. In this post, I will try to convince you of the sub-optimal nature of these standard methods and present possible improvements from a dictionary-based data approximation perspective.
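As a concrete point of reference, here is a minimal sketch of these two constructions, assuming a Gaussian kernel as the similarity measure (the function and parameter names are illustrative, not from the post):

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    """Pairwise Gaussian similarities; K[i, j] lies in (0, 1] and K[i, i] = 1."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2 * sigma ** 2))

def knn_graph(K, k):
    """Connect each node to its k most similar neighbors (not symmetrized)."""
    sim = K.copy()
    np.fill_diagonal(sim, -np.inf)           # exclude self-loops
    W = np.zeros_like(K)
    for i in range(sim.shape[0]):
        nbrs = np.argsort(sim[i])[-k:]       # indices of the k largest similarities
        W[i, nbrs] = sim[i, nbrs]
    return W

def epsilon_graph(K, eps):
    """Connect each node to all neighbors that are at least eps-similar."""
    W = np.where(K >= eps, K, 0.0)
    np.fill_diagonal(W, 0.0)                 # no self-loops
    return W
```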
Consider a collection of signals/functions that are unit normalized and form the building blocks of your signal space - we will refer to this set as a dictionary and to its elements as atoms. A dictionary is complete when its atoms can represent the entire space of signals, and redundant when the atoms are linearly dependent, i.e., when an atom can be represented by a linear combination of the remaining atoms. In practice, we often work with dictionaries that are redundant.
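To make these definitions concrete, here is a tiny illustrative example (my own, not from the post) of a dictionary that is complete for R^2 yet redundant:

```python
import numpy as np

# Three unit-norm atoms in R^2: complete (they span R^2) and redundant,
# since the third atom is a linear combination of the first two.
d1 = np.array([1.0, 0.0])
d2 = np.array([0.0, 1.0])
d3 = (d1 + d2) / np.linalg.norm(d1 + d2)
D = np.column_stack([d1, d2, d3])          # dictionary with atoms as columns

print(np.linalg.matrix_rank(D))            # 2: the atoms span the whole space
print(np.allclose(D @ np.array([1.0, 1.0, -np.sqrt(2)]), 0))  # linear dependence
```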
Sparse signal recovery or approximation involves finding the simplest possible explanation of a signal as a linear combination of the atoms in the dictionary. The ability of a dictionary to provide a sparse representation for a signal depends on how well the signal matches the characteristics of the atoms. This problem is NP-hard in general, and various relaxations and greedy approaches have been proposed in the past. A naive, basic approach is to compute the correlation between the signal and the atoms in the dictionary and use all atoms whose correlation exceeds a threshold for the representation. A better approach, matching pursuit (MP), involves a greedy iterative selection procedure. This method works by finding the atom that is maximally correlated with the residual (the part of the signal that is not yet represented) at each iteration, until the signal is fully represented or no improvement can be made. This method does not guarantee optimality but works reasonably well for most real-world applications and is easily better than thresholding. A second approach, orthogonal matching pursuit (OMP), works very similarly to MP with one additional step at each iteration: orthogonalization, or reweighting of the contributions of the atoms selected so far. Improved variations of these algorithms exist, for example with the ability to add groups of atoms at each step (instead of one per iteration) or to prune several of the selected atoms as redundant.
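The following is a minimal sketch of MP on a generic dictionary, with OMP's extra reweighting step indicated in a comment; the dictionary, signal, and parameter names here are illustrative assumptions, not the post's code:

```python
import numpy as np

def matching_pursuit(D, y, max_iter=50, tol=1e-6):
    """Greedy MP: D holds unit-norm atoms as columns, y is the signal to approximate."""
    residual = y.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(max_iter):
        corr = D.T @ residual                  # correlation of every atom with the residual
        j = np.argmax(np.abs(corr))            # atom that best explains the residual
        if np.abs(corr[j]) < tol:              # stop if no atom helps any further
            break
        coeffs[j] += corr[j]                   # MP: the correlation itself is the weight
        residual -= corr[j] * D[:, j]          # remove that atom's contribution
    return coeffs

# OMP differs only in the update step: after each selection, re-fit all selected
# atoms jointly by least squares and recompute the residual, e.g.
#   S = np.flatnonzero(coeffs)
#   coeffs[S], *_ = np.linalg.lstsq(D[:, S], y, rcond=None)
#   residual = y - D[:, S] @ coeffs[S]
```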
KNN and ε-neighborhood methods rely on a similarity measure obtained using a positive definite kernel (e.g., the Gaussian kernel) to select and weight the neighbors of a particular data point. A positive definite kernel corresponds to a transformation of the data points into a possibly infinite-dimensional space such that similarities are dot products in this transformed space. The dot product space associated with a kernel, referred to as a Reproducing Kernel Hilbert Space (RKHS), has well-established properties and applications but has not been sufficiently studied for the purpose of graph learning.
Under this distinction between the data space and the transformed space (RKHS), for similarity kernels with maximum value 1, the inner product can be viewed as the projection of one node on another, i.e., at a node $i$, $\langle \phi_i, \phi_j \rangle = K(x_i, x_j)$ with $\|\phi_i\| = \|\phi_j\| = 1$, so each transformed data point acts as a unit-norm atom and the kernel value is the length of the projection of $\phi_j$ onto $\phi_i$.
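To illustrate this view with something executable, the snippet below uses random Fourier features (my own illustrative stand-in, not part of the post) as an explicit finite-dimensional approximation of the Gaussian kernel's feature map: kernel values become ordinary dot products, and every mapped point has approximately unit norm.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))              # 5 data points in R^3
sigma = 1.0

# Exact Gaussian kernel: K(x, y) = exp(-||x - y||^2 / (2 sigma^2)), K(x, x) = 1
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / (2 * sigma ** 2))

# Random Fourier features: an explicit finite-dimensional stand-in for the RKHS
# feature map, so that kernel similarities become dot products.
D = 20000
W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], D))
b = rng.uniform(0, 2 * np.pi, size=D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)

print(np.max(np.abs(K - Z @ Z.T)))       # small: dot products approximate K
print(np.linalg.norm(Z, axis=1))         # each mapped point has norm close to 1
```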
An additional detail unique to our problem setting is the non-negativity of the coefficients of the approximation - the edge weights. To account for this, we will take into consideration the sign of the correlation/projection of the atoms during selection. Now, a k-nearest neighbor procedure (equivalently, an ε-neighborhood procedure) at a node i can be reduced to the following steps:
1. Compute the correlation (kernel similarity) between node i and every other node, i.e., the projection of each atom φ_j onto φ_i.
2. Discard atoms with non-positive correlation.
3. Select the k atoms with the largest correlation (or, for an ε-graph, all atoms with correlation at least ε).
4. Use the correlation values as the edge weights of the selected neighbors.
Thus, a KNN or ε-graph framework is a signal approximation method obtained using a thresholding technique, with thresholds set either globally (ε) or defined by the desired sparsity (k). In the context of sparse dictionary-based representation, it is well known that selection via thresholding is suboptimal in general and is optimal only when the dictionary is not redundant. As discussed earlier, there exist a number of alternative methods that are better suited for the problem of sparse signal approximation, such as MP and OMP. A straightforward adaptation of the MP/OMP algorithm for graph construction at a node i can be obtained as below.
1. Initialize the residual to φ_i, the transformed representation of node i, and the set of selected atoms to empty.
2. Compute the correlation (projection) of the residual with each candidate atom φ_j, j ≠ i.
3. Select the atom with the maximum positive correlation and add it to the selected set.
4. Update the edge weights and the residual.
Repeat steps 2 - 4 until the approximation is exact or no atom exists with positive correlation. The difference between MP and OMP is at step 4, where in MP one would use the correlation as the weight for the atom selected in step 3, while in OMP one performs a least squares fit with all selected atoms to reweight their contributions. In reality, with the non-negativity constraint, one can obtain OMP-like results while avoiding iterative selection by performing a least squares fit on the k-nearest neighbor or ε-neighborhood selection in a single-step optimization - a procedure that we proposed in NNK graphs. A key benefit of NNK graphs is their robustness to parameters such as k and ε. This is because the number of neighbors that are assigned non-zero weights is not predetermined and instead depends on the local geometry of the data, as in the figure below.
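Below is a minimal sketch of this single-step idea, assuming a precomputed kernel matrix with unit diagonal; it reduces the non-negative fit to scipy's NNLS solver via a Cholesky factorization. It is an illustration of the idea rather than the exact NNK algorithm, whose precise formulation and pruning conditions are in the code linked at the end.

```python
import numpy as np
from scipy.optimize import nnls

def nnk_like_weights(K, i, k=10, reg=1e-10):
    """Non-negative least squares reweighting of an initial KNN selection at node i.

    K: (N, N) kernel matrix with K[i, i] = 1 (e.g., a Gaussian kernel).
    Returns a length-N weight vector; several of the k candidates may get weight 0.
    """
    sim = K[i].copy()
    sim[i] = -np.inf                          # exclude the node itself
    S = np.argsort(sim)[-k:]                  # initial k-nearest neighbor selection
    K_SS = K[np.ix_(S, S)]                    # kernel among the selected neighbors
    k_Si = K[S, i]                            # kernel between neighbors and node i

    # Minimize ||phi_i - Phi_S theta||^2 s.t. theta >= 0, i.e. in kernel form
    # theta^T K_SS theta - 2 theta^T k_Si; a Cholesky factor turns this into NNLS.
    L = np.linalg.cholesky(K_SS + reg * np.eye(k))
    theta, _ = nnls(L.T, np.linalg.solve(L, k_Si))

    weights = np.zeros(K.shape[0])
    weights[S] = theta
    return weights
```

Candidates that are geometrically redundant given the other selected neighbors end up with exactly zero weight, which is what makes the final neighborhood insensitive to the initial choice of k.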
In this post, we looked at conventional graph and neighborhood construction methods from a sparse signal approximation perspective, along with possible improvements suggested by this view. The area of sparse signal approximation is vast - NNK graphs merely scratch the surface when it comes to combining it with graph learning. Several further ideas from dictionary-based representation could be adapted, such as selecting groups of atoms at each step or pruning redundant atoms from the selection.
Python and Matlab source code for NNK graphs is available at github.com/STAC-USC/