Publications

You can also find my articles on my Google Scholar profile.

AGI Is Coming… Right After AI Learns to Play Wordle

Published in arXiv Preprints, 2025

This paper investigates multimodal agents, in particular, OpenAI's Computer-User Agent (CUA), trained to control and complete tasks through a standard computer interface, similar to humans. Read more

Reasoning in Large Language Models: A Geometric Perspective

Published in arXiv Preprints, 2024

The advancement of large language models (LLMs) for real-world applications hinges critically on enhancing their reasoning capabilities. In this work, we explore the reasoning abilities of LLMs through their geometrical understanding. Read more

Characterizing Large Language Model Geometry Solves Toxicity Detection and Generation

Published in International Conference on Machine Learning (ICML), 2024

Large Language Models (LLMs) drive current AI breakthroughs despite very little being known about their internal representations, e.g., how to extract a few informative features to solve various downstream tasks. To provide a practical and principled answer, we propose to characterize LLMs from a geometric perspective. Read more

Towards a geometric understanding of Spatio Temporal Graph Convolution Networks

Published in IEEE Open Journal of Signal Processing, 2024

Spatio-temporal graph convolutional networks (STGCNs) have emerged as a desirable model for many applications including skeleton-based human action recognition. Despite achieving state-of-the-art performance, our limited understanding of the representations learned by these models hinders their application in critical and real-world settings. Read more

Data Sampling using Locality Sensitive Hashing for Large Scale Graph Learning

Published in Mining and Learning with Graphs, Knowledge Discovery and Data Mining (KDD), 2023

Recent works, such as GRALE, have focused on the semi-supervised setting to learn an optimal similarity function for constructing a task-optimal graph. However, in many scenarios with billions of data points and trillions of potential edges, the run-time and computational requirements for training the similarity model make these approaches impractical. In this work, we consider data sampling as a means to overcome this issue. Read more

A data-driven graph framework for geometric understanding of deep learning

Published in Graph Signal Processing Workshop, 2023

Deep learning approaches have achieved unprecedented performance success in many application domains. In this work, we first present an empirical framework for studying deep learning by characterizing the geometry of the data manifold in the embedding spaces, using computationally efficient graph-based methods to learn manifold properties. Read more

Study of Manifold Geometry using Multiscale Non-Negative Kernel Graphs

Published in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022

Modern machine learning systems are increasingly trained on large amounts of data embedded in high-dimensional spaces. Often this is done without analyzing the structure of the dataset. In this work, we propose a framework to study the geometric structure of the data. Read more

The geometry of self-supervised learning models and its impact on Transfer learning

Published in arXiv Preprints, 2022

The recent popularity of self-supervised learning (SSL) has led to the development of several models that make use of diverse training strategies, architectures, and data augmentation policies with no existing unified framework to study or assess their effectiveness in transfer learning. Read more

NNK-Means: Data summarization using dictionary learning with non-negative kernel regression

Published in IEEE 30th European Signal Processing Conference (EUSIPCO), 2022

An increasing number of systems are being designed by first gathering significant amounts of data, and then optimizing the system parameters directly using the obtained data. Often this is done without analyzing the dataset structure. Read more

Channel redundancy and overlap in convolutional neural networks with Channel-wise NNK graphs

Published in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022

Feature spaces in the deep layers of convolutional neural networks (CNNs) are often very high-dimensional and difficult to interpret. However, convolutional layers consist of multiple channels that are activated by different types of inputs, which suggests that more insights may be gained by studying the channels. Read more

Channel-Wise Early Stopping without a Validation Set via NNK Polytope Interpolation

Published in Asia Pacific Signal and Information Processing Association (APSIPA), 2021

Convolutional neural networks (ConvNets) comprise high-dimensional feature spaces formed by the aggregation of multiple channels, where analyzing intermediate data representations and the model's evolution can be challenging owing to the curse of dimensionality. We present channel-wise DeepNNK (CW-DeepNNK) Read more

Model selection and explainability in neural networks using a polytope interpolation framework

Published in Asilomar Conference on Signals, Systems, and Computers, 2021

Modern machine learning systems based on neural networks have shown great success in learning complex data patterns while being able to make good predictions on unseen data points. However, the limited understanding of these systems hinders further progress and application to several domains in the real world. Read more

Revisiting local neighborhood methods in machine learning

Published in IEEE Data Science and Learning Workshop (DSLW), 2021

Several machine learning methods leverage the idea of locality by using $k$-nearest neighbor (KNN) techniques to design better pattern recognition models. However, the choice of KNN parameters such as $k$ is often made experimentally, e.g., via cross-validation, leading to local neighborhoods without a clear geometric interpretation. Read more

Efficient graph construction for image representation

Best student paper. Published in IEEE International Conference on Image Processing (ICIP), 2020

Graphs are useful to interpret widely used image processing methods, e.g., bilateral filtering, or to develop new ones, e.g., kernel based techniques. However, simple graph constructions are often used, where edge weight and connectivity depend on a few parameters. In particular, the sparsity of the graph is determined by the choice of a window size. Read more

Graph-based Deep Learning Analysis and Instance Selection

Published in IEEE International Workshop on Multimedia Signal Processing (MMSP), 2020

While deep learning is a powerful tool for many applications, there has been only limited research about selection of data for training, i.e., instance selection, which enhances deep learning scalability by saving computational resources. Read more

DeepNNK: Explaining deep models and their generalization using polytope interpolation

Published in arXiv Preprints, 2020

Modern machine learning systems based on neural networks have shown great success in learning complex data patterns while being able to make good predictions on unseen data points. However, the limited interpretability of these systems hinders further progress and application to several domains in the real world. Read more

Graph Construction from Data by Non-Negative Kernel Regression

Published in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020

Data driven graph constructions are often used in machine learning applications. However, learning an optimal graph from data is still a challenging task. K-nearest neighbor and ϵ-neighborhood methods are among the most common graph construction methods, due to their computational simplicity, but the choice of parameters such as K and ϵ associated with these methods is often ad hoc and lacks a clear interpretation. Read more

Neighborhood and Graph Constructions using Non-Negative Kernel Regression

Published in arXiv Preprints, 2019

Data-driven graph constructions are often used in various applications, including several machine learning tasks, where the goal is to make predictions and discover patterns. However, learning an optimal graph from data is still a challenging task. Weighted K-nearest neighbor and ϵ-neighborhood methods are among the most common graph construction methods, due to their computational simplicity, but the choice of parameters such as K and ϵ associated with these methods is often ad hoc and lacks a clear interpretation. Read more

Detection and removal of Salt and Pepper noise in images by improved median filter

Published in IEEE Recent Advances in Intelligent Computational Systems, 2011

A methodology based on median filters for the removal of Salt and Pepper noise by its detection followed by filtering in both binary and gray level images has been proposed in this paper. Read more

Sarath Shekkizhar, Ph.D.
