Posts by Collection

patents

Optimizing training sets used for setting up inspection-related algorithms

Methods and systems for training an inspection-related algorithm are provided. One system includes one or more computer subsystems configured for performing an initial training of an inspection-related algorithm with a labeled set of defects, thereby generating an initial version of the inspection-related algorithm, and applying the initial version of the inspection-related algorithm to an unlabeled set of defects. Read more
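
As a rough sketch of the train-then-apply loop this abstract describes (not the patented method itself), the snippet below trains an initial classifier on labeled defects and applies it to unlabeled ones; the random-forest model, random features, and confidence-based candidate selection are all illustrative stand-ins.

```python
# Minimal sketch of the train-then-apply loop described above.
# RandomForestClassifier stands in for the inspection-related
# algorithm; the defect features are random stand-in data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_labeled = rng.normal(size=(100, 8))        # labeled defect features
y_labeled = rng.integers(0, 2, size=100)     # defect / nuisance labels
X_unlabeled = rng.normal(size=(1000, 8))     # unlabeled defects

# Initial training with the labeled set -> initial version of the model.
model = RandomForestClassifier(random_state=0).fit(X_labeled, y_labeled)

# Apply the initial version to the unlabeled set; low-confidence
# predictions are natural candidates to label next.
proba = model.predict_proba(X_unlabeled)
confidence = proba.max(axis=1)
candidates = np.argsort(confidence)[:10]     # least confident defects
```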

Data sampling using Locality Sensitive Hashing for large scale graph learning

An important step in graph-based data analysis and processing is the construction of similarity graphs. Recent works have focused on the semi-supervised setting to learn an optimal similarity function for constructing a task-optimal graph. However, in many scenarios with billions of data points and trillions of potential edges, the run-time and computational requirements for training the similarity model make these approaches impractical. We present an efficient sampling approach by taking an adaptive partition view of locality sensitive hashing. Theoretically, we show that, though the samples obtained are correlated with sampling probabilities that do not sum to one, the training loss estimated for learning the graph similarity model using our approach is unbiased with a smaller variance compared to random sampling. Read more
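
A minimal sketch of the sampling idea, assuming random-hyperplane LSH: points that share a hash bucket form an adaptive partition of the space, so candidate training pairs for the similarity model are drawn within buckets instead of from all O(n²) pairs. The bit width, subsampling rate, and data below are illustrative choices.

```python
# Toy sketch of LSH-based pair sampling: points with the same sign
# pattern under random hyperplanes land in the same bucket (an
# adaptive partition), so candidate edges come from within buckets.
from collections import defaultdict
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 32))          # data points
planes = rng.normal(size=(32, 8))          # 8 random hyperplanes

codes = (X @ planes > 0)                   # 8-bit sign pattern per point
buckets = defaultdict(list)
for i, code in enumerate(codes):
    buckets[code.tobytes()].append(i)

# Sample training pairs for the similarity model from within buckets.
pairs = []
for members in buckets.values():
    for i, j in combinations(members, 2):
        if rng.random() < 0.01:            # subsample within each bucket
            pairs.append((i, j))
```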

Fine-tuning machine learning models while retaining accumulated knowledge

Techniques are described herein for a method of obtaining a neural network trained using a first dataset to perform a first task. The method further comprises iteratively updating one or more weights of the neural network during a training phase. The training phase is used to teach the neural network to perform a second task using a second dataset. The one or more weights of the neural network are updated each iteration using a first projection matrix, a gradient of the one or more weights with respect to a loss function, and a second projection matrix. The method further comprises, responsive to completion of the training phase, obtaining the neural network trained to perform the first task and the second task. Read more
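
In rough equation form, the described update is $W \leftarrow W - \eta\, P_1 (\nabla_W L)\, P_2$. The sketch below applies such a doubly projected step in NumPy; the projection bases are random stand-ins for whatever directions the method actually protects from the first task.

```python
# Minimal sketch of the projected weight update described above:
# the raw gradient G is multiplied by projection matrices on both
# sides so the step avoids directions important to the first task.
# The projection bases here are illustrative (random).
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))          # one weight matrix
G = rng.normal(size=(16, 16))          # dL/dW for the second task

def null_space_projector(B):
    """Projector onto the complement of span(B): I - B (B^T B)^-1 B^T."""
    return np.eye(B.shape[0]) - B @ np.linalg.solve(B.T @ B, B.T)

# Bases of directions to protect (e.g., from first-task activations).
P1 = null_space_projector(rng.normal(size=(16, 4)))
P2 = null_space_projector(rng.normal(size=(16, 4)))

lr = 1e-2
W -= lr * (P1 @ G @ P2)                # projected gradient step
```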

Domain aware large language model governance

Techniques are described herein for a method of decreasing the likelihood of out-of-domain LLM responses. The method includes determining, by a block of an LLM, a representation of the text input. The method further includes determining a set of coefficients based at least on a reconstruction of the text input using a dictionary and the representation of the text input. The method further includes performing a sparsity check using the set of coefficients. The method further includes generating a response to the text input based at least on the sparsity check. Read more
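
A hedged sketch of what such a check could look like: reconstruct a block's representation with a dictionary via orthogonal matching pursuit under a fixed coefficient budget, then threshold the reconstruction error. The dictionary, budget, and threshold are stand-ins, not the patented procedure.

```python
# Illustrative sparsity check: reconstruct the representation with a
# dictionary and see whether a sparse code suffices. All quantities
# (dictionary, budget, threshold) are stand-ins.
import numpy as np
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(0)
D = rng.normal(size=(64, 256))               # dictionary (atoms as columns)
D /= np.linalg.norm(D, axis=0)
h = rng.normal(size=64)                      # representation from an LLM block

coef = orthogonal_mp(D, h, n_nonzero_coefs=16)   # sparse code, budget of 16 atoms
residual = np.linalg.norm(h - D @ coef)

# If the representation reconstructs well within the sparsity budget,
# treat the input as in-domain (threshold is illustrative).
in_domain = residual < 5.0
```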

Training a target activation sparsity in a neural network

Techniques are described herein for a method of training a target activation sparsity in a neural network. The method includes obtaining a nonlinear portion of a plurality of neurons in a neural network. The neural network is trained to perform a target task. The method further includes substituting a dynamic nonlinear portion for the nonlinear portion in the plurality of neurons in the neural network. The dynamic nonlinear portion is trained to activate or deactivate one or more neurons of the plurality of neurons. The method further includes retraining the neural network using a first loss function that minimizes a loss of the target task and a second loss function that minimizes the number of active neurons. Read more
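
A toy PyTorch rendering of the two-term objective, with a learned sigmoid gate standing in for the dynamic nonlinear portion; the layer sizes, gate design, and penalty weight are illustrative assumptions.

```python
# Sketch of the two-term objective: task loss plus a penalty on the
# expected number of active neurons. The sigmoid gate stands in for
# the "dynamic nonlinear portion"; architecture and data are toy.
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(32, 64)
y = torch.randint(0, 10, (32,))

fc1, gate, fc2 = nn.Linear(64, 128), nn.Linear(64, 128), nn.Linear(128, 10)

h = fc1(x)
g = torch.sigmoid(gate(x))            # learned per-neuron on/off gate
h = h * g                             # deactivated neurons contribute ~0

task_loss = nn.functional.cross_entropy(fc2(h), y)
sparsity_loss = g.sum(dim=1).mean()   # expected active-neuron count
loss = task_loss + 1e-3 * sparsity_loss
loss.backward()
```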

Machine learning model compression

Techniques are described herein for a method of compression in a large language model. The method includes determining blocks in an LLM that have redundancies that are collapsible. The method further includes modifying the language model without retraining it. Read more
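
One concrete example of a collapsible redundancy, offered purely as an illustration rather than the patent's procedure: two consecutive linear layers with no nonlinearity between them compose exactly into one layer, so the model shrinks with no retraining.

```python
# Two linear layers with no nonlinearity between them compose into a
# single layer: W2(W1 x + b1) + b2 == (W2 W1) x + (W2 b1 + b2).
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(128, 64)), rng.normal(size=128)
W2, b2 = rng.normal(size=(64, 128)), rng.normal(size=64)

W = W2 @ W1                # collapsed weight
b = W2 @ b1 + b2           # collapsed bias

x = rng.normal(size=64)
assert np.allclose(W2 @ (W1 @ x + b1) + b2, W @ x + b)
```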

Knowledge base for voice large language model applications

A voice-based agent service provides one or more voice agents to respond to voice-based requests. For example, when a question is received by the voice-based agent service, it can be matched to a voice-ready knowledge base. The voice-ready knowledge base is obtained offline and organized as nodes in a conversation graph. The question can be matched to a node, and then subsequent nodes can be predicted by a conversation model based on previously traversed nodes and further input in real time. Read more
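
A toy sketch of the data structure, assuming a dictionary-backed conversation graph and naive keyword matching; a real system would use a trained matcher and a learned next-node predictor.

```python
# Toy conversation graph: nodes hold answerable content, edges point
# to likely follow-up nodes. The matcher is a naive keyword stand-in.
conversation_graph = {
    "hours": {"text": "We are open 9am-5pm.", "next": ["location", "holidays"]},
    "location": {"text": "We are at 1 Main St.", "next": ["hours"]},
    "holidays": {"text": "Closed on public holidays.", "next": ["hours"]},
}

def match_node(question: str) -> str:
    # Pick the node whose key appears in the question (stand-in matcher).
    return max(conversation_graph, key=lambda k: k in question.lower())

node = match_node("What are your hours today?")
print(conversation_graph[node]["text"])
follow_ups = conversation_graph[node]["next"]   # candidates for the next turn
```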

Gradient-free optimization of large language models

Gradient-free optimization of language models is performed by iteratively improving the context instruction used to perform a given task. The instruction is refined by evaluating outputs against reference data and criteria, and using the model's reasoning as a signal for improvement until an end condition is met. Read more
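
A skeleton of such a loop, with a hypothetical `call_llm` client standing in for a real LLM API: score the current instruction on reference data, ask the model to reason about failures, and revise until the end condition is met.

```python
# Skeleton of the iterative, gradient-free loop described above.
# `call_llm` is a hypothetical stand-in for a real LLM API client.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for a real LLM API call")

def score(instruction, reference_data) -> float:
    # Fraction of reference (input, answer) pairs the instruction gets right.
    outputs = [call_llm(f"{instruction}\n{x}") for x, _ in reference_data]
    return sum(o == y for o, (_, y) in zip(outputs, reference_data)) / len(reference_data)

def optimize(instruction, reference_data, target=0.95, max_steps=10):
    for _ in range(max_steps):
        if score(instruction, reference_data) >= target:   # end condition
            break
        # Use the model's own reasoning about failures as the improvement signal.
        critique = call_llm(f"Why might this instruction fail?\n{instruction}")
        instruction = call_llm(f"Rewrite the instruction using this reasoning:\n{critique}")
    return instruction
```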

publications

Neighborhood and Graph Constructions using Non-Negative Kernel Regression

Published in arXiv, 2019

Data-driven graph constructions are often used in various applications, including several machine learning tasks, where the goal is to make predictions and discover patterns. However, learning an optimal graph from data is still a challenging task. Weighted K-nearest neighbor and ϵ-neighborhood methods are among the most common graph construction methods, due to their computational simplicity, but the choice of parameters such as K and ϵ associated with these methods is often ad hoc and lacks a clear interpretation. Read more
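
For concreteness, the two standard constructions can be built in a few lines with scikit-learn; the data and the values of K and ϵ below are arbitrary, which is precisely the ad hoc choice the paper takes issue with.

```python
# The two standard constructions the abstract mentions, side by side;
# both depend on a hand-picked parameter (K or epsilon).
import numpy as np
from sklearn.neighbors import kneighbors_graph, radius_neighbors_graph

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))

knn_graph = kneighbors_graph(X, n_neighbors=5, mode="distance")      # K = 5
eps_graph = radius_neighbors_graph(X, radius=0.5, mode="distance")   # eps = 0.5

# Same data, two different sparsity patterns: the ad hoc choice of K
# or epsilon changes the graph, which motivates selecting neighbors
# via non-negative kernel regression instead.
print(knn_graph.nnz, eps_graph.nnz)
```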

Graph Construction from Data by Non-Negative Kernel Regression

Published in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020

Data-driven graph constructions are often used in machine learning applications. However, learning an optimal graph from data is still a challenging task. K-nearest neighbor and ϵ-neighborhood methods are among the most common graph construction methods, due to their computational simplicity, but the choice of parameters such as K and ϵ associated with these methods is often ad hoc and lacks a clear interpretation. Read more

Graph-based Deep Learning Analysis and Instance Selection

Published in IEEE International Workshop on Multimedia Signal Processing (MMSP), 2020

While deep learning is a powerful tool for many applications, there has been only limited research about selection of data for training, i.e., instance selection, which enhances deep learning scalability by saving computational resources. Read more

Efficient graph construction for image representation

Published in IEEE International Conference on Image Processing (ICIP), 2020. Best student paper.

Graphs are useful to interpret widely used image processing methods, e.g., bilateral filtering, or to develop new ones, e.g., kernel based techniques. However, simple graph constructions are often used, where edge weight and connectivity depend on a few parameters. In particular, the sparsity of the graph is determined by the choice of a window size. Read more

Revisiting local neighborhood methods in machine learning

Published in IEEE Data Science and Learning Workshop (DSLW), 2021

Several machine learning methods leverage the idea of locality by using $k$-nearest neighbor (KNN) techniques to design better pattern recognition models. However, the choice of KNN parameters such as $k$ is often made experimentally, e.g., via cross-validation, leading to local neighborhoods without a clear geometric interpretation. Read more
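
The experimental parameter choice referred to here typically looks like the following cross-validation sweep (toy dataset, arbitrary grid of $k$ values):

```python
# Pick k for a KNN classifier empirically via cross-validation, the
# standard practice the abstract contrasts with a geometric choice.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = {
    k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    for k in (1, 3, 5, 9, 15)
}
best_k = max(scores, key=scores.get)  # chosen experimentally, not geometrically
```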

Channel-Wise Early Stopping without a Validation Set via NNK Polytope Interpolation

Published in Asia Pacific Signal and Information Processing Association (APSIPA), 2021

Convolutional neural networks (ConvNets) comprise high-dimensional feature spaces formed by the aggregation of multiple channels, where analyzing intermediate data representations and the model's evolution can be challenging owing to the curse of dimensionality. We present channel-wise DeepNNK (CW-DeepNNK). Read more

Channel redundancy and overlap in convolutional neural networks with Channel-wise NNK graphs

Published in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022

Feature spaces in the deep layers of convolutional neural networks (CNNs) are often very high-dimensional and difficult to interpret. However, convolutional layers consist of multiple channels that are activated by different types of inputs, which suggests that more insights may be gained by studying the channels. Read more

Study of Manifold Geometry using Multiscale Non-Negative Kernel Graphs

Published in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022

Modern machine learning systems are increasingly trained on large amounts of data embedded in high-dimensional spaces. Often this is done without analyzing the structure of the dataset. In this work, we propose a framework to study the geometric structure of the data. Read more

A data-driven graph framework for geometric understanding of deep learning

Published in Graph Signal Processing Workshop 2023, 2023

Deep learning approaches have achieved unprecedented performance success in many application domains. In this work, we first present an empirical framework for studying deep learning by characterizing the geometry of the data manifold in the embedding spaces, using computationally efficient graph-based methods to learn manifold properties. Read more

Data Sampling using Locality Sensitive Hashing for Large Scale Graph Learning

Published in Mining and Learning with Graphs, Knowledge Discovery and Data Mining (KDD), 2023

Recent works, such as GRALE, have focused on the semi-supervised setting to learn an optimal similarity function for constructing a task-optimal graph. However, in many scenarios with billions of data points and trillions of potential edges, the run-time and computational requirements for training the similarity model make these approaches impractical. In this work, we consider data sampling as a means to overcome this issue. Read more

Towards a geometric understanding of Spatio-Temporal Graph Convolution Networks

Published in IEEE Open Journal of Signal Processing, 2024

Spatio-temporal graph convolutional networks (STGCNs) have emerged as a desirable model for many applications including skeleton-based human action recognition. Despite achieving state-of-the-art performance, our limited understanding of the representations learned by these models hinders their application in critical and real-world settings. Read more

Characterizing Large Language Model Geometry Solves Toxicity Detection and Generation

Published in International Conference on Machine Learning (ICML), 2024

Large Language Models (LLMs) drive current AI breakthroughs despite very little being known about their internal representations, e.g., how to extract a few informative features to solve various downstream tasks. To provide a practical and principled answer, we propose to characterize LLMs from a geometric perspective. Read more

Reasoning in Large Language Models: A Geometric Perspective

Published in arXiv, 2024

The advancement of large language models (LLMs) for real-world applications hinges critically on enhancing their reasoning capabilities. In this work, we explore the reasoning abilities of large language models (LLMs) through their geometrical understanding. Read more