Techniques are described herein for compressing a large language model (LLM). The method includes identifying blocks in the LLM that contain collapsible redundancies. The method further includes modifying the LLM to collapse the identified redundancies without retraining the model.
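
By way of illustration only, the two steps above (identifying collapsible blocks, then modifying the model without retraining) might be sketched as follows. The abstract does not specify how redundancy is measured or how a block is collapsed, so everything in this sketch is an assumption: a block is treated as redundant when the cosine similarity between its input and its output, averaged over a few calibration inputs, exceeds a threshold, and collapsing is modeled as simply removing the block. The names `find_collapsible_blocks`, `collapse_blocks`, and `threshold`, as well as the toy blocks, are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]  # stand-in for one transformer block


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def find_collapsible_blocks(
    blocks: Sequence[Block],
    calibration_inputs: Sequence[Vector],
    threshold: float = 0.99,  # assumed cutoff, not taken from the source
) -> List[int]:
    """Return indices of blocks whose output barely differs from their input."""
    if not calibration_inputs:
        raise ValueError("at least one calibration input is required")
    totals = [0.0] * len(blocks)
    for x in calibration_inputs:
        h = list(x)
        for i, block in enumerate(blocks):
            out = block(h)
            totals[i] += cosine(h, out)  # assumed redundancy criterion
            h = out
    n = len(calibration_inputs)
    return [i for i, total in enumerate(totals) if total / n >= threshold]


def collapse_blocks(blocks: Sequence[Block], collapsible: Sequence[int]) -> List[Block]:
    """Drop the flagged blocks; the remaining weights are left untouched."""
    drop = set(collapsible)
    return [b for i, b in enumerate(blocks) if i not in drop]


# Toy three-block "model": only the middle block is close to an identity map.
toy_blocks: List[Block] = [
    lambda v: [v[0] + v[1], v[1] - v[0]],    # rotates and scales the input
    lambda v: [1.001 * v[0], 1.001 * v[1]],  # nearly identity, so redundant
    lambda v: [v[1], v[0]],                  # swaps the two coordinates
]
toy_inputs: List[Vector] = [[1.0, 2.0], [3.0, -1.0]]

redundant = find_collapsible_blocks(toy_blocks, toy_inputs)
compressed = collapse_blocks(toy_blocks, redundant)
```

In this toy setting only the near-identity middle block is flagged, and the compressed model keeps the other two blocks with their parameters unchanged, so no retraining step is involved. Other redundancy measures or collapse operations (for example, merging adjacent blocks) would fit the same two-step outline.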