Group and Shuffle: Researchers at HSE University and AIRI Accelerate Neural Network Fine-Tuning

Researchers at HSE University and the AIRI Institute have proposed a method for quickly fine-tuning neural networks. Their approach splits the trainable parameters into small groups, processes each group separately, and then shuffles the groups so that information can flow between them. The method outperforms alternatives in image generation and analysis, as well as in fine-tuning text models, while requiring less memory and training time. The results have been presented at the NeurIPS 2024 Conference.

The larger the neural network, the more challenging it becomes to quickly adapt it to a new task. Retraining a model from scratch is a time-consuming and costly process. Therefore, developers seek cost-effective ways to adapt a model to a specific task while preserving the overall quality of the original.

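One such approach is orthogonal fine-tuning, in which the pretrained weights are multiplied by a trainable orthogonal matrix; unlike other methods, this preserves the essential features of the original model. To keep the number of trainable parameters small, the orthogonal matrix must be given a special structure, and the popular choices, block-diagonal matrices and butterfly matrices, have drawbacks: the former are limited in expressiveness, while the latter require long products of many factors and therefore more computation.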
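A common explanation for why orthogonal fine-tuning preserves the original model's features, used in earlier work such as OFT, is that multiplying a weight matrix by an orthogonal matrix leaves the angles between its columns (the weight vectors of individual neurons) unchanged. The toy sketch below checks this numerically in plain Python; the 2 × 3 weight matrix and the rotation standing in for the trained orthogonal matrix are arbitrary illustrative choices, not values from the study.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def column(m, j):
    return [row[j] for row in m]


def cosine(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def rotation(theta):
    """A 2 x 2 orthogonal matrix: rotation by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


# Toy "pretrained" weights: each column is one neuron's weight vector (values are arbitrary).
W = [[1.0, 0.5, -0.3],
     [0.2, 1.0, 0.8]]

Q = rotation(0.7)       # stands in for the trainable orthogonal matrix
W_tuned = matmul(Q, W)  # orthogonal fine-tuning replaces W with Q W

for i in range(3):
    for j in range(i + 1, 3):
        before = cosine(column(W, i), column(W, j))
        after = cosine(column(W_tuned, i), column(W_tuned, j))
        print(f"neurons {i} and {j}: cos before = {before:.6f}, after = {after:.6f}")
```

The two printed cosines agree for every pair of neurons, so the update can change what each neuron responds to without distorting the geometry the pretrained layer has learned.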

Researchers at the HSE Faculty of Computer Science and the AIRI Institute have proposed a new way of constructing such matrices, which they call Group-and-Shuffle (GS) matrices. Instead of transforming all the parameters at once with one large dense matrix, they split them into small groups, transform each group separately, and then shuffle the groups together so that a second group-wise transformation can mix information across them. This structure is both flexible and efficient: it lets the model adapt more precisely to the task while requiring fewer computations and less memory.
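The group-then-shuffle idea can be illustrated with a toy sketch in plain Python. It assumes a simplified GS matrix of the form L·P·R, where R and L are block-diagonal (each block transforms one group) and P is a fixed "stride" permutation that sends the j-th entry of every group into the j-th new group. The function names, block values, and sizes are illustrative choices, not the authors' implementation. With n = b² entries, a single shuffle is enough for every output to depend on every input, whereas one block-diagonal matrix on its own keeps the groups isolated.

```python
def block_diag_matvec(blocks, x):
    """Apply a block-diagonal matrix, stored as a list of b x b blocks, to the vector x."""
    b = len(blocks[0])
    y = []
    for g, block in enumerate(blocks):
        chunk = x[g * b:(g + 1) * b]  # entries belonging to group g
        y.extend(sum(block[r][c] * chunk[c] for c in range(b)) for r in range(b))
    return y


def shuffle(x, k, b):
    """Stride permutation: entry j of group g (index g*b + j) moves to index j*k + g."""
    y = [0.0] * (k * b)
    for g in range(k):
        for j in range(b):
            y[j * k + g] = x[g * b + j]
    return y


def gs_matvec(left_blocks, right_blocks, x):
    """y = L P R x: transform each group, shuffle the groups, transform each group again."""
    k, b = len(right_blocks), len(right_blocks[0])
    return block_diag_matvec(left_blocks, shuffle(block_diag_matvec(right_blocks, x), k, b))


def to_dense(apply, n):
    """Recover the n x n matrix of a linear map by applying it to the basis vectors."""
    cols = [apply([1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def count_zeros(m):
    return sum(1 for row in m for entry in row if entry == 0.0)


def gs_param_count(n, b):
    """Two block-diagonal factors, each made of n // b blocks of size b x b."""
    return 2 * (n // b) * b * b


# Toy example: n = 4 entries in k = 2 groups of size b = 2 (n = b * b, so one shuffle suffices).
k, b = 2, 2
n = k * b
R = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
L = [[[1.0, -1.0], [2.0, 1.0]], [[0.5, 1.0], [1.0, 3.0]]]

block_only = to_dense(lambda v: block_diag_matvec(R, v), n)
gs = to_dense(lambda v: gs_matvec(L, R, v), n)

print("zeros in block-diagonal R:", count_zeros(block_only))  # 8 of 16: groups never interact
print("zeros in GS matrix L P R:", count_zeros(gs))           # 0: every output sees every input
print(gs_param_count(1024, 32), "vs", 1024 * 1024)           # 65536 vs 1048576
```

The last line shows the memory side of the trade-off: for a 1024 × 1024 matrix with 32 × 32 blocks, the two block-diagonal factors hold 65,536 trainable numbers instead of about a million, and applying them to a vector costs on the order of n·b operations rather than n².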

Building on GS matrices, the researchers developed GSOFT, a new method for orthogonal fine-tuning of neural networks. Compared with previous approaches, GSOFT uses fewer parameters while maintaining training stability and quality, even with limited data. The team also introduced a two-sided version of the method, Double GSOFT, which applies orthogonal transformations to the weight matrix from both the left and the right, making the model more flexible and accurate.
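The orthogonal variant can be sketched in the same toy setting. This sketch assumes that each block is made orthogonal with the Cayley transform, as in earlier orthogonal fine-tuning work (the exact parametrization in GSOFT may differ), uses 2 × 2 blocks with a single trainable number each, and arranges the factors as Q = Pᵀ L P R so that zero parameters give Q = I and training starts exactly from the pretrained weights. That arrangement, the helper names such as `gs_orthogonal`, and the two-sided form W = Q_U W0 Q_V used to illustrate Double GSOFT are hypothetical choices made for the illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def cayley_2x2(a):
    """Cayley transform (I - S)(I + S)^-1 of S = [[0, a], [-a, 0]], in closed form.
    The result is orthogonal for every real a, and a = 0 gives the identity."""
    d = 1.0 + a * a
    return [[(1 - a * a) / d, -2 * a / d],
            [2 * a / d, (1 - a * a) / d]]


def block_diag(blocks):
    """Dense block-diagonal matrix built from equally sized square blocks."""
    b = len(blocks[0])
    n = b * len(blocks)
    m = [[0.0] * n for _ in range(n)]
    for g, blk in enumerate(blocks):
        for r in range(b):
            for c in range(b):
                m[g * b + r][g * b + c] = blk[r][c]
    return m


def stride_perm(k, b):
    """Permutation matrix P with (P x)[j*k + g] = x[g*b + j] (the group shuffle)."""
    n = k * b
    p = [[0.0] * n for _ in range(n)]
    for g in range(k):
        for j in range(b):
            p[j * k + g][g * b + j] = 1.0
    return p


def gs_orthogonal(left_params, right_params):
    """Hypothetical orthogonal GS matrix Q = P^T L P R with 2 x 2 Cayley blocks."""
    k = len(right_params)
    L = block_diag([cayley_2x2(a) for a in left_params])
    R = block_diag([cayley_2x2(a) for a in right_params])
    P = stride_perm(k, 2)
    return matmul(matmul(transpose(P), matmul(L, P)), R)


def frobenius(m):
    return math.sqrt(sum(e * e for row in m for e in row))


# Toy 4 x 4 "pretrained" weight matrix (values are arbitrary).
W0 = [[0.9, -0.2, 0.4, 0.1],
      [0.3, 1.1, -0.5, 0.7],
      [-0.6, 0.2, 0.8, 0.0],
      [0.5, 0.4, 0.1, 1.2]]

# One-sided update (GSOFT-style): W = Q W0; only the four block parameters are trainable.
Q = gs_orthogonal([0.3, -0.5], [0.1, 0.8])
W_one_sided = matmul(Q, W0)

# Two-sided update (Double-GSOFT-style, assumed form): W = Q_U W0 Q_V.
Q_U = Q
Q_V = gs_orthogonal([-0.2, 0.4], [0.6, 0.05])
W_two_sided = matmul(matmul(Q_U, W0), Q_V)

print("||W0||         =", round(frobenius(W0), 10))
print("||Q W0||       =", round(frobenius(W_one_sided), 10))
print("||Q_U W0 Q_V|| =", round(frobenius(W_two_sided), 10))
```

Because Q, Q_U, and Q_V are orthogonal, both updates leave the Frobenius norm of the weights unchanged. The one-sided form additionally keeps the angles between the columns of W0, while the two-sided form preserves the singular values of W0 and gives the optimizer more freedom to reshape the layer.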

'We discovered how to construct orthogonal matrices using only two special types of matrices, instead of five or six as required by previous methods. This saves computational resources and training time,' explains Nikolay Yudin, Research Assistant at the HSE Laboratory for Matrix and Tensor Methods in Machine Learning.

The researchers tested the approach on three types of tasks. When fine-tuning the RoBERTa language model, the method outperformed competing methods that use a comparable number of trainable parameters. In subject-driven image generation, where the model must preserve the features of a given subject while adapting to the user's request, GSOFT and Double GSOFT outperformed popular methods such as LoRA and BOFT while using less memory and training time.

Subject-driven generation: visual results after 3,000 training iterations
© Gorbunov, M., Yudin, N., Soboleva, V., Alanov, A., Naumov, A., Rakhuba, M. (2024). Group and shuffle: Efficient structured orthogonal parametrization. arXiv preprint arXiv:2406.10019.

The authors also tested their approach on convolutional neural networks, which are commonly used for image and video analysis tasks such as face recognition. The team even adapted GS matrices to cases where the model must be highly robust to input perturbations and distortions.

'We tested the method across various scenarios—from language and generative models to robust convolutional networks. In every case, it performed reliably while using fewer resources. This confirms that the method can be applied effectively to a variety of purposes,' comments Aibek Alanov, Senior Research Fellow at the Centre of Deep Learning and Bayesian Methods, AI and Digital Science Institute, HSE FCS, and leader of the Controllable Generative AI team at FusionBrain, AIRI.
