Neuroprosthesis uses brain activity to decode speech
Summary: A newly developed machine learning model can predict the words a person is about to speak based on neural activity recorded by a minimally invasive neuroprosthetic device.
Researchers from HSE University and Moscow State University of Medicine and Dentistry have developed a machine learning model that can predict which word a subject is about to speak based on their neural activity, recorded with a small array of minimally invasive electrodes.
The paper ‘Decoding speech from a small array of spatially isolated minimally invasive EEG electrodes with a compact and interpretable neural network’ is published in the Journal of Neural Engineering. The research was funded by a grant from the Russian government as part of the national project “Science and Universities”.
Millions of people around the world are affected by speech disorders that limit their ability to communicate. The causes of speech loss can vary and include stroke and some congenital conditions.
Technology is available today to restore communication function in these patients, including “silent speech” interfaces that recognize speech by tracking the movement of articulatory muscles as a person pronounces words without making a sound. However, such devices do not help every patient: people with facial muscle paralysis, for example, cannot use them.
Speech neuroprostheses — brain-computer interfaces capable of decoding speech based on brain activity — could provide an accessible and reliable solution for restoring communication for such patients.
Unlike personal computers, devices with a brain-computer interface (BCI) are controlled directly by the brain without the need for a keyboard or microphone.
A major barrier to the wider use of BCIs in speech prostheses is that this technique requires highly invasive surgery to implant electrodes into brain tissue.
More accurate speech recognition is achieved by means of prostheses with electrodes that cover a large area of the cortical surface. However, these solutions for reading brain activity are not intended for long-term use and pose significant risks to patients.
Researchers at the HSE Center for Bioelectrical Interfaces and Moscow State University of Medicine and Dentistry have studied the possibility of creating a functional neuroprosthesis capable of decoding speech with acceptable accuracy by reading brain activity from a small array of electrodes implanted in a limited cortical area.
The authors suggest that, in the future, this minimally invasive procedure could be performed under local anesthesia. In this study, the researchers collected data from two patients with epilepsy who had already been implanted with intracranial electrodes for the purpose of preoperative mapping to identify areas of seizure initiation.
The first patient was implanted bilaterally with a total of five sEEG shafts with six contacts in each, and the second patient was implanted with nine electrocorticographic (ECoG) strips with eight contacts in each.
Unlike ECoG electrodes, sEEG electrodes can be implanted without a full craniotomy, through a hole drilled in the skull. In this study, only the six contacts of a single sEEG shaft in one patient and the eight contacts of a single ECoG strip in the other were used to decode neural activity.
The subjects were asked to read six sentences aloud, each of which was presented 30 to 60 times in random order. The sentences differed in structure, and most of the words within a given sentence began with the same letter. The sentences contained a total of 26 different words. As the subjects read, the electrodes recorded their brain activity.
This data was then aligned with the audio signals to form 27 categories, including 26 words and one silence category. The resulting training dataset (containing signals recorded in the first 40 min of the experiment) was fed into a machine learning model with a neural network-based architecture.
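The labelling and split described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the placeholder vocabulary, the `label_map` and `chronological_split` names, and the `(time, label)` sample format are assumptions made for the example. Only the 26 + 1 category count and the 40-minute training cutoff come from the article.

```python
# Hypothetical sketch: 26 word categories plus one silence category,
# and a chronological split that trains on the first 40 minutes.

TRAIN_CUTOFF_S = 40 * 60  # first 40 minutes of the recording, in seconds


def label_map(words):
    """Map each distinct word to an integer; index 0 is reserved for silence."""
    mapping = {"<silence>": 0}
    for word in sorted(set(words)):
        mapping[word] = len(mapping)
    return mapping


def chronological_split(samples, cutoff_s=TRAIN_CUTOFF_S):
    """Split (onset_time_s, label) pairs into train/test by recording time."""
    train = [s for s in samples if s[0] < cutoff_s]
    test = [s for s in samples if s[0] >= cutoff_s]
    return train, test


# Placeholder vocabulary: the real 26 words are not listed in the article.
vocabulary = [f"word{i:02d}" for i in range(26)]
labels = label_map(vocabulary)

# Placeholder samples: one labelled event every 30 seconds for one hour.
samples = [(t, (t // 30) % len(labels)) for t in range(0, 3600, 30)]
train, test = chronological_split(samples)
```

Splitting by time rather than at random keeps every test sample later than every training sample, which matches the article's description of training on the first part of the session.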
The learning task of the neural network was to predict the next spoken word (category) based on the neural activity data that preceded its utterance.
In designing the neural network architecture, the researchers wanted to make it simple, compact, and easy to interpret. They came up with a two-stage architecture that first extracted internal representations of speech from recorded brain activity data, producing logarithmic spectral coefficients, and then predicted a specific category, i.e. word or silence.
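To make the two-stage idea concrete, here is a deliberately simplified sketch in plain Python. It is not the authors' network: the filter weights are hand-picked rather than learned, the log-power feature only stands in for the logarithmic spectral coefficients mentioned above, and every function name is hypothetical. It shows only the data flow, from multichannel signal to a small feature vector to one of 27 categories.

```python
import math


def spatial_filter(channels, weights):
    """Collapse several channels into one signal as a weighted sum."""
    n_samples = len(channels[0])
    return [
        sum(w * ch[t] for w, ch in zip(weights, channels))
        for t in range(n_samples)
    ]


def temporal_filter(signal, kernel):
    """Apply a causal FIR filter (valid part only, no padding)."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[t - j] for j in range(k))
        for t in range(k - 1, len(signal))
    ]


def log_power(signal, eps=1e-12):
    """Log of the mean squared amplitude of a filtered signal."""
    power = sum(x * x for x in signal) / len(signal)
    return math.log(power + eps)


def extract_features(channels, branches):
    """Stage 1: one log-power feature per (spatial, temporal) filter pair."""
    return [
        log_power(temporal_filter(spatial_filter(channels, w), k))
        for w, k in branches
    ]


def classify(features, class_weights, class_biases):
    """Stage 2: linear score per category; return the best-scoring index."""
    scores = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(class_weights, class_biases)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy input: 6 channels (as on one sEEG shaft), 200 samples each.
channels = [
    [math.sin(0.1 * (c + 1) * t) for t in range(200)] for c in range(6)
]
# Two hypothetical branches; in a trained network these would be learned.
branches = [
    ([1, 0, 0, 0, 0, 0], [0.5, 0.5]),   # channel 0, smoothing kernel
    ([0, 0, 0, 0, 0, 1], [1.0, -1.0]),  # channel 5, difference kernel
]
features = extract_features(channels, branches)

# 27 categories (26 words + silence) with placeholder weights.
class_weights = [[0.01 * c, -0.01 * c] for c in range(27)]
class_biases = [0.0] * 27
predicted = classify(features, class_weights, class_biases)
```

Keeping the spatial and temporal filters as separate, explicit weight vectors is what makes this kind of design inspectable: each spatial weight can be read as a contribution from one electrode contact, and each temporal kernel as a frequency preference.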
Once trained, the neural network achieved an accuracy of 55% using only six channels of data recorded by a single sEEG electrode in the first patient, and 70% using only eight channels of data recorded by a single ECoG strip in the second patient. This accuracy is comparable to that shown in other studies using devices that require implantation of electrodes over the entire cortical surface.
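For a sense of scale, the reported accuracies can be set against a uniform random guess over 27 categories. The sketch below assumes all categories are equally frequent, which the article does not state (silence, for instance, may well be more common), so the chance figure is only a rough reference point. The function names are hypothetical.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty list")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def uniform_chance(n_classes):
    """Expected accuracy of guessing uniformly at random among n classes."""
    return 1.0 / n_classes


chance = uniform_chance(27)  # 26 words + silence, roughly 0.037
toy_accuracy = accuracy([1, 2, 3, 4], [1, 2, 0, 4])  # 3 of 4 correct
```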
Because the model is interpretable, it is possible to explain in neurophysiological terms which neural information contributes most to predicting a word that is about to be spoken.
The researchers examined the signals coming from different neuronal populations to determine which ones were pivotal to the downstream task.
Their findings were consistent with speech mapping results, suggesting that the model relies on neural signals that are pivotal to speech and could therefore be used to decode imagined speech.
Another advantage of this solution is that it does not require manual feature engineering. The model learned to extract speech representations directly from the brain activity data.
The interpretability of the results also indicates that the network is decoding signals from the brain rather than any accompanying activity, such as electrical signals from the articulatory muscles or artifacts arising from the microphone effect.
The researchers stress that the prediction was always based on pre-speech neural activity data. They argue that this ensures that the decision rule does not use the auditory cortex’s response to utterances that have already been spoken.
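The pre-speech constraint can be illustrated with a small windowing helper. This is a sketch under stated assumptions, not the study's code: `pre_speech_window`, the sample-index representation of word onsets, and the window length are all hypothetical. The point is only that the slice ends before the onset, so no sample recorded during or after the utterance reaches the decoder.

```python
def pre_speech_window(recording, onset_idx, window_len):
    """Return the samples that end just before the word onset.

    recording: list of per-channel sample lists.
    onset_idx: sample index at which the word starts in the audio.
    Returns None when there is not enough signal before the onset.
    """
    start = onset_idx - window_len
    if start < 0:
        return None
    # Slice end is exclusive, so the onset sample itself is never included.
    return [channel[start:onset_idx] for channel in recording]


# Toy recording: 2 channels, 10 samples each.
recording = [list(range(10)), list(range(100, 110))]
window = pre_speech_window(recording, onset_idx=6, window_len=4)
too_early = pre_speech_window(recording, onset_idx=2, window_len=4)
```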
“Using such interfaces carries little risk for the patient. If everything works out, it may be possible to decode imagined speech from neural activity recorded by a small number of minimally invasive electrodes implanted on an outpatient basis using local anesthesia,” says Alexey Ossadtchi, the study’s lead author and director of the Center for Bioelectrical Interfaces at the HSE Institute for Cognitive Neuroscience.
About this neurotech research news
Author: Ksenia Brigadze
Contact: Ksenia Brigadze – HSE
Image: The image is in the public domain
Original Research: Closed access.
“Decoding speech from a small array of spatially isolated minimally invasive EEG electrodes with a compact and interpretable neural network” by Alexey Ossadtchi et al. Journal of Neural Engineering
Decoding speech from a small array of spatially isolated minimally invasive EEG electrodes with a compact and interpretable neural network
Objective. Speech decoding, one of the most intriguing brain-computer interface applications, opens up abundant opportunities, from rehabilitation of patients to direct and seamless communication between humans. Typical solutions rely on invasive recordings with a large number of distributed electrodes implanted through a craniotomy. Here we explored the possibility of creating a speech prosthesis in a minimally invasive setting with a small number of spatially segregated intracranial electrodes.
Approach. We collected 1 hour of data (from two sessions) in two patients who had been implanted with invasive electrodes. We then used only the contacts of a single sEEG shaft or a single ECoG strip to decode neural activity into 26 words and 1 silence category. We used a compact convolutional network-based architecture whose spatial and temporal filter weights allow for a physiologically plausible interpretation.
Main results. We achieved an average accuracy of 55% using only six channels of data recorded with a single minimally invasive sEEG electrode in the first patient, and 70% accuracy using only eight channels of data recorded from a single ECoG strip in the second patient, in classifying 26+1 overtly pronounced words. Our compact architecture did not require pre-engineered features, learned quickly, and yielded a stable, interpretable and physiologically meaningful decision rule that operated successfully on a contiguous dataset collected during a different time interval than the one used for training. The spatial characteristics of the pivotal neuronal populations corroborate the results of active and passive speech mapping and show the inverse space-frequency relationship characteristic of neural activity. Compared to other architectures, our compact solution performed on par with or better than those recently featured in the neural speech decoding literature.
Significance. We show that it is possible to build a speech prosthesis with a small number of electrodes, based on a compact decoder that requires no feature engineering and is derived from a small amount of training data.