Diagnosing psychosis with word analysis

Editor’s note: This article is by Dr. Guillermo Cecchi of IBM Research’s Computational Biology Group. 

Analyzing the spoken words of people with mental health disorders could significantly improve the accuracy of diagnosing mania and schizophrenia. In a PLoS ONE paper, my Computational Biology team, collaborating with researchers and clinicians in Brazil, showed that quantifying and graphing speech alone was 93 percent accurate in identifying these cases of psychosis.

This collaboration with professionals across medical, neuroscience, and technical departments at Brazil’s Federal University and Universidade de Sao Paulo was the first time that psychiatric differential diagnosis was carried out directly from speech analysis. In other words, our study, Speech Graphs Provide a Quantitative Measure of Thought Disorder in Psychosis, was the first to relate thought disorder to mathematical structures: graphs.

Word graph: We transcribe the speech to text and create graphs in which nodes denote words, and edges between them indicate the temporal succession of the words.

Diseases such as cancer have clear genomic and proteomic signatures, while psychiatric conditions are more elusive, and may be mostly determined by functional disruptions (problems with our human “software” versus our “hardware”). We set out to show how psychiatry can benefit from computational insights.

So what did we do, and what did we find?

Psychologists at Federal University interviewed hospital patients using standard diagnostic methods that follow the requirements of the Diagnostic and Statistical Manual of Mental Disorders. The IBM team worked with the text. After the interviews were manually translated into English, we confirmed analytically, through graphs, the qualitative features of mania and schizophrenia.

Manic graphs are more verbose and contain more loops (when the patient’s train of thought continually returns to the same concept) than a normal graph. Schizophrenic graphs are less verbose, but more tangential (when a patient’s focus on one concept consistently changes to many other concepts) than normal.
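
To make the idea of a speech graph concrete, here is a minimal Python sketch, not the code used in the study. It assumes the transcript is already plain text that can be split on whitespace (the study parsed reports into grammatical elements, which this sketch does not attempt). The function names, the example sentence, and the particular measures reported (word count, nodes, edges, repeated edges, one-node and two-node loops) are illustrative choices rather than the study's exact feature set.

```python
from collections import Counter


def build_speech_graph(transcript):
    """Return (words, edge_counts) for a transcript.

    Nodes are the distinct lowercased words. A directed edge (a, b)
    means word b was spoken immediately after word a; edge_counts
    records how many times each transition occurred.
    """
    words = transcript.lower().split()
    edge_counts = Counter(zip(words, words[1:]))
    return words, edge_counts


def graph_measures(transcript):
    """Compute a few simple, illustrative measures of a speech graph."""
    words, edge_counts = build_speech_graph(transcript)
    nodes = set(words)
    edges = set(edge_counts)
    return {
        # Verbosity: total words spoken and distinct words used.
        "word_count": len(words),
        "nodes": len(nodes),
        # Distinct word-to-word transitions.
        "edges": len(edges),
        # Transitions that occurred more than once (extra occurrences).
        "repeated_edges": sum(c - 1 for c in edge_counts.values()),
        # A word immediately followed by itself.
        "self_loops": sum(1 for a, b in edges if a == b),
        # Pairs of distinct words linked in both directions (a -> b -> a).
        "two_node_loops": sum(
            1 for a, b in edges if a < b and (b, a) in edges
        ),
    }


# Made-up example sentence, for illustration only.
report = "i walked and i found her and i hugged her"
measures = graph_measures(report)
for name, value in measures.items():
    print(f"{name}: {value}")
```

In this toy report the word "i" is a node that the speaker returns to three times, which is the kind of recurrence that the loop measures are meant to capture. A real analysis would need proper tokenization and a richer set of graph attributes.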

Traditional interviews consider a handful of scales that quantify the severity of symptoms, with final diagnosis resting with the judgment of the psychiatrist. This method is about 62 percent accurate. Taking only patterns of words – how many words were spoken; how quickly they were spoken; how topical they were – our study’s diagnosis was 93 percent accurate. The difference is purely due to psychiatrists’ use of other factors to make a diagnosis.

Graphs between schizophrenic, normal and manic
Speech graph analysis in schizophrenia, mania and control reports. A) Subjects were asked to report a recent dream. Each report was transcribed and parsed into canonical grammatical elements (words translated from Portuguese, elements separated by slashes). Parts related to dreaming (blue) were sorted from parts related to waking (red), which were considered deviations from the anchor topic. B) Speech graph from the example shown in A), with edges sequentially numbered. The node “I” appears 3 times in the dream sub-graph (“I walked”, “I found”, “I hugged”), and then once in the waking sub-graph (“I woke up”). C) Speech graph examples representative of the schizophrenics (subject MG), manics (subject AB) and controls (subject OR). Graphs plotted using global energy minimum (GEM). The complete database is available as Supporting Information. doi:10.1371/journal.pone.0034928.g001

Psychosis is part of the spectrum of thought disorders, and the most conspicuous symptoms are expressed in language. Today, the main tool for diagnosis is the personal interview, and a doctor’s assessment of abnormal thought processes reflected in speech.

Words are the most prominent variables when talking about manic and schizophrenic conditions. We want to establish variables and boundaries, such as the number of words that indicates a condition, that could be built into a technology giving clinicians and researchers a more quantitative look at their data, so that their diagnosis and treatment decisions, which ultimately rest with them, can be better informed.

We are also engaged in extending these initial results to larger cohorts, as well as other modalities of thought and emotional alterations, such as autism and Asperger’s. Preliminary indications show that semantic measures of similarity between words (as opposed to the speech structure revealed by graphs) can be used to help diagnose these other psychiatric conditions that affect emotional processing.
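
As a rough illustration of what a semantic measure of similarity between words can look like, the sketch below compares words by the contexts they appear in. This is an assumption made for illustration: the article does not say which similarity measure is being explored, and the co-occurrence window, the function names, and the example sentence are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(words, window=2):
    """Map each word to a Counter of the words seen within `window`
    positions of it (a simple distributional representation)."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(words):
        start = max(0, i - window)
        end = min(len(words), i + window + 1)
        for j in range(start, end):
            if j != i:
                vectors[word][words[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters).
    Returns 0.0 if either vector is empty."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up text, for illustration only.
words = "the dog chased the cat and the cat chased the dog".split()
vectors = cooccurrence_vectors(words)
similarity = cosine(vectors["dog"], vectors["cat"])
print(f"similarity(dog, cat) = {similarity:.3f}")
```

Unlike the speech graph, which captures the order in which words follow one another, a measure of this kind ignores sequence and asks only whether two words tend to keep the same company. Practical systems would typically estimate the vectors from a large corpus rather than from a single report.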

Read the complete report here: Speech Graphs Provide a Quantitative Measure of Thought Disorder in Psychosis.


  1. Wow - Fascinating! Very cool research - would love to see this in a Documentary!

  2. I would like to know the problems associated with the word 'schizophrenia', mainly the negative connotations that come with it (schizo). Can someone direct me to a useful website?

  3. Also the word 'patient' the verb. I would like to know the negative connotations that come with it. I am researching stuff on medical and social model languages.