Automatic stress and relaxation strength detection for up to 16,000 social web texts per second for English - other languages easily added.
TensiStrength estimates the strength of stress and relaxation expressed in short texts, even for informal language. It has human-level accuracy for short social web texts in English, except political texts. TensiStrength reports two stress/relaxation strengths:
-1 (no stress) to -5 (very high stress)
1 (no relaxation) to 5 (highly relaxed)
Why does it use two scores? Because stress and relaxation are to some extent opposites but are often discussed in parallel - e.g., "I am stressed and need to lie down".
TensiStrength can also report binary (positive/negative), trinary (positive/negative/neutral) and single scale (-4 to +4) results. TensiStrength was originally developed for English and optimised for general short social web texts, such as tweets, but can be configured for other languages and contexts by changing its input files - some variants are demonstrated below.
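As an illustration of how the dual scores relate to the simpler outputs, here is a minimal Python sketch. It is not TensiStrength's own code: the function name `summarise` is hypothetical, and it assumes that the single scale is the plain sum of the two scores (which matches the -4 to +4 range) and that the trinary result is the sign of that sum.

```python
def summarise(stress: int, relaxation: int) -> dict:
    """Collapse a (stress, relaxation) pair into simpler outputs.

    stress is expected in -5..-1 and relaxation in 1..5.
    Assumption: the single scale is the sum of the two scores and the
    trinary result is the sign of that sum. The real program may differ.
    """
    if not -5 <= stress <= -1:
        raise ValueError("stress must be between -5 and -1")
    if not 1 <= relaxation <= 5:
        raise ValueError("relaxation must be between 1 and 5")
    scale = stress + relaxation            # ranges from -4 to +4
    trinary = (scale > 0) - (scale < 0)    # 1, 0 or -1
    return {"scale": scale, "trinary": trinary}


# Hypothetical scores for a text with moderate stress and mild relaxation.
print(summarise(-3, 2))
```

A text with no stress and no relaxation, (-1, 1), comes out as 0 on both outputs under these assumptions.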
TensiStrength is free for academic research. Please contact the author Mike Thelwall for the Java version, whether for academic research or commercial use. It is provided without liability or guarantees for any uses. Downloading TensiStrength and/or the configuration files signifies acceptance of these conditions. To use TensiStrength for research only (free), please email from your academic email address.
The Java version of TensiStrength is similar to the Java version of the sister program SentiStrength - see the SentiStrength Java manual and the Mac users' starting instructions (which are probably also useful on Linux). Most of the information on the SentiStrength website also applies to TensiStrength. TensiStrength should be run with the command line option mood 0.
TensiStrength is described and evaluated in the following peer-reviewed academic article:
Thelwall, M. (2017). TensiStrength: Stress and relaxation magnitude detection for social media texts. Information Processing & Management, 53(1), 106–121. doi:10.1016/j.ipm.2016.06.009
Extra resources available via links in the appendix of the above article (for researchers):
- Coding manual. Extra instructions.
- Development corpus - human-coded tweets
- Evaluation corpus - human-coded tweets from 3 coders
The various files with TensiStrength contain information used in the algorithm and may be customised.
- The SentimentLookUpTable is a list of stress or relaxation terms, each with the word, then a tab, then an integer from 1 to 5 or -1 to -5. This can be edited and extended. Note that strengths of +1 and -1 have no effect on the program; some terms in the list have these values just to indicate that the words have been considered but not used. Each word can end with a wildcard *, but the wildcard can only go at the end.
- The EmoticonLookUpTable is as above but for a list of emoticons.
- EnglishWordList.txt is just a list of English words - it is used for the part of the algorithm that tries to correct words with non-standard spellings.
- NegatingWordList.txt lists words that reverse the polarity of subsequent words - e.g., not happy is negative.
- BoosterWordList.txt lists words that increase sentiment intensity - e.g., very happy is more positive than happy.
- SlangLookupTable.txt - replaces common slang with equivalent words or expressions
- IdiomLookupTable.txt - overrides the sentiment strength of the individual words in the phrase
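The lookup table format described above (word, tab, integer, optional wildcard at the end) can be read with a few lines of code. The Python sketch below is only an illustration of the file format, not the program's own parser; the function names and the sample entries are hypothetical, and preferring the longest wildcard prefix is an assumption.

```python
def parse_lookup(text: str) -> dict:
    """Parse 'word<TAB>strength' lines into a dictionary."""
    terms = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) < 2:
            continue                      # skip blank or malformed lines
        word = parts[0].strip().lower()
        try:
            strength = int(parts[1].strip())
        except ValueError:
            continue                      # skip lines without an integer
        if word:
            terms[word] = strength
    return terms


def lookup(word: str, terms: dict):
    """Return the strength for a word, honouring a trailing * wildcard."""
    word = word.lower()
    if word in terms:
        return terms[word]
    best = None
    for term, strength in terms.items():
        if term.endswith("*") and word.startswith(term[:-1]):
            # Assumption: the longest matching prefix wins.
            if best is None or len(term) > len(best[0]):
                best = (term, strength)
    return best[1] if best else None


# Hypothetical entries, not taken from the distributed file.
sample = "anxious\t-3\nrelax*\t3\nbusy\t-1\n"
terms = parse_lookup(sample)
print(lookup("relaxing", terms), lookup("busy", terms), lookup("table", terms))
```

Here "relaxing" matches the wildcard entry relax*, "busy" is found but carries a strength of -1 and so would have no effect, and "table" is not in the list.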
TensiStrength can be adjusted for other languages by translating the term list SentimentLookupTable.txt and adding any missing relaxation and stress terms. Note that the sentiment scores for terms should be in the range 2 to 5 (relaxation) or -2 to -5 (stress). A score of +1 or -1 means neutral and neutral terms are ignored. A training corpus in the new language is recommended to help adjust the term weight strengths.
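When translating the term list, it can help to check the scores before running the program. The following Python sketch sorts terms into those that will be used (2 to 5 or -2 to -5), those that are neutral (+1 or -1) and those outside the permitted range. The function name `check_terms` and the example entries are hypothetical.

```python
def check_terms(terms: dict) -> dict:
    """Group terms by whether their strength is usable.

    active  : 2..5 (relaxation) or -5..-2 (stress)
    neutral : +1 or -1, ignored by the program
    invalid : anything else, e.g. 0 or values beyond 5 / -5
    """
    report = {"active": [], "neutral": [], "invalid": []}
    for word, strength in sorted(terms.items()):
        if strength in (1, -1):
            report["neutral"].append(word)
        elif 2 <= abs(strength) <= 5:
            report["active"].append(word)
        else:
            report["invalid"].append(word)
    return report


# Hypothetical translated entries.
print(check_terms({"calm": 3, "busy": -1, "panic": -6}))
```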
The following files will also need to be translated or replaced with a local equivalent (see the extra instructions):
- EmoticonLookupTable.txt - check the strengths are appropriate and add any common new national variations
- SlangLookupTable.txt – replace with a list of common slang in the new language
- EnglishWordList.txt – replace with a word list of correct spellings in the new language (many such lists are on the web, but this step is optional)
- NegatingWordList.txt – translate/replace with a list of negating words in the new language
- IdiomLookupTable.txt – replace with a list of common idioms in the new language
- BoosterWordList.txt – translate/replace with a list of booster words in the new language – words that emphasise the strength of emotion in any subsequent words
- QuestionWords.txt – translate/replace with a list of words in the new language that reliably indicate that a question is being asked
You will also need to register a list of non-English common multiple letters (e.g., ii is common in some languages but not English). For the Java version please see the manual for this option. For the Windows version, please check the options menu for this customisation. Spell-checking can also be completely disabled in both versions, if needed.
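To show why the list of common multiple letters matters, here is a rough Python sketch of repeated-letter spelling correction against a word list. It is not the algorithm used by TensiStrength: the two-step strategy, the function name `correct_repeats` and the parameter `allowed_doubles` are all assumptions made for illustration.

```python
import re


def correct_repeats(word, dictionary, allowed_doubles=frozenset()):
    """Try to repair a word with extra repeated letters.

    dictionary      : set of correctly spelled words
    allowed_doubles : letter pairs (e.g. "ii") that are normal in the
                      language and should not be reduced to one letter
    Returns the corrected word, or the lower-cased input if no
    dictionary word is found.
    """
    w = word.lower()
    if w in dictionary:
        return w
    # Step 1: reduce runs of three or more identical characters to two.
    two = re.sub(r"(.)\1{2,}", r"\1\1", w)
    if two in dictionary:
        return two

    # Step 2: reduce remaining doubles to one, unless the pair is allowed.
    def squeeze(match):
        pair = match.group(0)
        return pair if pair in allowed_doubles else match.group(1)

    one = re.sub(r"(.)\1", squeeze, two)
    return one if one in dictionary else w


print(correct_repeats("stresssed", {"stressed"}))
print(correct_repeats("sooooo", {"so"}))
```

With "ii" registered in `allowed_doubles`, a word such as "niiice" would no longer be reduced to "nice", which is the kind of behaviour a language with common double letters would need.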
Negating words occurring after sentiment words (e.g., "I am happy not" is OK in German but not English) can be customised in TensiStrength. TensiStrength may need the utf8 option to read the input files if they are in UTF8 rather than ASCII format (note that utf8 does not always work on ANSI text files, so it should not be used as the default).
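The difference between negators that precede and negators that follow a sentiment word can be sketched as below. This Python example is hypothetical throughout: the function name, the `negators_follow` switch and the restriction to directly adjacent words are assumptions for illustration, not a description of how TensiStrength implements the option.

```python
def negated_positions(tokens, negators, sentiment_terms, negators_follow=False):
    """Return the indices of sentiment terms that have an adjacent negator.

    By default only a negator directly before the term counts (English
    word order). With negators_follow=True a negator directly after the
    term also counts.
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok not in sentiment_terms:
            continue
        before = i > 0 and tokens[i - 1] in negators
        after = (negators_follow
                 and i + 1 < len(tokens)
                 and tokens[i + 1] in negators)
        if before or after:
            hits.append(i)
    return hits


negators = {"not"}
sentiment = {"happy"}
print(negated_positions(["i", "am", "not", "happy"], negators, sentiment))
print(negated_positions(["i", "am", "happy", "not"], negators, sentiment,
                        negators_follow=True))
```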
Would you like to help? If you are a linguist or non-English native speaker then you might be able to help us to make a version for your language.
Please email m dot thelwall at wlv.ac.uk if you would like to help. This makes a good student project.
TensiStrength can be adjusted for other domains (e.g., Twitter, product reviews) by adding new relevant words and sentiment strengths to the term list SentimentLookupTable.txt and adjusting any relevant existing term strengths.