
SentiStrength

SentiStrength estimates the strength of positive and negative sentiment in short texts, even for informal language. It has human-level accuracy for short social web texts in English, except political texts. SentiStrength reports two sentiment strengths:

-1 (not negative) to -5 (extremely negative)

1 (not positive) to 5 (extremely positive)

It can also report binary (positive/negative), trinary (positive/negative/neutral) and single scale (-4 to +4) results. SentiStrength was originally developed for English and optimised for general short social web texts but can be configured for other languages and contexts by changing its input files - some variants are demonstrated below.
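The relationship between the two strengths and the alternative result types can be sketched as below. This is only an illustration: it assumes that the single scale is the sum of the two strengths (which matches the -4 to +4 range) and that the trinary result is the sign of that sum. SentiStrength's own rules may differ, and the function names are hypothetical.

```python
def to_single_scale(positive: int, negative: int) -> int:
    """Combine the two strengths into one score from -4 to +4.

    Assumption: the single scale is simply positive + negative.
    """
    if not 1 <= positive <= 5 or not -5 <= negative <= -1:
        raise ValueError("expected positive in 1..5 and negative in -5..-1")
    return positive + negative


def to_trinary(positive: int, negative: int) -> int:
    """Return 1 (positive), -1 (negative) or 0 (neutral).

    Assumption: the stronger of the two strengths decides, and a tie is neutral.
    """
    total = to_single_scale(positive, negative)
    return (total > 0) - (total < 0)
```

For example, a text scored 3 for positive and -1 for negative strength would get a single-scale score of 2 and a trinary result of 1 under these assumptions.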


Other languages: Finnish, German, Dutch, Spanish, Russian, Portuguese, French, Arabic, Polish, Persian, Swedish, Greek, Welsh, Italian.

Download SentiStrength

SentiStrength is free for academic research. Please contact the author for the commercial Java version or a commercial license for the online version. The free version runs under Windows only and is provided without liability or guarantees for any uses. Downloading SentiStrength and/or the configuration files signifies acceptance of these conditions. This version does not contain the keyword or domain classification facilities.

Buy SentiStrength

A commercial licence for SentiStrength is available for £1000 - please contact m.thelwall -at- wlv.ac.uk. The Java version of SentiStrength is normally used commercially.

SentiStrength is used by computing, language technology and market research companies in the US, Europe and Australia. Some use the default English version and others have translated it into different languages or adapted it to integrate with their existing language technology systems. Commercial users range from small start-ups to one of the world's 10 largest corporations.

Java Version

The Java version of SentiStrength is similar to the Windows version in its core functions but has additional capabilities - see the SentiStrength Java manual and the Mac users' starting instructions (which probably also help on Linux). In addition to the standard two-strength output, it can conduct binary (positive/negative), trinary (positive/neutral/negative) and single-scale (-4 very negative to +4 very positive) classifications, as well as keyword-oriented and domain-oriented classifications. It also has a special mode for binary and trinary classification of longer texts, and it allows wildcards in the idiom list file. To use the Java version for research only (free), email the author from your academic institution's email address.

For RJB users, here is some sample RJB code from Adam Pantanowitz.

About SentiStrength

SentiStrength is a sentiment analysis (opinion mining) program. It is described and evaluated in the following peer-reviewed academic articles:

Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., & Kappas, A. (2010). Sentiment strength detection in short informal text. Journal of the American Society for Information Science and Technology, 61(12), 2544-2558.

Thelwall, M., Buckley, K., & Paltoglou, G. (2012). Sentiment strength detection for the social Web. Journal of the American Society for Information Science and Technology, 63(1), 163-173.

It has been applied in the following research projects, amongst others.

Thelwall, M., Buckley, K., & Paltoglou, G. (2011). Sentiment in Twitter events. Journal of the American Society for Information Science and Technology, 62(2), 406-418.

Kucuktunc, O., Cambazoglu, B.B., Weber, I., & Ferhatosmanoglu, H. (2012). A large-scale sentiment analysis for Yahoo! Answers. Proceedings of the 5th ACM International Conference on Web Search and Data Mining. [Use in Yahoo!]

Weber, I., Ukkonen, A., & Gionis, A. (2012). Answers, not links: extracting tips from yahoo! answers to address how-to web queries. Proceedings of the fifth ACM international conference on Web search and data mining (WSDM '12). [Use in Yahoo!]

Pfitzner, René, Garas, Antonios, Schweitzer, Frank (2012, to appear). Emotional divergence influences information spreading in Twitter, ICWSM-12.

Classifying texts with SentiStrength

To get SentiStrength to classify one or more texts, put the texts into a plain text file with one text per line. Select Analyse All Texts in File from the Sentiment Strength Analysis menu and select the text file. The output will be a copy of the file with positive and negative classifications added at the end of each line, preceded by tabs. Individual texts can also be classified by selecting Analyse One Text from the Sentiment Strength Analysis menu.
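As a rough sketch, an input file in this format could be prepared, and the output file read back, with a short script such as the one below. The helper names are hypothetical and not part of SentiStrength. The script assumes UTF-8 is an acceptable encoding and that the last two tab-separated fields of each output line are the positive and then the negative score, so please check this against your own output file.

```python
from pathlib import Path


def write_input_file(texts, path):
    """Write one text per line, flattening embedded line breaks and tabs.

    Runs of whitespace inside a text are collapsed to single spaces so
    that each text stays on one line and contains no tab characters.
    """
    lines = [" ".join(str(text).split()) for text in texts]
    # Assumption: UTF-8 is suitable for the SentiStrength version in use.
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def parse_output_line(line):
    """Split one output line into (text, positive, negative).

    Assumption: the scores are the last two tab-separated fields,
    positive first and negative second.
    """
    text, positive, negative = line.rstrip("\r\n").rsplit("\t", 2)
    return text, int(positive), int(negative)
```

Splitting from the right means that any remaining tabs inside the text itself do not disturb the two score fields.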

Optimising SentiStrength term weights

The positive and negative term weights are stored in the EmotionLookupTable.txt file in the SentStrength_Data folder. They can be adjusted manually by editing the file, or fine-tuned automatically from a classified text collection. To fine-tune the EmotionLookupTable.txt values, first create a collection of at least 500 texts that have been classified by humans for positive (1-5) and negative (1-5) sentiment strength. Put these into a plain text file in which each line has the format: text, tab, negative strength, tab, positive strength. Select Optimise the emotion dictionary weights from the Sentiment Strength Analysis menu and SentiStrength will create a new term strength list that is optimised for the sentiment in the new texts. To use the new strengths, save a copy of the original strength list and then replace it with the new list.
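Before running the optimisation it may help to check the classified file. The sketch below is a hypothetical helper, not part of SentiStrength: it parses lines in the text, tab, negative, tab, positive format and checks the 1-5 ranges and the 500-text minimum. Accepting a negative strength written with a minus sign (e.g., -3) is an added convenience and an assumption.

```python
def parse_training_line(line):
    """Parse 'text<TAB>negative<TAB>positive' into (text, negative, positive)."""
    text, negative, positive = line.rstrip("\r\n").rsplit("\t", 2)
    # Assumption: a negative strength written as -3 is treated as 3.
    negative, positive = abs(int(negative)), int(positive)
    if not (1 <= negative <= 5 and 1 <= positive <= 5):
        raise ValueError(f"strengths must be in the range 1-5: {line!r}")
    return text, negative, positive


def check_training_set(lines, minimum=500):
    """Return the parsed rows, or raise ValueError if the set is unusable."""
    rows = [parse_training_line(line) for line in lines if line.strip()]
    if len(rows) < minimum:
        raise ValueError(f"only {len(rows)} texts; at least {minimum} are needed")
    return rows
```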

Assessing the accuracy of SentiStrength

To assess the accuracy of SentiStrength on a set of texts, a sample must first be classified and formatted as above. The human classifications can then be compared with the SentiStrength classifications on the same sample.
Alternatively, if only one data set is available and it must be used both to optimise the word strength list and for validation, then the 10-fold cross-validation procedure can be used. This uses 90% of the data to train the term weights and the remaining 10% to assess the accuracy of the adjusted weights. This is repeated 10 times with a different 10% left out each time and the total results are reported. To run a 10-fold cross-validation, create the classified text file as above and select Run a 10-fold cross-validation to assess the above algorithm from the Sentiment Strength Analysis menu.
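SentiStrength runs this procedure itself from the menu, but the general idea can be sketched as follows. This is a generic illustration rather than SentiStrength's own code: train_fn is a hypothetical stand-in for whatever trains the term weights, and exact-match accuracy is assumed as the measure.

```python
import random


def ten_fold_splits(n_items, folds=10, seed=0):
    """Yield (train_indices, test_indices); each item is held out exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    for k in range(folds):
        test = indices[k::folds]  # every folds-th item, starting at position k
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


def cross_validate(items, labels, train_fn, folds=10, seed=0):
    """Return the pooled proportion of exact matches over all folds.

    train_fn(train_items, train_labels) must return a predict(item) callable.
    """
    correct = 0
    for train, test in ten_fold_splits(len(items), folds, seed):
        predict = train_fn([items[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(items[i]) == labels[i] for i in test)
    return correct / len(items)
```

Pooling the correct counts over all ten folds corresponds to reporting the total results, rather than averaging ten separate accuracy figures.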

Support files

The various files with SentiStrength contain information used in the algorithm and may be customised.

Language customisation

SentiStrength can be adjusted for other languages by translating the term list EmotionLookupTable.txt and adding any other sentiment-bearing words that have been omitted. A training corpus in the new language is recommended to help adjust the term weight strengths (see Optimising SentiStrength term weights). You will need the Java version or Windows version 2.2 of SentiStrength to cope with accented characters or characters not found in English, as well as some additional linguistic features.

The following files will also need to be translated or replaced with a local equivalent (see the extra instructions):

You will also need to register a list of non-English common multiple letters (e.g., ii is common in some languages but not English). For the Java version please see the manual for this option. For the Windows version, please check the options menu for this customisation. Spell-checking can also be completely disabled in both versions, if needed.

Negating words occurring after sentiment words (e.g., "I am happy not" is OK in German but not English) can be customised in the Java version of SentiStrength but not the Windows version, sorry.

SentiStrength versions for other languages

Would you like to help? If you are a linguist with knowledge of any of these languages then you could help by:

Please email m dot thelwall at wlv.ac.uk if you would like to help. This makes a good student project.

Classifiers with some testing

Created by Tuomo Kakkonen, School of Computing, University of Eastern Finland - please contact Tuomo for sentiment analysis in Finnish.
Created by Hannes Pirker, Interaction Technologies Group at the Austrian Research Institute for Artificial Intelligence (OFAI). The German version can be obtained via this download form

Completely untested classifiers [just for fun: linguists please contact us if you would like to help improve them - this makes a good student project!]

Domain customisation

SentiStrength can be adjusted for other domains (e.g., Twitter, product reviews) by adding new relevant words and sentiment strengths to the term list EmotionLookupTable.txt and adjusting any relevant existing term strengths. The other files can also be adjusted, as for language customisation. For example, the file EmotionLookupTableGeneral.txt in the download zipfile contains a slightly adjusted set of term weights to cope with more impersonal communication than MySpace. In this alternative file, the word "love" has a higher strength because it is less likely to be used in formulaic message endings, such as "love from" or "love u" or "love x".
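Such adjustments can be made by hand in a text editor, but for many terms a small script may be easier. The sketch below is hypothetical and assumes that each line of the term list has the form term, tab, strength; please check this against your copy of EmotionLookupTable.txt before relying on it. The strengths in the usage example are illustrative only.

```python
def apply_domain_overrides(lines, overrides):
    """Return new term-list lines with strengths replaced or added.

    Assumption: each line is 'term<TAB>strength', optionally followed by
    further tab-separated fields, which are kept unchanged.
    """
    remaining = dict(overrides)
    result = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        term = fields[0]
        if term in remaining:
            # Replace (or insert) the strength field for an existing term.
            fields[1:2] = [str(remaining.pop(term))]
        result.append("\t".join(fields))
    # Terms not already in the list are appended at the end.
    result.extend(f"{term}\t{strength}" for term, strength in remaining.items())
    return result


# Illustrative values only, not taken from the real term list.
adjusted = apply_domain_overrides(["love\t3", "hate\t-4"], {"love": 4, "refund": -2})
```

As with the optimised weights, keep a copy of the original file before replacing it with an adjusted list.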

Data mining

The data mining menu and ARFF menu items are neither part of the main SentiStrength functionality nor documented. Please ignore them unless they make sense to you.

Other

For further issues, please see the Frequently Asked Questions.