SentiStrength

SentiStrength is a sentiment analysis (opinion mining) program designed to measure the strength of positive and negative sentiment in short texts, including texts written in informal language. Given a set of short texts, it allocates each one a negative sentiment strength of 1 to 5 and a positive sentiment strength of 1 to 5. SentiStrength is configured to analyse English and is optimised for MySpace comments, but it can be modified for other languages and contexts by changing its configuration files.

*New version online 1 March 2010* - this allows you to specify how many words the effect of a negating word carries across (as suggested by Matteo Borsacchi). For example, if you set it to 3 then "not going to be happy" would count "happy" as negative, but if you set it to 2 then it would count "happy" as positive.
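The negation setting can be pictured with a small Python sketch. This is not SentiStrength's own code: the NEGATORS set and the is_negated function are hypothetical, and the sketch assumes the setting counts the number of words allowed between the negating word and the sentiment word, which is the reading that matches the "not going to be happy" example.

```python
# Hypothetical negator list for illustration; SentiStrength reads its own
# lists from its configuration files.
NEGATORS = {"not", "never", "no"}


def is_negated(tokens, index, window):
    """Return True if a negating word precedes tokens[index] with at most
    `window` other words in between (assumed meaning of the setting)."""
    start = max(0, index - window - 1)
    return any(token in NEGATORS for token in tokens[start:index])


tokens = "not going to be happy".split()
happy = tokens.index("happy")
print(is_negated(tokens, happy, 3))  # True: "happy" would count as negative
print(is_negated(tokens, happy, 2))  # False: "happy" would count as positive
```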

Downloading and Starting SentiStrength

SentiStrength is provided free of charge for academic research. Please contact the author for commercial applications. It runs under Windows only and is provided without liability or guarantees for any uses. Downloading SentiStrength and/or the configuration files signifies acceptance of these conditions.

Classifying texts with SentiStrength

To get SentiStrength to classify one or more texts, put the texts into a plain text file with one text per line. Select Analyse All Texts in File from the Sentiment Strength Analysis menu and select the text file. The output will be a copy of the file with positive and negative classifications added at the end of each line, preceded by tabs. Individual texts can also be classified by selecting Analyse One Text from the Sentiment Strength Analysis menu.
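Because the input file must hold exactly one text per line, texts that contain line breaks or tabs need flattening before they are saved. The following Python sketch shows one way to prepare such a file. The function names are hypothetical, and the UTF-8 encoding is an assumption, since the encoding SentiStrength expects is not stated here.

```python
from pathlib import Path


def to_single_line(text):
    """Collapse newlines, tabs and repeated spaces so the text fits on one line."""
    return " ".join(text.split())


def write_input_file(texts, path, encoding="utf-8"):
    """Write one flattened text per line; the encoding is an assumption."""
    lines = [to_single_line(text) for text in texts]
    Path(path).write_text("\n".join(lines) + "\n", encoding=encoding)
    return len(lines)


if __name__ == "__main__":
    comments = ["I love\nthis song!", "meh...\tnot great"]
    write_input_file(comments, "texts.txt")
```

Removing tabs from the texts also keeps them from being confused with the tab-separated classifications that SentiStrength appends to each line.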

Optimising SentiStrength term weights

The positive and negative term weights can be found in the EmotionLookupTable.txt file in the SentStrength_Data folder. These can be manually adjusted by editing the file. Alternatively, they can be automatically fine-tuned with a collection of classified texts. To fine-tune the EmotionLookupTable.txt values used by SentiStrength, first create a collection of texts that have been classified by humans with positive (1-5) and negative (1-5) sentiment strengths. Put these into a plain text file in which each line has the format: text – tab – negative – tab – positive. The collection should contain at least 500 texts. Select Optimise the emotion dictionary weights from the Sentiment Strength Analysis menu and SentiStrength will create a new term strength list that is optimised for the sentiment in the new texts. To use the new strengths, save a copy of the original strength list and then replace it with the new list.
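The training file format described above (text, tab, negative, tab, positive) can be produced with a short Python sketch such as the one below. The function names are hypothetical. The range check and the 500-text check simply mirror the requirements stated above.

```python
MIN_TEXTS = 500  # minimum collection size recommended above


def format_training_line(text, negative, positive):
    """Build one line in the order: text, tab, negative, tab, positive."""
    for name, value in (("negative", negative), ("positive", positive)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} strength must be 1-5, got {value}")
    clean_text = " ".join(text.split())  # the text itself must not contain tabs
    return f"{clean_text}\t{negative}\t{positive}"


def build_training_lines(classified_texts):
    """classified_texts: iterable of (text, negative, positive) tuples."""
    lines = [format_training_line(*item) for item in classified_texts]
    if len(lines) < MIN_TEXTS:
        print(f"Warning: only {len(lines)} texts; at least {MIN_TEXTS} are recommended")
    return lines


if __name__ == "__main__":
    for line in build_training_lines([("I hate Mondays", 4, 1), ("so happy!!", 1, 4)]):
        print(line)
```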

Assessing the accuracy of SentiStrength

To assess the accuracy of SentiStrength on a set of texts, a sample must first be classified and formatted as above. The human classifications can then be compared with the SentiStrength classifications on the same sample.
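The comparison of human and SentiStrength classifications can be sketched in Python as follows. This is only an illustration: the agreement function and its tolerance parameter are hypothetical, and the choice of measures (exact matches, and matches within one point) is an assumption rather than the measure SentiStrength itself reports. The example scores are made up for demonstration.

```python
def agreement(human, predicted, tolerance=0):
    """Proportion of texts whose predicted strength is within `tolerance`
    points of the human strength (tolerance=0 means an exact match)."""
    if len(human) != len(predicted):
        raise ValueError("both lists must classify the same texts")
    if not human:
        raise ValueError("no classifications to compare")
    hits = sum(abs(h - p) <= tolerance for h, p in zip(human, predicted))
    return hits / len(human)


# Made-up positive strengths for four texts, for illustration only.
human_positive = [1, 3, 5, 2]
program_positive = [1, 2, 5, 4]
print(agreement(human_positive, program_positive))               # 0.5
print(agreement(human_positive, program_positive, tolerance=1))  # 0.75
```

The same function can be applied separately to the negative strengths.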
Alternatively, if only one data set is available, so that the same set must be used both to optimise the term strength list and for validation, then the 10-fold cross-validation procedure can be used. This uses 90% of the data to train the term weights and the remaining 10% to assess the accuracy of the adjusted weights. This is repeated 10 times, each time leaving out a different 10%, and the total results are reported. To run a 10-fold cross-validation, create the classified text file as above and select Run a 10-fold cross-validation to assess the above algorithm from the Sentiment Strength Analysis menu.
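The way 10-fold cross-validation divides the data can be sketched in Python as below. The function name is hypothetical, and the shuffling and the method of assigning texts to folds are assumptions, since how SentiStrength partitions the data internally is not described here.

```python
import random


def ten_fold_splits(items, folds=10, seed=0):
    """Yield (training, held_out) pairs; each item is held out exactly once."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)  # shuffling is an assumption
    for k in range(folds):
        held_out_idx = indices[k::folds]  # every tenth index, starting at k
        held_out_set = set(held_out_idx)
        training = [items[i] for i in indices if i not in held_out_set]
        held_out = [items[i] for i in held_out_idx]
        yield training, held_out


if __name__ == "__main__":
    texts = [f"text {n}" for n in range(500)]
    for training, held_out in ten_fold_splits(texts):
        # Term weights would be trained on `training` (90%)
        # and assessed on `held_out` (10%).
        print(len(training), len(held_out))
```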

Language customisation

SentiStrength can be adjusted for other languages by translating the term list EmotionLookupTable.txt and adding any other sentiment-bearing words that have been omitted. A training corpus in the new language is recommended to help adjust the term weight strengths (see Optimising SentiStrength term weights).
The following files will also need to be translated or replaced with a local equivalent:

Frequently Asked Questions