
SentiStrength

Automatic sentiment analysis of up to 16,000 social web texts per second with up to human-level accuracy. 14 languages are available and others are easily added.

SentiStrength estimates the strength of positive and negative sentiment in short texts, even for informal language. It has human-level accuracy for short social web texts in English, except political texts. SentiStrength reports two sentiment strengths:

-1 (not negative) to -5 (extremely negative)

1 (not positive) to 5 (extremely positive)

It can also report binary (positive/negative), trinary (positive/negative/neutral) and single scale (-4 to +4) results. SentiStrength was originally developed for English and optimised for general short social web texts but can be configured for other languages and contexts by changing its input files - some variants are demonstrated below.
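
The relationship between the dual scores and the other output types can be illustrated with a short Python sketch. It assumes that the single-scale value is the sum of the two scores (which matches the -4 to +4 range) and that the trinary label is the sign of that sum. Neither rule is stated above and the function names are hypothetical, so treat this as an illustration rather than SentiStrength's actual method.

```python
def to_scale(positive: int, negative: int) -> int:
    """Combine dual scores into one value from -4 to +4 (assumed rule: simple sum)."""
    if not 1 <= positive <= 5 or not -5 <= negative <= -1:
        raise ValueError("expected positive in 1..5 and negative in -5..-1")
    return positive + negative


def to_trinary(positive: int, negative: int) -> int:
    """Return 1 (positive), -1 (negative) or 0 (neutral); assumed rule: sign of the sum."""
    scale = to_scale(positive, negative)
    return (scale > 0) - (scale < 0)
```

Under these assumptions a text scored 3 and -1 has a scale value of 2 and a trinary label of 1, while a text scored 2 and -2 is neutral.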


Other languages: Finnish, German, Dutch, Spanish, Russian, Portuguese, French, Arabic, Polish, Persian, Swedish, Greek, Welsh, Italian, Turkish.

Download SentiStrength

SentiStrength is free for academic research and is certified safe by Softpedia. Please contact the author for the commercial Java version or a commercial license for the online version. The free version runs under Windows only and is provided without liability or guarantees for any uses. Downloading SentiStrength and/or the configuration files signifies acceptance of these conditions. This version does not contain the keyword or domain classification facilities.

Buy SentiStrength

A commercial licence for SentiStrength is available for £1000 - please contact m.thelwall -at- wlv.ac.uk. The Java version of SentiStrength is normally used commercially.

SentiStrength is used by computing, language technology and market research companies in the US, Europe and Australia. Some use the default English version and others have translated it into different languages or adapted it to integrate with their existing language technology systems. Commercial users range from small start-ups to one of the world's top 10 largest corporations.

Java Version

The Java version of SentiStrength is similar to the Windows version in core functions but has additional capabilities - see the SentiStrength Java manual and Mac users' starting instructions (which probably also help on Linux). In addition to the standard type, it can conduct binary (positive/negative), trinary (positive/neutral/negative) and single-scale (-4 very negative to +4 very positive) classifications, as well as keyword-oriented and domain-oriented classifications. It also has a special mode for binary and trinary classification on longer texts. It allows wildcards in the idiom list file. To use the Java version for research only (free), email from your academic institution's email address. It can process about 16,000 tweets per second.

For RJB users, here is some sample RJB code from Adam Pantanowitz.

For Python users, here is some sample Python code from Alec Larsen, University of the Witwatersrand.

About SentiStrength

SentiStrength is a sentiment analysis (opinion mining) program. It is described and evaluated in the following peer-reviewed academic articles:

It has been applied in the following research projects, amongst others.

Press coverage and initiatives

Classifying texts with SentiStrength

To get SentiStrength to classify one or more texts, put the texts into a plain text file with one text per line. Select Analyse All Texts in File from the Sentiment Strength Analysis menu and select the text file. The output will be a copy of the file with positive and negative classifications added at the end of each line, preceded by tabs. Individual texts can also be classified by selecting Analyse One Text from the Sentiment Strength Analysis menu.
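
The file preparation and the reading of results can be sketched in Python. This is a minimal example with hypothetical function names. It assumes that the two classifications are appended as the last two tab-separated fields, positive first, and that UTF-8 is a suitable encoding for your version, so check a real output file before relying on it.

```python
from pathlib import Path


def write_input_file(texts, path, encoding="utf-8"):
    """Write one text per line; runs of whitespace (including line breaks) become one space."""
    lines = [" ".join(text.split()) for text in texts]
    Path(path).write_text("\n".join(lines) + "\n", encoding=encoding)


def parse_output_line(line):
    """Split an output line into (text, positive, negative).

    Assumption: the last two tab-separated fields are the positive and
    negative classifications, in that order.
    """
    text, positive, negative = line.rstrip("\n").rsplit("\t", 2)
    return text, int(positive), int(negative)
```

Flattening line breaks matters because a text that spans two lines would otherwise be classified as two separate texts.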

Optimising SentiStrength term weights

The positive and negative term weights can be found in the EmotionLookupTable.txt file in the SentStrength_Data folder. These can be manually adjusted by editing the file. Alternatively, they can be automatically fine-tuned with a classified text collection. To fine-tune the EmotionLookupTable.txt values used by SentiStrength, first create a collection of texts that have been classified by humans with positive (1-5) and negative (1-5) sentiment strengths. Put these into a plain text file in which each line has the format: negative – tab – positive – tab – text. The set should contain at least 500 texts. Select Optimise the emotion dictionary weights from the Sentiment Strength Analysis menu and SentiStrength will create a new term strength list that is optimised for the sentiment in the new texts. To use the new strengths, save a copy of the original strength list and then replace it with the new list.
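
A human-classified collection could be written in this format with a short Python sketch such as the following. The function names are hypothetical, and the sketch assumes that both strengths are written as plain values from 1 to 5, as the description suggests. Whether the negative strength needs a minus sign is not stated here, so check against your version.

```python
def format_training_line(negative: int, positive: int, text: str) -> str:
    """Build one line in the order: negative, tab, positive, tab, text."""
    for name, value in (("negative", negative), ("positive", positive)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} strength must be between 1 and 5, got {value}")
    # Tabs or line breaks inside the text would corrupt the format, so flatten them.
    clean_text = " ".join(text.split())
    return f"{negative}\t{positive}\t{clean_text}"


def build_training_file(rows, minimum=500):
    """Return the file content for (negative, positive, text) rows.

    Raises ValueError if there are fewer rows than the recommended minimum.
    """
    lines = [format_training_line(n, p, t) for n, p, t in rows]
    if len(lines) < minimum:
        raise ValueError(f"only {len(lines)} texts; at least {minimum} are recommended")
    return "\n".join(lines) + "\n"
```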

Assessing the accuracy of SentiStrength

To assess the accuracy of SentiStrength on a set of texts, a sample must first be classified and formatted as above. The human classifications can then be compared with the SentiStrength classifications on the same sample.
Alternatively, if one data set is available to optimise the word strength list and the same set is to be used for validation, then the 10-fold cross-validation procedure can be used. This uses 90% of the data to train the term weights and the remaining 10% to assess the accuracy of the adjusted weights. This is repeated 10 times, each time with a different 10% left out, and the total results are reported. To run a 10-fold cross-validation, create the classified text as above and select Run a 10-fold cross-validation to assess the above algorithm from the Sentiment Strength Analysis menu.
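
The partitioning behind a 10-fold cross-validation can be illustrated in Python. This sketch only shows how the data are divided into training and held-out parts. The function name, the shuffling and the seed are assumptions, and SentiStrength performs the actual training and assessment itself.

```python
import random


def ten_fold_splits(n_items, folds=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold.

    Every item is held out exactly once; each fold trains on the rest.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # assumed: random assignment to folds
    for k in range(folds):
        test = indices[k::folds]  # every folds-th item, starting at offset k
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test
```

With 500 classified texts, each of the 10 folds trains on 450 texts and assesses accuracy on the remaining 50.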

Extra resources:

Support files

The various files with SentiStrength contain information used in the algorithm and may be customised.

Language customisation

SentiStrength can be adjusted for other languages by translating the term list EmotionLookupTable.txt and adding any other sentiment-bearing words that have been omitted. Note that the sentiment scores for terms should be in the range 2 to 5 (positive) or -2 to -5 (negative). A score of +1 or -1 means neutral and neutral terms are ignored. A training corpus in the new language is recommended to help adjust the term weight strengths (see Optimising SentiStrength term weights). You will need the Java version or Windows version 2.2 of SentiStrength to cope with accented characters or characters not found in English as well as some additional linguistic features.
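
A translated term list can be checked against these score ranges with a small Python sketch. It assumes that each line of the term list holds a term and an integer score separated by a tab. That layout and the function names are assumptions, so compare them with your copy of EmotionLookupTable.txt first.

```python
def classify_term_score(score: int) -> str:
    """Label a score as 'sentiment' (2..5 or -2..-5), 'neutral' (+1 or -1) or 'invalid'."""
    if 2 <= abs(score) <= 5:
        return "sentiment"
    if abs(score) == 1:
        return "neutral"
    return "invalid"


def check_term_lines(lines):
    """Return (line_number, problem) pairs for lines that will not act as sentiment terms."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            problems.append((number, "missing score"))
            continue
        try:
            score = int(fields[1])
        except ValueError:
            problems.append((number, "score is not an integer"))
            continue
        kind = classify_term_score(score)
        if kind != "sentiment":
            problems.append((number, kind))
    return problems
```

Lines reported as neutral are not errors, but they are ignored by the classifier, so a translator may want to remove or re-score them.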

The following files will also need to be translated or replaced with a local equivalent (see the extra instructions):

You will also need to register a list of non-English common multiple letters (e.g., ii is common in some languages but not English). For the Java version please see the manual for this option. For the Windows version, please check the options menu for this customisation. Spell-checking can also be completely disabled in both versions, if needed.

Negating words occurring after sentiment words (e.g., "I am happy not" is OK in German but not English) can be customised in the Java version of SentiStrength but not the Windows version, sorry. The Java version may need the utf8 option to read the input files, if in UTF8 rather than ASCII format (note that utf8 does not always work on ANSI text files so it should not be used as the default).

SentiStrength versions for other languages

Would you like to help? If you are a linguist with knowledge of any of these languages then you could help by:

Please email m dot thelwall at wlv.ac.uk if you would like to help. This makes a good student project.

Classifiers with some testing (6)

Created by Tuomo Kakkonen, School of Computing, University of Eastern Finland - please contact Tuomo for sentiment analysis in Finnish.
Created by Hannes Pirker, Interaction Technologies Group at the Austrian Research Institute for Artificial Intelligence (OFAI), with additions from Elias Kyewski of the University of Duisburg. The German version can be obtained via this download form.

Thank you to Eismont Polina, Efanova Iuliia, Konovalova Svetlana, Losev Viktor and Velichko Alena of Saint Petersburg State University of Aerospace Instrumentation, Department of Applied Linguistics for help with the first Russian version. (+ve correl. 0.28-0.47, -ve correl. 0.31-0.46 on tweets - the second number is overfitted due to testing on the evaluation data set, so the real correlation is probably about 0.35 for both). 3000 human-classified Russian tweets.

Thank you to Юлия Павлова, Olessia Koltsova and Sergei Koltsov for the second Russian version. It was developed by the Laboratory for Internet Studies, National Research University Higher School of Economics (NRU HSE), and supported by the Russian Humanitarian Research Foundation and NRU HSE.

There is also a Turkish sentiment strength classifier that is a variant of SentiStrength created by Gural VURAL, METU Computer Eng. Dept. This is available on the same basis as the Java version.

(this version includes only part of Gural VURAL's system so its results are not as good as Gural VURAL's full version.)

Completely untested classifiers (10) [just for fun: please email m dot thelwall at wlv.ac.uk if you would like to help improve them - this makes a good student project for linguists or computer scientists, together with testing the results, and making a small corpus of sentiment-classified texts! Here are the language files for 9 of these languages - please improve them and send back if you like. Please also send us a few positive and negative words from any languages not listed here and we will make a new version for your language!]

Basic classifiers (10) that recognise only a few sentiment words. Please email m dot thelwall at wlv.ac.uk if you would like to help improve them or to send a list of at least 10 common sentiment words for any language. We can't get Hindi, Punjabi and Bengali to work at the moment, sorry. Also, the Simplified Chinese, Traditional Chinese and Japanese versions are artificial: they add spaces between words or phrases into the language.

Domain customisation

SentiStrength can be adjusted for other domains (e.g., Twitter, product reviews) by adding new relevant words and sentiment strengths to the term list EmotionLookupTable.txt and adjusting any relevant existing term strengths. The other files can also be adjusted, as for language customisation. For example, the file EmotionLookupTableGeneral.txt in the download zipfile contains a slightly adjusted set of term weights to cope with more impersonal communication than MySpace. In this alternative file, the word "love" has a higher strength because it is less likely to be used in formulaic message endings, such as "love from" or "love u" or "love x".
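
Adding domain terms and adjusting existing strengths can be pictured as merging a set of overrides into the general term list. The Python sketch below uses a hypothetical function name and works on dictionaries rather than on the real file. The example strengths in the usage note are illustrative only and are not taken from any SentiStrength file.

```python
def apply_domain_overrides(base, overrides):
    """Return a new term-strength mapping with domain-specific values applied.

    Strengths must be sentiment-bearing: 2 to 5 or -2 to -5.
    The base mapping is left unchanged.
    """
    merged = dict(base)
    for term, strength in overrides.items():
        if not 2 <= abs(strength) <= 5:
            raise ValueError(f"strength for {term!r} must be 2..5 or -2..-5, got {strength}")
        merged[term] = strength
    return merged
```

For instance, apply_domain_overrides({"love": 2}, {"love": 3, "refund": -2}) raises the strength of "love" and adds a new negative term for a product review domain, using made-up values.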

Data mining

The data mining menu and ARFF menu items are neither part of the main SentiStrength functionality nor documented. Please ignore them unless they make sense to you.

Other

For further issues, please see the Frequently Asked Questions.

SentiStrength was produced as part of the CyberEmotions project, supported by EU FP7.