IT110 project
Jack Bauer Task 2 - Camera Recommendation
By Angela Vo, Alexandre Cortez, Nhat Le, Wen Zhong Wu
Objective
Jack Bauer, a field agent, needs decision support in choosing among four cameras: Canon PowerShot SD500, Canon S100, Nikon Coolpix 4300, or Canon G3.
Introduction
Sentiment Analysis
Sentiment is an attitude, thought, or judgment prompted by feeling.
Sentiment analysis, also known as opinion mining, studies people’s sentiments towards certain entities.
Sentiment analysis is one of the major tasks of NLP (Natural Language Processing).
The Internet is a rich source of sentiment information.
What is it useful for, and why?
It saves time when extracting information from large volumes of data and makes automated rating applications possible.
Examples:
Determine user ratings of products
Help develop recommender systems
Determine whether product reviews are fake
Disadvantages
Reviewers’ opinions can be subjective, which makes them hard to analyze.
Instead of sharing topic-related opinions, online spammers post spam or off-topic content in product reviews.
Both problems can make sentiment analysis inaccurate.
Sentiment analysis method
A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level.
For this project, results were examined at the review (document) level.
A number of online product reviews were labeled as positive or negative based on their rating on a 5-point scale.
Those reviews form the training data for a model, which is then applied to the given test data to determine which camera is best.
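The labeling step can be sketched in Python as follows. The slides do not state which ratings counted as positive or negative, so the thresholds here (4 to 5 stars positive, 1 to 2 stars negative, 3 stars skipped) and the helper name `label_from_stars` are assumptions made purely for illustration, as are the sample reviews.

```python
from typing import Optional


def label_from_stars(stars: int) -> Optional[str]:
    """Map a 1-5 star rating to a polarity label (hypothetical thresholds)."""
    if not 1 <= stars <= 5:
        raise ValueError("rating must be between 1 and 5")
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return None  # 3 stars: treated as neutral and skipped (assumption)


# Made-up sample reviews, only to show the shape of the training data.
reviews = [
    ("Great pictures, easy to use", 5),
    ("Battery died in a week", 1),
    ("It is okay", 3),
]

# Keep only reviews that received a clear positive or negative label.
training = [
    (text, label_from_stars(stars))
    for text, stars in reviews
    if label_from_stars(stars) is not None
]
```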
Summary
How did we get the data?
The data were manually collected from amazon.com and equivalent websites.
Our team's requirement was to collect 50 positive reviews and 50 negative reviews for each camera.
How did we combine the data?
We combined all collected positive and negative reviews into the training data, totaling 200 positive reviews and 200 negative reviews.
The split was close to 90% training data and 10% testing data.
Model
Process Documents module - preprocesses the review documents.
SVM (Support Vector Machine) module - supervised learning; classifies reviews into two types, positive and negative.
Apply Model module - applies the trained model to the testing data.
Model (cont.)
Tokenize module - splits the text into tokens and removes anything that is not recognized as a character.
Transform Cases module - transforms all words to lowercase to reduce duplicate entries.
Filter Stopwords module - filters out stopwords to reduce the size of the dictionary.
Stem module - strips suffixes so that word variants map to one stem (e.g. "suffix" and "suffixes").
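The four preprocessing steps above can be approximated in plain Python. This is only a rough sketch, not the RapidMiner implementation: the stopword list is a tiny illustrative sample, the suffix list is a naive stand-in for a real stemmer, and the function names are hypothetical.

```python
import re

# Tiny illustrative stopword list (assumption, far smaller than a real one).
STOPWORDS = {"the", "a", "an", "is", "it", "and", "of", "to", "this"}

# Naive suffix list standing in for a real stemming algorithm (assumption).
SUFFIXES = ("ing", "es", "ed", "s")


def tokenize(text):
    # Keep runs of letters only; digits and punctuation act as separators.
    return re.findall(r"[A-Za-z]+", text)


def stem(word):
    # Strip the first matching suffix, keeping at least three letters.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    tokens = tokenize(text)                             # Tokenize
    tokens = [t.lower() for t in tokens]                # Transform Cases
    tokens = [t for t in tokens if t not in STOPWORDS]  # Filter Stopwords
    return [stem(t) for t in tokens]                    # Stem


print(preprocess("Suffix and SUFFIXES!"))  # ['suffix', 'suffix']
```

Both spellings end up as the same dictionary entry, which is the point of lowercasing and stemming before the classifier sees the text.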
Canon PowerShot SD500
Canon PowerShot SD500 Results
37 reviews were tested.
The classifier labeled 24 reviews positive and 13 negative (65% positive, 35% negative).
Average confidence for positive reviews: 0.557
Average confidence for negative reviews: 0.443
Accuracy: 26/37 = 70.3%
Precision: 22/24 = 91.7%
Recall: 22/31 = 71.0%
| | Classified Positive | Classified Negative |
| Actual Positive | TP = 22 | FN = 9 |
| Actual Negative | FP = 2 | TN = 4 |
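As a check on the figures, accuracy, precision, and recall can be recomputed from the four confusion-matrix cells. The sketch below uses the SD500 counts from the table above; the function name `classification_metrics` is a hypothetical helper, not part of RapidMiner.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return accuracy, precision, and recall for a 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,   # correct classifications / all test cases
        "precision": tp / (tp + fp),     # of reviews classified positive, share truly positive
        "recall": tp / (tp + fn),        # of truly positive reviews, share found
    }


# Canon PowerShot SD500 counts taken from the confusion matrix.
sd500 = classification_metrics(tp=22, fn=9, fp=2, tn=4)
for name, value in sd500.items():
    print(f"{name}: {value:.1%}")
# accuracy: 70.3%
# precision: 91.7%
# recall: 71.0%
```

The same function can be applied to the other cameras' matrices to cross-check their reported values.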
Canon S100
Canon S100 Results
51 reviews of this camera were tested.
The results were 33 positive and 18 negative reviews (65% positive).
Average confidence for positive reviews: 0.547
Average confidence for negative reviews: 0.453
Accuracy = number of correct classifications / total number of test cases
Accuracy = 40/51 = 0.784 (78.4%)
Precision = TP/(TP+FP) = 94.1%
Recall = TP/(TP+FN) = 76.2%
| | Classified Positive | Classified Negative |
| Actual Positive | TP = 32 | FN = 10 |
| Actual Negative | FP = 2 | TN = 8 |
Nikon Coolpix 4300
Nikon Coolpix 4300 Results
A total of 34 reviews were included in the test data.
Of the 34, 22 were classified positive and 12 negative.
64.7% positive reviews, with an average confidence level of 61.4%
Precision: 21/22 = 95.5%
Recall: 21/27 = 77.8%
| | Classified Positive | Classified Negative | Actual share |
| Actual Positive | TP = 21 | FN = 6 | 27/34 = 79.4% |
| Actual Negative | FP = 1 | TN = 6 | 7/34 = 20.6% |
| Classified share | 22/34 = 64.7% | 12/34 = 35.3% | |
Canon G3
Specifications
4.0 megapixel sensor creates 2,272 x 1,704 images for prints at 8 x 10 and beyond
4x optical plus 3.6x digital (for 14x total) zoom lens with autofocus
Included 32 MB CompactFlash card holds 54 images at Large/Normal resolution; camera is Microdrive compatible
Connects with Macs and PCs via USB port
Uses proprietary lithium-ion rechargeable battery (included)
Canon G3 data
Canon G3 Results
45 reviews were tested by the software.
Based on the training and test data provided, RapidMiner predicted 38 positive and 7 negative reviews.
84.4% were classified positive and 15.6% negative.
The average confidence level was 60.4% for positive and 39.6% for negative.
Canon G3
Ground truth overall: 39 positive, 6 negative
Overall accuracy (OA): 41/45 = 91.11%
| | Classified Positive | Classified Negative |
| Actual Negative | FP = 2 | TN = 4 |
| Actual Positive | TP = 37 | FN = 2 |
Recommendation
| Camera | Positive Reviews | Confidence Level |
| Canon PowerShot SD500 | 65% | 55.7% |
| Canon S100 | 65% | 54.7% |
| Nikon Coolpix 4300 | 65% | 61.4% |
| Canon G3 | 84.4% | 60.4% |
References
https://www.dpreview.com/reviews/canong3
http://link.springer.com/article/10.1186/s40537-015-0015-2