IT110 project

IT110ProjectPresentation-Task2CameraRating.pptx

Jack Bauer Task 2 - Camera Recommendation

By Angela Vo

Alexandre Cortez

Nhat Le

Wen Zhong Wu

Objective

Jack Bauer, a field agent, asks for your decision support in choosing among four cameras: Canon PowerShot SD500, Canon S100, Nikon Coolpix 4300, or Canon G3.

Introduction

Sentiment Analysis

Sentiment is an attitude, thought, or judgment prompted by feeling.

Sentiment analysis, also known as opinion mining, studies people’s sentiments toward certain entities.

Sentiment analysis is one of the major tasks of NLP (Natural Language Processing).

The Internet is a rich source of sentiment information.

What is it useful for, and why?

It helps us save time when extracting information from large volumes of data and when building automated rating applications.

Examples:

Determine user ratings of products

Help develop recommender systems

Determine whether product reviews are fake

Disadvantages

Reviewers’ opinions may be subjective, which makes them hard to analyze.

Instead of sharing topic-related opinions, online spammers post spam or irrelevant content in product reviews.

Both problems can make sentiment analysis inaccurate.

Sentiment analysis method

A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level.

Review-level analysis was chosen to examine the results for this project.

A number of online product reviews were labeled as positive or negative based on their rating on a scale of 5.

Those reviews were used to create the training data, and the trained model was then applied to the given test data to judge which camera is best.
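The labeling step can be illustrated with a minimal Python sketch. The slides do not say where the cut-off on the 5-point scale was drawn, so the thresholds below (4 and 5 stars positive, 1 and 2 stars negative, 3 stars skipped) and the function name `label_from_stars` are assumptions made only for illustration.

```python
from typing import Optional

# Hypothetical cut-offs: the slides do not state the actual thresholds.
POSITIVE_MIN_STARS = 4
NEGATIVE_MAX_STARS = 2


def label_from_stars(stars: int) -> Optional[str]:
    """Map a 1-5 star rating to 'positive', 'negative', or None (skipped)."""
    if not 1 <= stars <= 5:
        raise ValueError(f"rating out of range: {stars}")
    if stars >= POSITIVE_MIN_STARS:
        return "positive"
    if stars <= NEGATIVE_MAX_STARS:
        return "negative"
    return None  # 3-star reviews are treated as neutral in this sketch


labels = [label_from_stars(s) for s in (5, 4, 3, 2, 1)]
print(labels)
```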

Summary

How did we get the data?

The data were manually collected from amazon.com and similar websites.

The requirement for our team was to collect 50 positive reviews and 50 negative reviews for each camera.

How did we combine the data?

We combined all the collected positive and negative reviews to form the training data.

This gave a total of 200 positive reviews and 200 negative reviews.

Close to 90% training and 10% testing data.
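As a rough Python sketch of the combining step, the per-camera review lists can be flattened into one labeled training set. The placeholder texts and the helper name `build_training_set` are assumptions; only the counts (4 cameras, 50 reviews per label) come from the slides.

```python
CAMERAS = ["Canon PowerShot SD500", "Canon S100", "Nikon Coolpix 4300", "Canon G3"]
REVIEWS_PER_LABEL = 50  # from the slides: 50 positive and 50 negative per camera


def build_training_set(reviews_by_camera):
    """Flatten {camera: {label: [texts]}} into one list of (text, label) pairs."""
    training = []
    for camera in CAMERAS:
        for label in ("positive", "negative"):
            for text in reviews_by_camera[camera][label]:
                training.append((text, label))
    return training


# Placeholder texts stand in for the manually collected reviews.
placeholder = {
    camera: {
        label: [f"{camera} {label} review {i}" for i in range(REVIEWS_PER_LABEL)]
        for label in ("positive", "negative")
    }
    for camera in CAMERAS
}

training = build_training_set(placeholder)
n_pos = sum(1 for _, label in training if label == "positive")
n_neg = len(training) - n_pos
print(n_pos, n_neg)  # 200 200
```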

Model

Process Documents Module - Process the documents

SVM Module (Support Vector Machine) - Supervised learning (classifies reviews into two types, positive and negative)

Apply Model Module - Apply the model to the testing data.

Model Cont...

Tokenize Module - Split the text into tokens and remove anything that is not recognized as a character.

Transform Cases Module - Transform all strings/words to lowercase to reduce duplicate data.

Filter Stopwords Module - Filter out stopwords to reduce the size of the dictionary.

Stem Module - Reduce words to their stems by removing suffixes (e.g., "suffix" vs. "suffixes").
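The four steps above can be sketched in plain Python. This is not the RapidMiner implementation: the stopword list is a tiny illustrative sample, and the stemmer is a naive suffix stripper rather than a real stemming algorithm. All names here are assumptions.

```python
import re

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "is", "it", "the", "this", "to"}
# Naive suffix list for illustration only; not a real stemming algorithm.
SUFFIXES = ("ing", "es", "ed", "s")


def tokenize(text):
    """Split on anything that is not a letter and drop empty pieces."""
    return [t for t in re.split(r"[^A-Za-z]+", text) if t]


def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    tokens = tokenize(text)                              # Tokenize
    tokens = [t.lower() for t in tokens]                 # Transform Cases
    tokens = [t for t in tokens if t not in STOPWORDS]   # Filter Stopwords
    return [stem(t) for t in tokens]                     # Stem


print(preprocess("The suffix and the suffixes"))
# ['suffix', 'suffix']
print(preprocess("This camera is GREAT, it focuses fast!"))
# ['camera', 'great', 'focus', 'fast']
```

After lowercasing and stemming, "suffix" and "suffixes" collapse into a single dictionary entry, which is the reduction in duplicate data the slide describes.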

Canon PowerShot SD500

Canon PowerShot SD500 Results

37 reviews were tested

The results were 24 positive reviews & 13 negative reviews

65% positive reviews & 35% negative reviews

Average confidence for positive reviews: 0.557

Average confidence for negative reviews: 0.443

Accuracy: 26/37 = 70.3%

Precision: 91.7%

Recall: 71%

                  Classified Positive   Classified Negative
Actual Positive   TP = 22               FN = 9
Actual Negative   FP = 2                TN = 4
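
The accuracy, precision, and recall reported for this camera can be reproduced from the confusion matrix counts. The following is a minimal Python sketch; the function name `classification_metrics` is an assumption, while the formulas are the standard definitions (accuracy = correct / total, precision = TP/(TP+FP), recall = TP/(TP+FN)).

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute accuracy, precision, and recall from confusion matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Counts taken from the Canon PowerShot SD500 confusion matrix.
metrics = classification_metrics(tp=22, fn=9, fp=2, tn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
# accuracy: 70.3%
# precision: 91.7%
# recall: 71.0%
```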

Canon S100

Canon S100 Results

51 reviews were tested for this camera.

The results were 33 positive reviews and 18 negative reviews, which is 65% positive.

Average confidence for positive reviews: 0.547

Average confidence for negative reviews: 0.453

Accuracy = Number of correct classifications / Total number of test cases

Accuracy = 40/51 = 0.784 (78.4%)

Precision = TP/(TP+FP) = 94.1%

Recall = TP/(TP+FN) = 76.2%

                  Classified Positive   Classified Negative
Actual Positive   TP = 32               FN = 10
Actual Negative   FP = 2                TN = 8

Nikon Coolpix 4300

Nikon Coolpix 4300

A total of 34 reviews were included in the test data.

Of the 34, 22 were classified as positive and 12 as negative.

64.7% positive reviews, with an average confidence level of 61.4%

Precision: 95.5%

Recall: 77.8%

                  Classified Positive   Classified Negative   Row Total
Actual Positive   TP = 21               FN = 6                27/34 = 79.4%
Actual Negative   FP = 1                TN = 6                7/34 = 20.6%
Column Total      22/34 = 64.7%         12/34 = 35.3%

Canon G3

Specifications

4.0 megapixel sensor creates 2,272 x 1,704 images for prints at 8 x 10 and beyond

4x optical plus 3.6x digital (for 14x total) zoom lens with autofocus

Included 32 MB CompactFlash card holds 54 images at Large/Normal resolution; camera is Microdrive compatible

Connects with Macs and PCs via USB port

Uses proprietary lithium-ion rechargeable battery (included)

Canon G3 data

Canon G3 Results

45 reviews were tested by the software.

Based on the training and test data provided, RapidMiner predicted 38 positive and 7 negative reviews.

84.4% were considered positive and 15.6% negative.

The average confidence level was 60.4% for positive reviews and 39.6% for negative reviews.

Canon G3

Ground truth overall: 39 positives, 6 negatives

Overall accuracy (OA): 91.11%

                  Classified Positive   Classified Negative
Actual Positive   TP = 37               FN = 2
Actual Negative   FP = 2                TN = 4

Recommendation

Camera                  Positive Reviews   Confidence Level
Canon PowerShot SD500   65%                55.7%
Canon S100              65%                54.7%
Nikon Coolpix 4300      65%                61.4%
Canon G3                84.4%              60.4%
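
One way to turn these figures into a ranking is sketched below in Python. The slides do not state a ranking rule, so sorting by share of positive reviews and breaking ties by confidence level is an assumption, as is the function name `rank_cameras`.

```python
# (camera, share of reviews classified positive, average positive confidence)
RESULTS = [
    ("Canon PowerShot SD500", 0.65, 0.557),
    ("Canon S100", 0.65, 0.547),
    ("Nikon Coolpix 4300", 0.65, 0.614),
    ("Canon G3", 0.844, 0.604),
]


def rank_cameras(results):
    """Sort by positive share, then by confidence, highest first (assumed rule)."""
    return sorted(results, key=lambda r: (r[1], r[2]), reverse=True)


ranked = rank_cameras(RESULTS)
for camera, positive, confidence in ranked:
    print(f"{camera}: {positive:.1%} positive, {confidence:.1%} confidence")
```

Under this assumed rule, the Canon G3 ranks first, followed by the Nikon Coolpix 4300, the Canon PowerShot SD500, and the Canon S100.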
