Security in Cloud Computing
Chapter 4
Experimental Results
4.1 Introduction
The proposed algorithm uses the semantics, linguistics and syntax of a sentence to judge whether it is humorous. It is designed specifically for one-liners on Twitter.
The proposed algorithm relies on several datasets, such as antonym pairs, funny emojis and internet slang. Its results therefore depend heavily on these datasets: higher-quality datasets lead to more precise results.
The proposed algorithm uses the input datasets shown in Section 3.2. Examples include a list of emojis such as :D :) :P o.O ;) >:O ^_^ 8-) 8| :v :3, a list of funny English words such as dumb, noob, doohickey, eschew, fiddledeedee, finagle, flanker, floozy, fungible, girdle and gobsmacked, and large lists of positive words, negative words and antonym pairs.
Figure 4.1 contains typical funny words in English, such as panache, nitwick and noob. If the dataset of funny words covers most of the funny words in English, the probability of accurate results increases. The same holds for the other datasets: the more accurate they are, the better the results are likely to be.
This is because the algorithm searches the available datasets for the keywords extracted by the regular expression (RE). For example, if a keyword extracted by the RE is not found in the datasets, the chance of the sentence being judged humorous decreases. A small or inaccurate dataset therefore reduces the accuracy of the algorithm.
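The lookup described above can be sketched as follows. This is a minimal illustration, not the implementation from Chapter 3: the regular expression, the function name `dataset_hits`, and the tiny in-memory datasets (seeded with a few entries quoted in this chapter) are assumptions made for the example.

```python
import re

# Hypothetical miniature datasets; the real ones are described in Section 3.2.
FUNNY_WORDS = {"dumb", "noob", "doohickey", "finagle", "gobsmacked"}
INTERNET_SLANG = {"lol"}
EMOJIS = {":D", ":)", ":P", ";)", "^_^"}

# Assumed tokenizer: every run of letters is a keyword candidate.
WORD_RE = re.compile(r"[A-Za-z]+")


def dataset_hits(sentence):
    """Return the tokens of `sentence` that occur in any of the datasets."""
    hits = []
    # Emojis are matched on whitespace-separated tokens, case-sensitively.
    for token in sentence.split():
        if token in EMOJIS:
            hits.append(token)
    # Words are matched case-insensitively against the word datasets.
    for word in WORD_RE.findall(sentence.lower()):
        if word in FUNNY_WORDS or word in INTERNET_SLANG:
            hits.append(word)
    return hits
```

With these toy datasets, a sentence such as "random sample" produces no hits at all, which is the situation in which the chance of a humorous verdict drops.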
The proposed algorithm can be compared with the word indexing power method (discussed in Chapter 2). That method simply scans all the words in the sentence and calculates the average of the humor factor of each word:
Word indexing power = (Sum of all word indexes) / (Total number of words)
If the word indexing power is greater than a chosen humor factor threshold, the sentence is judged humorous. The threshold is variable and can be changed according to context.
Figure 4.1 Dataset for funny words in English
4.2 Results and Comparisons
The following test sentences were examined to determine whether they are humorous:
Sentence 1 You think I was studying lol
Sentence 2 I used to love my mind now I hate it the most
Sentence 3 are you nerd?
Sentence 4 Your friend has 10 hands he is always studying
Sentence 5 go to hell man
Figure 4.2 Results of Sentences 1-5
Figure 4.3 Bar Chart describing probability of sentence being humorous
Figure 4.3 shows the probability of each sentence being humorous.
Sentence 1 is clearly humorous because it contains internet slang (lol). It therefore has a high probability, 1, of being humorous.
Sentence 2 is also humorous because it contains a strong variation in semantics (love versus hate).
Sentence 3 has a good probability of being humorous, but it can also be offensive, so its probability is 0.75.
Sentence 4 has a sarcastic tone, which increases its chance of being humorous, so its probability is 0.75.
Sentence 5 does not appear to contain humor; it is only offensive.
4.2.1 Comparison with Word Indexing Power
Word indexing power, as discussed in Chapter 2, is the average humor content of all the words in a given sentence.
Word indexing power = (Sum of all word indexes) / (Total number of words)
If the word indexing power is greater than a chosen humor factor threshold, the sentence is judged humorous.
Word indexing power works for all the sentences except Sentence 4, which has a sarcastic tone, whereas the remaining sentences contain some funny words. The proposed technique, however, works on Sentence 4 because it can relate the subject and object to a particular number in the sentence (here, 10), recognizes the oddity in the sentence, and so judges it humorous.
4.2.2 Different Set of Sentences
Sentence 1 MOOOOOOOOONNNNNNNNEEEEEEYYYY!!!!!!
Sentence 2 why was six scared of seven because seven ate nine :)
Sentence 3 random sample
Sentence 4 Bazinga
The algorithm was run again on these sentences.
Figure 4.5 Results of Sentences 1-4
Figure 4.4 Bar Chart describing probability of sentence being humorous
Sentence 1 has a low probability (0.25) of being humorous, because whether it is funny depends on the context in which it is used.
Sentence 2 is humorous, with strong variation in both words and semantics. The probability of the sentence being humorous is 1.
Sentence 3 does not appear to contain any humor, so its probability is 0; a random sample cannot be humorous.
Sentence 4 is common internet slang from the well-known TV series The Big Bang Theory, so it has a probability of 0.5 of being humorous.
4.2.3 Comparison with Word Indexing Power
Word indexing power handles all the cases except Sentence 2. That sentence sounds funny, but word indexing power cannot judge it as humorous because it contains no funny words. The proposed approach works for Sentence 2 because, again, it is able to compare the subject and object with a number.
Word indexing power only looks at the words in the sentence and does not account for sentence structure. That is why it lags behind the proposed technique, which checks semantics, sentence structure, funny words and so on to decide whether a sentence is humorous.
4.2.4 Different Set of Sentences
Sentence 1: Yesterday, I fell down from a 10 meter ladder. Thank God I was on the third step.
Sentence 2: Are you a man or a horse?
Sentence 3: Our conscience is clear- we don’t use it.
Sentence 4: It is the last example.
Figure 4.6 Sentence 1 to 4
Figure 4.7 Bar Chart describing probability of sentence being humorous
Sentence 1 has a very high probability of being humorous. It is also very funny in linguistic terms. The technique finds it funny because of the way the words are organized in particular positions.
Sentence 2 is humorous but can also be somewhat offensive because of the comparison with a horse, so its probability of being humorous is 0.75.
Sentence 3 is judged to contain no humor, although it can be humorous. The technique is not able to find any semantic connection that would let it judge the sentence humorous: it treats the statement about having a clear conscience as a plain one that cannot be humorous. The technique therefore failed here.
Sentence 4 does not appear to contain any humor.
4.2.5 Comparison with Word Indexing Power
The proposed technique gives better results than the word indexing power method.
Word indexing power cannot handle Sentence 1 and Sentence 2, both of which sound funny. For example, Sentence 2 contains only man and horse as subjects, and neither is a funny word, so word indexing power yields incorrect results for these sentences. The proposed technique, however, is able to relate the subject and object of the sentence and see that they are being compared with each other.
Word indexing power works better than the proposed algorithm for Sentence 3, because the proposed algorithm cannot handle the organization of this sentence and fails to understand its semantics.
4.2.6 Analysis of Proposed Technique and Word Indexing Power
Figure 4.8 Bar Chart Displaying Results running on 1000 sentences
Figure 4.8 compares the proposed algorithm with the word indexing power method.
1. The first set contains 1000 random samples, of which 128 are humorous sentences. The proposed technique judged 202 sentences as humorous, while word indexing power judged 323 sentences as humorous.
2. In the second set, 1000 humorous sentences were subjected to both methods. The proposed technique judged more than 700 sentences as humorous, while word indexing power judged only 231 sentences as humorous (because it lacks a semantic approach). This is a much better result than that of word indexing power.
3. Finally, offensive sentences were subjected to both methods. Word indexing power judged many more of them as humorous because it simply looks for the funny side of individual words in the sentence, and many offensive words have a high humor factor.
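The counts reported for the first set allow a simple bound to be worked out. The sketch below is an illustration added here, not part of the original experiment, and the function name is hypothetical. Since only 128 of the 1000 random samples are humorous, every "humorous" judgment beyond 128 must be a false positive.

```python
def min_false_positives(flagged, truly_humorous):
    """Lower bound on false positives: flags in excess of the true positives available."""
    return max(0, flagged - truly_humorous)


TRULY_HUMOROUS = 128  # humorous sentences among the 1000 random samples

# Counts of sentences judged humorous, as reported for the first set.
proposed_fp = min_false_positives(202, TRULY_HUMOROUS)
word_index_fp = min_false_positives(323, TRULY_HUMOROUS)

print(proposed_fp, word_index_fp)  # 74 195
```

On the first set, the proposed technique therefore produced at least 74 false positives and word indexing power at least 195. These are lower bounds only; the exact figures depend on how many of the 128 humorous sentences each method actually found.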
As the second point shows, the proposed method generally gives better results for humorous sentences than word indexing power, because the latter only scans for critical humorous words and compares their average with a threshold factor. The proposed technique incorporates word indexing by scanning for critical humorous words, but it also tries to understand the semantics and meaning of the sentence based on concepts of linguistics.
For example, consider the humorous sentence "are you a man or a horse?":
are -> 0.21 humor factor
you -> 0.18 humor factor
a -> 0.09 humor factor
man -> 0.32 humor factor
or -> 0.12 humor factor
horse -> 0.41 humor factor
Word indexing power = (0.21 + 0.18 + 0.09 + 0.32 + 0.12 + 0.09 + 0.41) / 7
= 1.42 / 7 ≈ 0.203
This value is significantly less than the threshold factor, which is above 0.4, so the sentence is judged not humorous.
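The arithmetic of this example can be reproduced with a short script. It is a sketch under stated assumptions rather than the implementation discussed in Chapter 2: the function name, the cut-off of exactly 0.4 (the text only says the factor is above 0.4), and the treatment of unknown words as having a humor factor of 0 are choices made for illustration.

```python
# Humor factors quoted in the example; "a" occurs twice in the sentence.
HUMOR_FACTORS = {
    "are": 0.21, "you": 0.18, "a": 0.09,
    "man": 0.32, "or": 0.12, "horse": 0.41,
}

THRESHOLD = 0.4  # assumed cut-off for this illustration


def word_indexing_power(sentence, factors):
    """Average humor factor over all words; unknown words count as 0 (assumption)."""
    words = sentence.lower().replace("?", "").split()
    if not words:
        return 0.0
    return sum(factors.get(word, 0.0) for word in words) / len(words)


score = word_indexing_power("are you a man or a horse?", HUMOR_FACTORS)
is_humorous = score > THRESHOLD
print(round(score, 3), is_humorous)  # 0.203 False
```

Because the score is an average, the two highest factors in the sentence (man and horse) are diluted by the low-scoring function words, which keeps the result well below the threshold.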
That is why many humorous sentences that contain no humorous words fail to qualify as humorous under the word indexing power method, whereas the proposed technique is able to handle them.
4.2.7 Time Complexity Analysis of Proposed Technique and Word Indexing Power
The time complexity of word indexing power is O(nm), where n is the number of words in the sentence and m is the size of the dataset of humorous words.
The time complexity of the proposed algorithm is O(n^2 m), where n is the number of words in the sentence and m is the size of the largest dataset, which is the antonym pair dataset. The complexity can be reduced to O(m · n log n) using segmentation if the dataset is sorted, but since we are considering the worst case, we assume that the dataset is not alphabetically sorted.
The proposed algorithm therefore takes more time by a factor of n, the length of the sentence. Twitter one-liners are generally short (fewer than 12 words per sentence), so the extra time is not significant.
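The factor of n can be made concrete by counting worst-case comparisons. The sketch below ignores constant factors, and both the dataset size and the reading of the n^2 term as "every ordered pair of words is checked against the dataset" are assumptions made for illustration.

```python
def word_indexing_comparisons(n, m):
    """Worst case for O(nm): each of n words is compared with m dataset entries."""
    return n * m


def proposed_comparisons(n, m):
    """Worst case for O(n^2 m): each of n*n ordered word pairs is compared with m entries."""
    return n * n * m


n = 12       # sentence length at the upper end mentioned in the text
m = 10_000   # hypothetical dataset size

ratio = proposed_comparisons(n, m) // word_indexing_comparisons(n, m)
print(ratio)  # 12
```

The ratio equals n regardless of m, so for short one-liners the proposed algorithm does at most about a dozen times the work of word indexing power.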
On the other hand, the proposed algorithm is considerably more accurate than the word indexing power method. As shown in point 2 of Section 4.2.6, it judged roughly 70% of the 1000 humorous sentences as humorous, compared with about 23% for word indexing power.
Chapter 5
Conclusion and Future works
5.1 Conclusion
The proposed algorithm tries to understand the meaning of a sentence using its semantics, linguistics and syntax. It also relies on several datasets, such as antonym pairs, funny emojis and internet slang, so its results depend heavily on the datasets used: higher-quality datasets lead to more accurate results.
The proposed algorithm can be compared with word indexing power, as shown in Chapter 4. For 1000 random sentences, the proposed algorithm achieved an accuracy of 47.2%, while word indexing power achieved 12.6%. The proposed technique is therefore considerably more accurate than word indexing power for random sentences.
When 1000 humorous sentences were subjected to both techniques, the proposed technique yielded an accuracy of 70.6%, while word indexing power yielded an accuracy of 23.2%. Again, the proposed technique gave better results than word indexing power.
When 1000 offensive sentences were subjected to both techniques, the accuracy of word indexing power was 42% and that of the proposed technique was 37%, so word indexing power performed slightly better. However, our basic definition of humor excludes offensive language in order to avoid ambiguous results, which is why the proposed technique lagged marginally behind the word indexing power method.
From the results summarized above, the proposed technique is a better alternative to word indexing power. It is slightly more expensive in terms of run-time complexity, but it yields far better results.
5.2 Future Works
The proposed technique still requires considerable improvement. Because humor is subjective, many other approaches can be explored. In particular, detection of a sarcastic tone could be included, because sarcasm can lead to humor.