Phd: Research paper-3


7/26/2020 Originality Report

https://ucumberlands.blackboard.com/webapps/mdb-sa-BB5a31b16bb2c48/originalityReport/ultra?attemptId=a78cf268-000c-45d1-b5c1-c493fceaaca4&course_id=_… 1/9

SafeAssign Originality Report
Summer 2020 - Data Science & Big Data Analy (ITS-83… • Week 4 Research Paper

Total Score: 82% (High risk)
Sunil Kumar Parisa
Submission UUID: 16bd6e1d-6f1e-4e70-65b2-53879007709b
Total Number of Reports: 1
Highest Match: 82% (Research_Paper_4.docx)
Average Match: 82%
Submitted on: 07/26/20 12:37 PM PDT
Average Word Count: 1,483 (Highest: Research_Paper_4…)

Attachment 1: 82% (Research_Paper_4.docx, Word Count: 1,483)
Institutional database (6): 81% (Student paper sources 3, 2, 7, 5, 1, 6)
Internet (1): 1% (springeropen, source 4)
Top sources (3): 3 Student paper, 2 Student paper, 7 Student paper
Excluded sources (0)

Modeling Uncertainty in ML and NLP

Sunil Kumar Parisa

University of the Cumberlands

ITS 836 – Data Science and Big Data Analytics

Dr. Kelly Wibbenmeyer

26th July 2020

Abstract

Big data analytics is the capacity to handle huge volumes of data of varying formats and complexity, drawn from structured data, semi-structured data, weblogs, device data, and unstructured sources. It makes it possible to gain insights about products (brands), customers, and employees from social media data and to link those insights with transactional system data. Big data analytics is making life easier; many things people use daily may be a result of it. ML techniques, however, are often not computationally efficient or scalable enough to handle the characteristics of big data, including its uncertainty. NLP techniques can help create new traceability links and recover existing ones by finding semantic similarity among available textual artifacts.

Keywords: Big data analytics, machine learning techniques, natural language processing

Addressing Uncertainty in ML and NLP


In big data analytics, ML is typically used to build predictive models and to discover knowledge that supports data-driven decision making. Several ML techniques have been proposed for big data analysis, including feature learning, deep learning, transfer learning, distributed learning, and active learning (Hassani, 2018). Feature learning comprises methods that let a system automatically discover the representations needed for feature detection or classification from raw data; the performance of an ML algorithm depends heavily on the choice of data representation. Deep learning algorithms are designed to analyze and extract meaningful knowledge from very large volumes of data and from data collected from varied sources; however, current deep learning models carry a high computational cost. Distributed learning can mitigate the scalability problem of traditional ML by carrying out computations on data sets distributed among several workstations, which scales up the learning process. Transfer learning applies knowledge gained in one context to a new context, improving a learner in one domain by transferring information from a related domain. Active learning refers to algorithms that use adaptive data collection, automatically adjusting parameters to gather the most useful data as quickly as possible, in order to accelerate ML and overcome labeling problems (Lue, 2019).

The uncertainty challenges of ML techniques can mainly be attributed to learning from data with low veracity (that is, uncertain and incomplete data) and from data with low value (that is, data irrelevant to the problem at hand). Among ML techniques, active learning, deep learning, and fuzzy logic theory are especially well suited to reducing uncertainty. Uncertainty can affect ML through incomplete or imprecise training samples, unclear classification boundaries, and rough knowledge of the target data. In some cases the data is represented without labels, which can itself become a challenge.
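The idea of distributed learning described above, where computations run on parts of a data set held by several workstations, can be illustrated with a minimal sketch. It assumes a one-parameter linear model and simulates the workstations as list slices inside a single process; the function names, data, and learning rate are hypothetical and are not taken from the cited sources.

```python
def gradient(shard, w):
    """Sum of squared-error gradients for the model y = w * x over one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard)


def distributed_step(shards, w, lr):
    """One gradient-descent step built from per-shard partial gradients.

    Each call to gradient() could run on a separate workstation; only the
    partial sums need to be combined centrally.
    """
    n = sum(len(shard) for shard in shards)
    total = sum(gradient(shard, w) for shard in shards)  # combine partial results
    return w - lr * total / n


# Hypothetical data generated from y = 2x, split across two simulated workers.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [data[:2], data[2:]]

w = 0.0
for _ in range(200):
    w = distributed_step(shards, w, lr=0.01)
print(round(w, 6))
```

Because the partial gradients are simply added, splitting the data across shards gives the same update as processing the whole data set in one place, which is what allows the work to be spread out.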

Representing Uncertainty Resulting From Big Data Analytics

Labeling big data manually is costly and labor-intensive. At the same time, learning from unlabeled data is difficult, because classifying data with unclear guidelines yields unclear results. Active learning addresses this problem by selecting a subset of the most important instances for labeling. Deep learning is another technique that can cope with incompleteness and inconsistency in the classification procedure.

NLP has an established set of techniques and tools that cover both written and spoken language (Walker, 2015). It is applied in many areas, such as machine translation, information extraction, speech recognition, optical character recognition, and spell checking. Machine learning (ML), on the other hand, is an approach that can be used in NLP and in many other fields, such as data science, decision-making systems, and artificial intelligence. NLP can be described as an interdisciplinary field of computing, while ML is a set of approaches and tools for solving problems in a variety of computing fields, including NLP. The two topics have nonetheless become so intertwined that it is difficult to draw a clear line between their definitions. NLP helps address the problems mentioned above through vocabulary selection; handling of synonyms, antonyms, and homonyms using WordNet; lexicon formation; relationship identification; and named entity recognition (Stanford parser). NLP is thus an aid to ML and to deep learning. Augmenting ML with NLP reduces the search space and turns learning into a guided search; as a result, classifiers are less likely to overfit during training, and accuracy improves. Adding semantics to NLP is a major thrust in today's learning community.
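The selection step in active learning can be sketched as follows. This is a minimal illustration that assumes entropy-based uncertainty sampling, which is one common selection strategy and not necessarily the one the cited sources have in mind; the document ids, probabilities, and function names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a predicted class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_for_labeling(predictions, budget):
    """Return the ids of the `budget` instances the model is least sure about.

    `predictions` maps an instance id to the model's predicted class
    probabilities for that instance (hypothetical input format).
    """
    ranked = sorted(predictions, key=lambda i: entropy(predictions[i]), reverse=True)
    return ranked[:budget]


# Hypothetical model outputs for four unlabeled documents.
unlabeled = {
    "doc_a": [0.98, 0.02],  # confident prediction
    "doc_b": [0.55, 0.45],  # very uncertain
    "doc_c": [0.80, 0.20],
    "doc_d": [0.50, 0.50],  # maximally uncertain
}
to_label = select_for_labeling(unlabeled, budget=2)
print(to_label)
```

Only the selected instances are sent to a human annotator, so the labeling cost is spent where the current model is most uncertain.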

Enhancing ML and NLP to Handle Big Data

NLP techniques integrated with ML enable devices to analyze, interpret, and even generate text. Together, NLP and big data analytics can process large volumes of text and progressively derive value from such data sets. Common NLP tasks include lexical acquisition, word sense disambiguation (determining which sense of a word is used in a sentence when the word has multiple meanings), and part-of-speech (POS) tagging (determining the function of each word by labeling it with a category such as verb or noun). Several NLP-based techniques have been applied to text mining, including information extraction, topic modeling, text summarization, classification, clustering, question answering, and opinion mining. For instance, financial and fraud investigations may involve finding evidence of wrongdoing in massive data sets (Morabito, 2015). NLP techniques, particularly named entity extraction and information retrieval, can help manage and sift through huge amounts of textual information, such as criminal names and bank records, to support fraud investigations.

Impacts of Natural Language Processing in Big Data

Moreover, NLP and big data can be used to analyze news articles and predict rises and falls in a composite stock price index. Uncertainty affects NLP in big data in several ways. For instance, keyword search is a classic approach in text mining for handling large amounts of textual data. It accepts a list of relevant words or phrases as input and searches the target data set for occurrences of those words. Uncertainty can affect keyword search because a document that contains a keyword is not guaranteed to be relevant. Conversely, keyword search usually matches exact strings and so ignores misspelled words that may still be relevant. Boolean operators and fuzzy search technologies allow greater flexibility, because they can be used to search for words similar to the desired spelling.

While applying AI to big data holds a great deal of promise, a wide range of challenges arises when such techniques are exposed to uncertainty. For example, each characteristic of big data introduces its own sources of uncertainty, such as unstructured, incomplete, or noisy data. Uncertainty can also be embedded in the entire analytics process. For instance, handling incomplete and imprecise data is a critical challenge for most data mining and ML techniques (Hussain & Roy, 2016). Likewise, an ML algorithm may not produce the optimal result if the training data is biased in any way. Scaling these concerns up to the big data level compounds any errors or shortcomings of the entire analytics process. Mitigating uncertainty in big data analytics must therefore be at the forefront of any automated technique, because uncertainty can significantly influence the accuracy of its results.
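The difference between exact keyword matching and fuzzy search can be shown with a short sketch. It uses `difflib.get_close_matches` from the Python standard library as a stand-in for a fuzzy search technology; the records, the misspelling, and the similarity cutoff of 0.75 are hypothetical choices made for illustration.

```python
from difflib import get_close_matches


def exact_search(keyword, documents):
    """Return ids of documents that contain the keyword as an exact token."""
    return [doc_id for doc_id, text in documents.items()
            if keyword in text.lower().split()]


def fuzzy_search(keyword, documents, cutoff=0.75):
    """Return ids of documents that contain a token similar to the keyword."""
    hits = []
    for doc_id, text in documents.items():
        tokens = text.lower().split()
        if get_close_matches(keyword, tokens, n=1, cutoff=cutoff):
            hits.append(doc_id)
    return hits


# Hypothetical records; "fruad" in r2 is a deliberate misspelling.
documents = {
    "r1": "suspected fraud in wire transfer",
    "r2": "possible fruad reported by teller",
    "r3": "routine account maintenance",
}
exact_hits = exact_search("fraud", documents)
fuzzy_hits = fuzzy_search("fraud", documents)
print(exact_hits, fuzzy_hits)
```

The exact search misses the misspelled record, while the fuzzy search recovers it. Neither result says whether a matching record is actually relevant, which is the other source of uncertainty noted above.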

Conclusion

Data analytics and data science can help address a wide range of business problems, whether the data involved is big data or conventional data. The main difference with big data analytics is that it typically deals with large, unstructured data sets on a distributed computing platform such as Hadoop or AWS. E-commerce problems such as optimizing raw material stocks, rotating goods, reducing warehouse space, and lowering logistics costs can be addressed with linear programming and big data analysis methods.

References

Hassani, M. (2018). Overview of efficient clustering methods for high-dimensional big data streams. Clustering Methods for Big Data Analytics, 25-42. https://doi.org/10.1007/978-3-319-97864-2_2

Hussain, A., & Roy, A. (2016). The emerging era of big data analytics. Big Data Analytics, 1(1). https://doi.org/10.1186/s41044-016-0004-2

Lue, R. (2019). Data science as a foundation for inclusive learning. Harvard Data Science Review, 1(2). https://doi.org/10.1162/99608f92.c9267215

Morabito, V. (2015). Big data and analytics innovation practices. Big Data and Analytics, 157-176. https://doi.org/10.1007/978-3-319-10665-6_8

Walker, R. (2015). Impact of analytics and big data on corporate culture and recruitment. From Big Data to Big Profits, 184-201. https://doi.org/10.1093/acprof:oso/9780199378326.003.0009

Source Matches (24)

Source 1: Student paper, 99% match

Student paper

University of Cumberland’s ITS 836 – Data Science and Big Data Analytics

Original source

University of the Cumberland’s ITS-836 Data Science & Big Data Analytics

Source 2: Student paper, 97% match

Student paper

Big data analytics is the capacity to deal with huge volumes of information with shifting arrangements and multifaceted nature from related information, semi-organized information, weblogs, gadget information, and unstructured configurations. Capacity to get bits of knowledge about your items (brands), clients, and workers from online life information and connect with your exchange framework information. Big data analytics is making life is easier. Anything which you use on a daily basis might be a result of big data analytics.

Original source

Big data analytics is the capacity to deal with huge volumes to information with shifting arrangements and multifaceted nature from organized information, semi-organized information, weblogs, gadget information and unstructured configuration Capacity to get bits of knowledge about your items (brands), clients and workers from online life information and connect with your exchange framework information Big data analytics is making life is easier Anything which you use on daily basis might be result of big data analytics

Source 3: Student paper, 89% match

Student paper

NLP procedures can help with making new traceable interfaces and recoup detectability joins by finding semantic closeness among accessible printed ancient rarities.

Original source

Additionally, NLP procedures can assist with making new traceable interfaces and recoup detectability joins by finding semantic closeness among available printed ancient rarities

Source 3: Student paper, 74% match

Student paper

Big data analytics, machine learning techniques, natural language processing

Original source

Analytics techniques in data mining, deep learning and natural language processing

Source 3: Student paper, 81% match

Student paper

Distributed learning can moderate the adaptability application of traditional ML via completing computations on informational indexes adopted among a few workstations to scale up the learning process.

Original source

Distributed learning can be utilized to moderate the adaptability issue of customary ML via completing computations on informational indexes appropriated among a few workstations to scale up the learning procedure

Source 3: Student paper, 72% match

Student paper

Dynamic learning involves calculations that use versatile information collection forms that subsequently change parameters to gather the most useful information as fast as it could reasonably be expected to accelerate ML activities and avoid naming challenges (Lue, 2019). The vulnerability challenges of ML procedures can be basically ascribed to gathering information with low integrity i.e., dubious and inadequate information as well as information with low value irrelevant to the present issue.

Original source

Dynamic learning alludes to calculations that utilize versatile information collection (i.e., forms that consequently alter parameters to gather the most helpful information as fast as could reasonably be expected) so as to quicken ML exercises and overcome naming issues (Dasgupta, 2018) The vulnerability difficulties of ML procedures can be basically ascribed to gaining from information with low veracity (i.e., dubious and inadequate information) and information with little value (i.e., irrelevant to the present issue)

Source 3: Student paper, 69% match

Student paper

Risks can have a big impact on ML so long as poor or uncertain training tests, indistinct classification limits, and harsh information on the objective information. At times, the information is presented without names, which can be a challenge.

Original source

The vulnerability can affect ML as far as inadequate or uncertain training tests, indistinct classification limits, and harsh information on the objective information At times, the information is spoken to without names, which can turn into a test

Source 4: springeropen, 71% match

Student paper

Representing Uncertainty Resulting From Big Data Analytics

Original source

Uncertainty in big data analytics

Source 3: Student paper, 70% match

Student paper

Active learning has addressed this problem by determining a subset of the most significant event for marking. Deep learning is another learning technique that can deal with inadequacy and irregularity challenges in the classification methodology. NLP has a reputable set of techniques and tools which cover both written and spoken languages (Walker, 2015). NLP is also applicable in many areas such as machine translation, information gathering, speech recognition, optical character recognition, spell checking, and many others.

Original source

Active learning has explained this issue by choosing a subset of the most significant occasions for marking Profound learning is another learning strategy that can deal with inadequacy and irregularity issues in the classification methodology NLP has an established set of methodologies, tools, and techniques that cover both written and spoken (not to mention signed) languages (Nasraoui & N'Cir, 2018) Also, it has large application areas such as machine translation, information extraction, speech recognition, optical character recognition, spell checking, and such

Source 3: Student paper, 96% match

Student paper

Machine Learning (ML), on the other hand, is an approach that could be used in Natural Language Processing and many other fields such as data sciences, decision-making systems, and artificial intelligence. We can easily say that NLP is an interdisciplinary computing field, while ML is a set of strategies and tools to address as well as solve different challenges in a variety of computing fields, including NLP. However, we should not forget that these topics are so getting entangled and intertwined, which makes it difficult to establish a clear line between their definitions. Natural language processing provides clarification to the above-mentioned problems using the vocabulary selection method, understanding synonyms, antonyms, homonyms using wordnet, lexicon formation, relationship identification, and Name entity recognition Stanford parser.

Original source

Machine Learning (ML), on the other hand, is an approach that could be used in Natural Language Processing and many other fields such as data sciences, decision-making systems, and artificial intelligence We can perhaps say that NLP is an interdisciplinary field in computing, while ML is a set of approaches and tools to address and solve different problems in a variety of computing fields, including NLP However, we should not forget that these topics are so getting entangled and intertwined, which makes it difficult to establish a clear line between their definitions Natural language processing provides clarification to the above-mentioned problems using the vocabulary selection method, understanding synonyms, antonyms, homonyms using wordnet, lexicon formation, relationship identification, and Name entity recognition (Stanford parser)

Source 3: Student paper, 100% match

Student paper

NLP is an aid to ML and also Deep futuristic learning. Moreover, NLP augmented to ML reduce the search space and make it a guided search. As a result, classier don't overfit while training and accuracy are improved. The addition of Semantics to NLP is a major thrust in today’s Learning community.

Original source

NLP is an aid to ML and also Deep futuristic learning Moreover, NLP augmented to ML reduce the search space and make it a guided search As a result, classier don't overfit while training and accuracy are improved The addition of Semantics to NLP is a major thrust in today's Learning community

Source 3: Student paper, 77% match

Student paper

NLP technique is integrated with ML, which helps gadgets to assess, decode, and even create content. NLP and big data handle huge amounts of content information and gradually get an incentive from such a dataset. Some common NLP practices comprise lexical procurement, word sense disambiguation i.e., determining which type of word is used in a sentence in an event a word has different implications and grammatical feature (POS) labeling i.e., hinder mining the capacity of the words through marking classes, for example, action word, thing, and so forth. Several NLP-based techniques have been used to conduct mining, including data gathering, theme demonstration, content outline, classification, grouping, question feedback, and supposition mining.

Original source

NLP is a strategy integrated into ML that empowers gadgets to assess, decipher, and even create content NLP and big data handle large measures of content information and can get an incentive from such a dataset progressively Some standard NLP practices include lexical procurement, word sense disambiguation (i.e., figuring out which feeling of the word is utilized in a sentence when a name has different implications), and grammatical feature (POS) labeling (i.e., hinder mining the capacity of the words through marking classes, for example, action word, thing, and so forth) A few NLP-based methods have been used to content mining, including data extraction, theme demonstrating, content outline, classification, grouping, and question feedback, as well as supposition mining

Source 3: Student paper, 80% match

Student paper

For instance, financial and fraud detection may include finding proof of wrongdoing in huge datasets (Morabito, 2017). NLP technique especially named content extraction and data recovery can help oversee and scan through colossal measures of factual data, for example, criminal names and bank records, to support misrepresentation evaluation.

Original source

For instance, financial and extortion examinations may include finding proof of wrongdoing in massive datasets (Morabito, 2017) NLP strategies (uniquely named substance extraction and data recovery) can help oversee and filter through colossal measures of literary data, for example, criminal names and bank records, to support misrepresentation evaluation

Source 4: springeropen, 62% match

Student paper

Impacts of Natural Language Programming in Big Data

Original source

Natural language processing and big data

Source 3: Student paper, 83% match

Student paper

Moreover, NLP and big data can be utilized to assess news stories and foresee rises and falls on the composite stock value file. The vulnerability affects NLP in big data in various ways. For instance, the catchphrase search is an exemplary methodology in data mining that is used to deal with a lot of factual information. Watchword search acknowledges as information a rundown of applicable words or expressions and searches the ideal arrangement of information for events of the significant words.

Original source

Moreover, NLP and big data can be utilized to investigate news stories, and foresee rises and falls on the composite stock value file Vulnerability influences NLP in vast information in various ways For instance, a catchphrase search is an exemplary methodology in content mining that is used to deal with a lot of literary knowledge Watchword search acknowledges as information a rundown of applicable words or expressions and searches the ideal arrangement of data for events of the significant words

Source 3: Student paper, 97% match

Student paper

The vulnerability can affect catchphrase search, as an archive that contains a watchword isn't a confirmation of a report's pertinence. For instance, a catchphrase search, for the most part, coordinates accurate strings and overlooks words with a spelling error that may at present be important. Boolean administrators and fluffy pursuit innovations license more prominent flexibility in that they can be utilized to scan for words like the ideal spelling. While big data using AI holds a ton of guarantee, a wide scope of challenges is presented when such methods are exposed to vulnerability.

Original source

The vulnerability can affect catchphrase search, as an archive that contains a watchword isn't a confirmation of a report's pertinence For instance, a catchphrase search, for the most part, coordinates accurate strings and overlooks words with a spelling error that may at present be important Boolean administrators and fluffy pursuit innovations license more prominent flexibility in that they can be utilized to scan for words like the ideal spelling While big data using AI holds a ton of guarantee, a broad scope of difficulties is presented when such methods are exposed to vulnerability

Source 3: Student paper, 87% match

Student paper

For example, every one of the attributes presents various sources of vulnerability, unstructured, inadequate, or noisy data. Moreover, the vulnerability can be installed in the whole assessment process. For instance, managing inadequate and loose data is a basic test for most information mining and ML procedures (Hussain, 2016). Also, an ML algorithm may not get the ideal outcome if the preparation information is one-sided in any capacity.

Original source

For example, every one of the V attributes present various sources of weakness, for example, unstructured, inadequate, or noisy data Moreover, the vulnerability can be installed in the whole assessment process For instance, managing insufficient and lose data is a basic test for most information mining and ML procedures (Ghosh & Livingston, 2019) Also, an ML algorithm may not get the ideal outcome if the preparation information is one-sided in any capacity

Source 3: Student paper, 97% match

Student paper

Scaling these worries up to the big data level will effectively exacerbate any errors or inadequacies of the whole investigation process. Accordingly, a moderating vulnerability in big data analytics must be at the cutting edge of any robotized procedure, as the vulnerability can have a significant influence on the exactness of its outcomes.

Original source

Scaling these worries up to the high data level will effectively exacerbate any errors or inadequacies of the whole investigation process Accordingly, a moderating vulnerability in big data analytics must be at the cutting edge of any robotized procedure, as vulnerability can have a significant influence on the exactness of its outcomes

Source 3: Student paper, 81% match

Student paper

Clustering Methods for Big Data Analytics, 25-42.

Original source

Clustering methods for big data analytics

Source 5: Student paper, 100% match

Student paper

Hussain, A., & Roy, A.

Original source

Hussain, A., & Roy, A

Source 5: Student paper, 96% match

Student paper

The emerging era of big data analytics. Big Data Analytics, 1(1). https://doi.org/10.1186/s41044-016-0004-2

Original source

The emerging era of Big Data Analytics Big Data Analytics, 1(1) doi:10.1186/s41044-016-0004-2

Source 6: Student paper, 100% match

Student paper

Big data and analytics innovation practices. Big Data and Analytics, 157-176.

Original source

Big Data and Analytics Innovation Practices Big Data and Analytics, 157-176

Source 1: Student paper, 90% match

Student paper

https://doi.org/10.1007/978-3-319-10665-6_8

Original source

doi:10.1007/978-3-319-10665-6_8

Source 7: Student paper, 96% match

Student paper

Impact of analytics and big data on corporate culture and recruitment. From Big Data to Big Profits, 184-201. https://doi.org/10.1093/acprof:oso/9780199378326.003.0009

Original source

Impact of Analytics and Big Data on Corporate Culture and Recruitment From Big Data to Big Profits, 184-201 doi:10.1093/acprof:oso/9780199378326.003.0009