Discussion 1:
When working with data, we use data manipulation techniques such as searching and sorting for different purposes in data analysis. A regular expression is a sequence of characters that defines a pattern for searching or manipulating string data, where each character in the pattern represents a matching condition (Chen & Zhiwu, 2020). Regular expressions are important for data analysis because they help with data preparation, for example by extracting the relevant data before analysis. If we want to search a data set for records where the last name is XYZ, we can use a regular expression to find and extract the required data (Chen & Zhiwu, 2020).
The wildcard (.) and the asterisk (*) are two regular expression metacharacters that help in constructing a string search pattern. To better understand them with some examples, let us assume we have two characters, 'a' and 'b'.
The wildcard matches any single character. For example, a regular expression to search for a pattern where 'a' and 'b' are separated by exactly one character is 'a.b'. Example data that fall into this category are {acb, adb, aeb, afb, agb, ...}.
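As a minimal sketch of the wildcard, the following Python snippet uses the standard `re` module to test a few strings against 'a.b'. The candidate strings are made up for illustration, and `fullmatch` is used so that the whole string has to fit the pattern.

```python
import re

# Candidate strings invented for this illustration.
candidates = ["acb", "adb", "a b", "ab", "accb"]

# '.' stands for exactly one character (any character except a newline by default).
pattern = re.compile(r"a.b")

# Keep only the strings that fit the pattern from start to end.
matches = [s for s in candidates if pattern.fullmatch(s)]
print(matches)  # ['acb', 'adb', 'a b']
```

Note that 'ab' is rejected because nothing sits between the two letters, and 'accb' is rejected because two characters do.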
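The asterisk is a quantifier: it matches zero or more repetitions of the element that precedes it. On its own, 'a*b' therefore matches zero or more 'a' characters followed by 'b', such as {b, ab, aab, aaab, ...}. To search for a pattern where 'a' and 'b' are separated by any number of characters, the asterisk is combined with the wildcard, as in 'a.*b'. Example data that fall into this category are {accb, acccb, accccb, acccccb, acdefb, ...}; because the asterisk also allows zero repetitions, 'ab' and 'acb' match as well.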
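The behaviour of the asterisk can be checked with a short Python sketch using the standard `re` module. The candidate strings are invented for illustration; the three patterns show the asterisk applied to 'a', the asterisk applied to the wildcard, and the related '+' quantifier, which requires at least one repetition.

```python
import re

# Candidate strings invented for this illustration.
candidates = ["b", "ab", "aaab", "acb", "accb", "acdefb"]

# 'a*b': zero or more 'a' characters, then 'b'.
star_on_a = [s for s in candidates if re.fullmatch(r"a*b", s)]

# 'a.*b': 'a', then zero or more of any character, then 'b'.
star_on_dot = [s for s in candidates if re.fullmatch(r"a.*b", s)]

# 'a.+b': 'a', then one or more of any character, then 'b'.
plus_on_dot = [s for s in candidates if re.fullmatch(r"a.+b", s)]

print(star_on_a)    # ['b', 'ab', 'aaab']
print(star_on_dot)  # ['ab', 'aaab', 'acb', 'accb', 'acdefb']
print(plus_on_dot)  # ['aaab', 'acb', 'accb', 'acdefb']
```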
Data analysis includes a process of manipulating the data to prepare it for analysis, and this process uses regular expressions such as these to create search patterns based on the analyst's needs. These techniques help in searching and extracting relevant data (Chen & Zhiwu, 2020).
Based on my own experience with a project I was part of, we were working with school data, and our data set included information such as student first names, last names, home addresses, grades, and attendance. One of the reports requested from us compared student performance with the distance from the students' homes to school. The requesters were interested in checking whether traveling long distances affects student performance in any way. To work on this scenario, we used regular expressions to extract zip codes from the students' home addresses, compared the distances, and developed the report. This showed us the importance of regular expressions at various stages of data analysis (Chen & Zhiwu, 2020).
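The zip code extraction step might look something like the following Python sketch. The project's actual data and pattern are not shown here, so the addresses, the `ZIP_PATTERN` name, and the `extract_zip` helper are all hypothetical; the sketch assumes US-style addresses that end in a five-digit ZIP code, optionally followed by a four-digit extension.

```python
import re

# Hypothetical addresses; the real data set is not reproduced here.
addresses = [
    "12 Oak Street, Springfield, IL 62704",
    "4500 Main Ave Apt 3, Columbus, OH 43215-1234",
    "No postal code on file",
]

# Assumed format: five digits, optionally followed by a hyphen and four digits,
# located at the very end of the address.
ZIP_PATTERN = re.compile(r"\b(\d{5})(?:-\d{4})?\s*$")

def extract_zip(address):
    """Return the five-digit ZIP code at the end of an address, or None."""
    match = ZIP_PATTERN.search(address)
    return match.group(1) if match else None

zip_codes = [extract_zip(a) for a in addresses]
print(zip_codes)  # ['62704', '43215', None]
```

Anchoring the pattern to the end of the string keeps street numbers such as 4500 from being mistaken for a ZIP code.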
Reference
Chen, H., & Zhiwu, X. U. (2020). Inclusion algorithms for one-unambiguous regular expressions and their applications. Science of Computer Programming, 102436.
Discussion 2:
The importance of regular expressions in data analytics
Regular expressions are a powerful method for removing extraneous text from a data set. Cleaning the text in this way helps ensure that emails are threaded properly and that textual duplicates are accurately identified (Chowdhury, 2017). Regular expressions allow the creation of patterns that match text and so support the management of text. The concept is important because a regular expression specifies the set of strings needed for a specific use. A finite set of strings can be specified simply by listing its elements, whereas a pattern describes the same set, or a much larger one, more compactly. Pattern matching is therefore vital for validating text input: patterns are flexible, and they can be combined to build other patterns for validating input.
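To illustrate input validation, here is a minimal Python sketch using the standard `re` module. The identifier format (two uppercase letters followed by four digits) and the names `ID_PATTERN` and `is_valid_id` are assumptions made up for this example, not something taken from the cited sources.

```python
import re

# Hypothetical format: two uppercase letters followed by four digits, e.g. "AB1234".
ID_PATTERN = re.compile(r"[A-Z]{2}[0-9]{4}")

def is_valid_id(text):
    """Return True only when the whole input fits the assumed format."""
    return ID_PATTERN.fullmatch(text) is not None

samples = ["AB1234", "ab1234", "AB12345", "AB12"]
results = {s: is_valid_id(s) for s in samples}
print(results)
# {'AB1234': True, 'ab1234': False, 'AB12345': False, 'AB12': False}
```

The pattern stands for a set of 6,760,000 possible strings (26 x 26 x 10,000) without listing any of them, which is the compactness that makes patterns practical for validation.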
Differences between the types of regular expressions
One type of regular expression matches digits with a character class, such as [0-9]. When such a pattern is tried in a regular expression tester, all digits in the lower section, where the sample text appears, are highlighted in either yellow or blue. The square brackets are not matched literally because they are metacharacters (Kirk, 2019). A character class can limit the range of digits more precisely than a character shorthand; for example, [0-5] matches only the digits 0 through 5. A character shorthand such as \d, on the other hand, matches any Arabic digit, and the digits in the text below are highlighted in the same way when the shorthand is entered in the top section of the tester.
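The difference between a character class and a character shorthand can be sketched in Python with the standard `re` module. The sample text is invented for illustration. One detail specific to Python 3 is that `\d` on ordinary strings also matches decimal digits from other scripts unless the `re.ASCII` flag is given, whereas [0-9] never does.

```python
import re

# Sample text invented for this illustration.
text = "Room 407, ext. 1952"

# Character class covering every digit from 0 to 9.
class_digits = re.findall(r"[0-9]", text)

# Character shorthand for a digit.
shorthand_digits = re.findall(r"\d", text)

# A narrower class: only the digits 0 through 5.
low_digits = re.findall(r"[0-5]", text)

print(class_digits)      # ['4', '0', '7', '1', '9', '5', '2']
print(shorthand_digits)  # ['4', '0', '7', '1', '9', '5', '2']
print(low_digits)        # ['4', '0', '1', '5', '2']

# U+0663 is the Arabic-Indic digit three.
other_script = "\u0663"
unicode_hits = re.findall(r"\d", other_script)          # matched by \d
ascii_hits = re.findall(r"\d", other_script, re.ASCII)  # not matched with re.ASCII
class_hits = re.findall(r"[0-9]", other_script)         # not matched by [0-9]
```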
References
Chowdhury, R. (2017). Regular expressions in big data analytics. 2017 International Conference on Intelligent Computing and Control, 23.
Kirk, A. (2019). Data Visualisation: A Handbook for Data Driven Design. Thousand Oaks, CA: Sage Publications, Ltd.