Data Science
The purpose of this exercise is to practice different machine learning algorithms for text classification, as well as performance evaluation. In addition, you are required to conduct 10-fold cross-validation (https://scikit-learn.org/stable/modules/cross_validation.html) during training.
The dataset can be downloaded from here: https://github.com/unt-iialab/INFO5731_FALL2020/blob/master/In_class_exercise/exercise09_datacollection.zip. The dataset contains two files, training data and test data, for sentiment analysis of IMDB reviews. It has two categories: 1 represents positive and 0 represents negative. You need to split the training data into training and validation sets (80% for training and 20% for validation, https://towardsdatascience.com/train-test-split-and-cross-validation-in-python-80b61beca4b6) and perform 10-fold cross-validation while training the classifier. The final trained model should then be evaluated on the test data.
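The split and cross-validation steps could be sketched roughly as follows. This is only an illustration under several assumptions: the toy corpus stands in for the real training file (whose layout is not described here), TF-IDF features and MultinomialNB are one possible choice of features and classifier, and the random_state values are arbitrary.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in corpus; replace with the reviews and labels
# loaded from the real training file.
texts = [f"great wonderful film number {i}" for i in range(25)]
texts += [f"boring awful film number {i}" for i in range(25)]
labels = [1] * 25 + [0] * 25  # 1 = positive, 0 = negative

# 80% training, 20% validation, keeping the class balance in both parts.
X_train, X_val, y_train, y_val = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

# Vectorizer and classifier live in one pipeline so that each CV fold
# fits the vocabulary on its own training portion only.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())

# 10-fold cross-validation on the training portion.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")
print("mean CV accuracy:", cv_scores.mean())

# Refit on the full training portion and check the held-out validation set.
model.fit(X_train, y_train)
val_accuracy = model.score(X_val, y_val)
print("validation accuracy:", val_accuracy)
```

The final evaluation would repeat the last step with the test file in place of the validation set.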
Algorithms:
(1) MultinomialNB
(2) SVM
(3) KNN
(4) Decision tree
(5) Random Forest
(6) XGBoost
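One way to set up the six algorithms for comparison is sketched below. The class choices and hyperparameters are assumptions, not part of the assignment: LinearSVC is used for the SVM (SVC would also work), TF-IDF is assumed for the features, and XGBoost is added only if the separate xgboost package is installed.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Illustrative settings only; tune them on the validation data.
classifiers = {
    "MultinomialNB": MultinomialNB(),
    "SVM": LinearSVC(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision tree": DecisionTreeClassifier(random_state=42),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# XGBoost ships separately from scikit-learn, so it is treated as optional.
try:
    from xgboost import XGBClassifier
    classifiers["XGBoost"] = XGBClassifier(n_estimators=100)
except ImportError:
    print("xgboost is not installed; skipping XGBoost")

# Give every classifier its own TF-IDF step so each one can be
# cross-validated and fitted directly on raw review text.
pipelines = {
    name: make_pipeline(TfidfVectorizer(), clf)
    for name, clf in classifiers.items()
}
print(sorted(pipelines))
```

Each entry in pipelines can then be passed to cross_val_score and fitted in the same way, which keeps the comparison between algorithms consistent.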
Evaluation metrics:
(1) Accuracy
(2) Recall
(3) Precision
(4) F-1 score
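For reference, the four measures can be computed from the counts of true/false positives and negatives. The sketch below uses plain Python with a hypothetical helper name (binary_metrics) and made-up labels, treating 1 as the positive class; scikit-learn's accuracy_score, precision_score, recall_score and f1_score in sklearn.metrics provide the same measures.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # Fall back to 0.0 when a denominator is zero (e.g. no positive predictions).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration: 3 TP, 3 TN, 1 FP, 1 FN.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)  # every value is 0.75 for this example
```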