Sentiments Expressed In Hotel Reviews

Gauging sentiment rests on the notion that certain words are known to convey negative sentiments, while others convey positive ones. This assignment combines two key tasks in data mining: text mining and sentiment analysis. Text mining is the first step; its output is the input to the sentiment analysis.

Given a dataset of reviews of three hotels, identify which one is most positively reviewed to help managers prioritize renovation plans and upcoming marketing campaigns. Refer to the hotel reviews datasets included in this topic, hotel1.csv, hotel2.csv, and hotel3.csv. The algorithm for sentiment analysis used in this model consists of a few steps:

Acquire a set of text-based data, known to contain expressions of opinions about a common topic.
Parse the text to extract a list of all the sentences.
Traverse the sentences and search for words that appear in lists of words labeled as positive or negative.
Calculate the ratio of positive to negative words.
Use the positive/negative ratio to quantify the sentiment expressed in the entire dataset.
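
The steps above can be sketched in base R as follows. This is a minimal illustration only: the review text and the two tiny word lists are made-up assumptions, not the hotel data and not the lexicon that syuzhet uses.

```r
# Toy word lists and review text: illustrative assumptions only
positive_words <- c("clean", "friendly", "great", "comfortable")
negative_words <- c("dirty", "rude", "noisy", "broken")

reviews <- "The room was clean and the staff were friendly. The street was noisy at night. Great breakfast!"

# Parse the text into sentences: split on whitespace that follows ., ! or ?
sentences <- unlist(strsplit(reviews, "(?<=[.!?])\\s+", perl = TRUE))

# Traverse the text as lowercase words and count matches against each list
words <- unlist(strsplit(tolower(reviews), "[^a-z']+"))
words <- words[nzchar(words)]
n_pos <- sum(words %in% positive_words)
n_neg <- sum(words %in% negative_words)

# Ratio of positive to negative words (Inf if no negative words were found)
ratio <- if (n_neg == 0) Inf else n_pos / n_neg

print(sentences)
print(c(positive = n_pos, negative = n_neg, ratio = ratio))
```

A ratio above 1 means positive words outnumber negative ones in the text; a real lexicon would be far larger and would usually weight words rather than simply count them.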

Complete the following specific steps in R:

Install and load the syuzhet package.
Load the three hotel reviews datasets into data frames.
Explore and clean the data.
Convert each set of reviews into sets of sentences using the get_sentences() function.
Verify the output of the last step.
Build the sentiment analysis model.
1. Extract sentiments for each hotel using the get_sentiment() function.
2. Examine the first 10 values in the resulting numeric vector for each hotel. Among the first 10 sentences, which are the most positive and the most negative for each hotel? Explain. Because the results of this analysis are actionable, the model calculates the ratio of positive to negative words. In other sentiment analysis settings, such as opinions expressed about an artist or a politician, neutral sentiments are also valuable. Does your model calculate neutral sentiments as well? If yes, how are you processing these results? If not, why not?
3. Calculate measures of central tendency for each hotel’s reviews, then summarize your findings and their meaning.
4. Visualize the sentiments using the plot() function.
5. Plot the trend lines of the sentiments for each hotel using the simple_plot() function and examine the resulting normalized narrative time curves.
6. Use the zoo library and the rollmean() function to compute the moving averages of sentiments for the three hotels.
7. Rescale the curves using the x component, with values between 0 and 1, of the (x, y, z) list returned by the rescale_x_2() function.
8. Plot the rescaled curves.
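
As a rough sketch of the model-building steps above, the following runs the syuzhet and zoo calls on a short, made-up block of review text. The text, the variable names, and the window size k = 3 are assumptions for illustration; in the assignment the text would come from hotel1.csv, hotel2.csv, and hotel3.csv, whose column names are not given here.

```r
# Assumes the packages are already installed:
# install.packages(c("syuzhet", "zoo"))
library(syuzhet)
library(zoo)

# Toy stand-in for one hotel's reviews (not taken from the hotel datasets)
reviews_text <- paste(
  "The room was clean and the staff were friendly.",
  "The shower was broken and the street was noisy.",
  "Breakfast was great.",
  "We arrived on a Tuesday.",
  "The bed was comfortable but the carpet was dirty.",
  "I would happily stay here again."
)

# Split the text into sentences, then score each sentence
sentences <- get_sentences(reviews_text)
scores <- get_sentiment(sentences, method = "syuzhet")

# Inspect the first values and the extreme sentences
print(head(scores, 10))
print(sentences[which.max(scores)])  # most positive sentence
print(sentences[which.min(scores)])  # most negative sentence

# Count negative (-1), neutral (0), and positive (1) sentences
print(table(sign(scores)))

# Measures of central tendency
print(summary(scores))

# Raw sentiment per sentence
plot(scores, type = "l", xlab = "Sentence", ylab = "Sentiment")

# Moving average over a 3-sentence window (window size is an arbitrary choice)
rolled <- rollmean(scores, k = 3)

# Put the moving average on a 0-to-1 x axis and plot it
rescaled <- rescale_x_2(rolled)
plot(rescaled$x, rolled, type = "l",
     xlab = "Normalized position", ylab = "Rolling mean sentiment")
```

The simple_plot() call is left out of this sketch because it is meant for much longer vectors than this toy example; with the full review sets it would be called as simple_plot(scores). A score of exactly 0 marks a sentence in which no lexicon words carried net sentiment, which is one way to identify neutral sentences.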
Interpret the results.
1. Compare the reviews by focusing on the shape of the vectors that represent them. Because the vectors differ in length, first smooth and time-normalize them with the discrete cosine transform (DCT) so that they can be compared.
2. Use the get_dct_transform() function, which by default returns a smoothed vector of 100 values on a normalized time axis.
3. Plot the DCT smoothing and time normalization for each hotel using the plot() function.
4. Verify the length of each vector to confirm that it is 100 using the length() function.
5. Plot all three curves in one graph for easier visual comparison of their DCT smoothing and time normalization.
6. Calculate the correlation of each pair of vectors using the cor() function.
7. Discuss the significance of these results to managers of the hotels reviewed.
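
The interpretation steps above might look like the sketch below. The three score vectors are hypothetical numbers chosen only to show the mechanics; in the assignment they would be the get_sentiment() output for each hotel. The low_pass_size and x_reverse_len values shown are the function's defaults, written out for clarity.

```r
# Assumes the syuzhet package is already installed
library(syuzhet)

# Hypothetical sentence-level scores for three hotels (different lengths on purpose)
hotel1_scores <- c(0.50, 1.20, -0.30, 0.80, 0.00, 1.50, -0.60, 0.90)
hotel2_scores <- c(-0.40, 0.20, 0.60, -1.00, 0.30, 0.70, 1.10, -0.20, 0.40, 0.90)
hotel3_scores <- c(1.00, 0.40, -0.50, -0.80, 0.10, 0.60, -0.30, 0.20, 0.75, -0.10, 0.50, 1.30)

# DCT smoothing and time normalization to 100 points each
dct1 <- get_dct_transform(hotel1_scores, low_pass_size = 5, x_reverse_len = 100)
dct2 <- get_dct_transform(hotel2_scores, low_pass_size = 5, x_reverse_len = 100)
dct3 <- get_dct_transform(hotel3_scores, low_pass_size = 5, x_reverse_len = 100)

# Verify that every transformed vector has length 100
print(c(length(dct1), length(dct2), length(dct3)))

# Plot all three curves in one graph on a shared y range
y_range <- range(dct1, dct2, dct3)
plot(dct1, type = "l", col = "blue", ylim = y_range,
     xlab = "Normalized time", ylab = "DCT-smoothed sentiment")
lines(dct2, col = "red")
lines(dct3, col = "darkgreen")
legend("topright", legend = c("Hotel 1", "Hotel 2", "Hotel 3"),
       col = c("blue", "red", "darkgreen"), lty = 1)

# Pairwise correlation of the smoothed, equal-length curves
cor_matrix <- cor(cbind(hotel1 = dct1, hotel2 = dct2, hotel3 = dct3))
print(round(cor_matrix, 3))
```

Because the transform resamples every vector to the same 100 points, cor() can compare hotels with different numbers of review sentences. A high correlation indicates that sentiment rises and falls in a similar pattern across two hotels' reviews, not that their average sentiment is the same.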
Ethical practices:
1. Reflect on the possible abuses that might occur during the analysis, interpretation, and use of data and results.
2. Substantiate your reflection with concrete examples from your analysis and interpretation in a “what if” scenario.

Submit all of the above as one comprehensive technical report in an R Markdown document, including all computational steps, their results, visualizations, explanations, and analysis.