Artificial Intelligence In Retail Industry: What Makes A Successful Online Product?

With the fast development of e-commerce platforms, online shopping has become increasingly popular. Unlike offline shopping, customers cannot examine the product in person, so they look for answers in the product's ratings and reviews. This makes it increasingly important for vendors to study those reviews closely. By analyzing and organizing this information systematically, we can extract hidden information from it and provide valuable direction for product design and sales.

First, we removed invalid data from the data set.

Second, by analyzing the existing review data, we extract and combine the most informative measure to focus on: the Star-Sentiment Measure (SSM). It is built with a recurrent neural network (RNN) and gives a comprehensive view of customer satisfaction. We recommend this measure because it includes both rating and text information.

To make time-series-based predictions, we introduce a Reputation Trend Forecast (RTF). Given the reviews up to the current 30-day period, the model forecasts whether the product's reputation will increase or decrease in the next period, with an accuracy of over 93% for each kind of product. With RTF, you can adjust your marketing strategy when a decrease in reputation is coming.

To take sales scale into account, we fit the Success Forecast (SF) using different machine learning approaches. We chose the BP neural network for each kind of product because it had the highest average accuracy. With success defined as “reaching a certain market share within a period of time”, we obtained 77.4%, 83.1% and 69.6% accuracy for hairdryers, pacifiers and microwaves, respectively.

User behavior is another topic we researched.

We conducted an incitation analysis using a logistic function, which yields the following description of user behavior: users are likely to be incited by specific star ratings, but not by the emotional tendency of other review texts.

We extracted “hot words” for each product and developed a Significance Index (SI) approach to determine which features of the product to enhance and advertise, and which to avoid. Quality descriptors with a higher SI and a positive contribution are good choices for advertising topics, as well as critical features in product design.

First of all, by analyzing the existing review data, we extract the most informative measure to focus on (see also the body of the report):

Star-Sentiment Measure (SSM): SSM is built with an RNN. It gives a comprehensive view of customer satisfaction. We recommend this measure because it includes both rating and text information.

The reviews of each product form a time series. To find out how reputation can be forecast from earlier reviews, we introduced a Reputation Trend Forecast (RTF). Given the reviews up to the current 30-day period, the model forecasts whether the product's reputation will increase or decrease in the next period, with an accuracy of over 93% for each kind of product. With RTF, you can adjust your marketing strategy when a decrease in reputation is coming.

More importantly, we pay close attention to the sales scale of the product, which brings real profit to Sunshine rather than just fame. The Success Forecast (SF) is built using different machine learning approaches. We chose the BP neural network for each kind of product because it had the highest average accuracy. With success defined as “reaching a certain market share within a period of time”, we obtained 77.4%, 83.1% and 69.6% accuracy for hairdryers, pacifiers and microwaves, respectively. If the model forecasts success, stay on track; if it forecasts failure, make adjustments and set things right. We hope these two forecast models will be helpful for your decision making.

User behavior is another topic we researched. Do specific star ratings incite more reviews? Our conclusion is that high ratings reduce negative reviews, while low ratings trigger them. People tend to follow the trend, don’t they? We therefore suggest that you participate in the Vine Program, because Vine users tend to rate higher and receive more helpful votes. This should help both by raising the displayed rating directly and by implicitly discouraging negative reviews.

We extracted “hot words” for each product and developed a Significance Index (SI) approach to determine which features of the product to enhance and advertise, and which to avoid. Please pay close attention to the powerful, quiet and hot features of your hairdryer, and to the cute, soft and well-fitting features of your pacifier. For the microwave, breakage during delivery and customer service are the most important things to improve, and do not forget to make it easier to use. Features with a higher SI and a positive contribution are good choices for advertising topics.

Background

Online marketplaces have become increasingly popular, and Amazon is one of the largest online vendors in the world [1]. The shopping experience it provides differs from that of traditional markets, with much of the information generated by users through reviews and votes on those reviews. Users can express their opinion through a star rating and review text, though the two may be inconsistent. They also gain information from previous reviews. For example, a five-star rating may encourage a purchase, while a one-star rating may discourage one. There might even be a chain effect: suppose a buyer is quite dissatisfied with the product and wants to give 1 star; on seeing many 5-star ratings, he or she may give 3 stars instead.

Since a company and its customers are both exposed to user-generated content, efforts must be made to understand the information in both the star rating and the review text, together with “vine”, “helpful votes” and other features. This can help the company manage its reputation and better meet its customers’ needs.

Restatement of the Problem

As required by Sunshine Company, we use Amazon review data to identify data measures, time-based patterns and combinations of measures that indicate the reputation and potential success of three upcoming products: a microwave oven, a baby pacifier and a hairdryer.

The problem can be broken down into six parts:
  • Analyze the existing data and identify the most important measures to track.
  • Use these measures to conduct time-based analysis, and rebuild them as an indicator of the trend of the product’s reputation.
  • Create a grading system (GS) that combines text-based and rating-based measures; the GS should reflect the future success or failure of the product.
  • Analyze, from the perspective of user behavior, the influence of earlier reviews on users who are currently writing a review.
  • Find specific quality descriptors that are correlated with specific rating levels.
  • Summarize the important design features of the products and propose an online sales strategy for Sunshine Company.

General Assumptions and Justifications

  • People have the same “review tendency” for the same kind of product.

-For example, anyone who buys hairdryer A has the same probability of writing a review as someone who buys hairdryer B. This implies that the number of reviews is proportional to sales volume.

  • People are rational and follow Amazon’s rules for writing a review.
  • The “success” of a product is defined as a top rank in sales volume or sales share per unit time.
  • The vast majority of users who write reviews browse other reviews first, usually about 2 or 3 of them.

-The image shows what users see before writing a review. The default review order is “Top Reviews”, which places reviews with more helpful votes nearer the front. So if a specific star rating incites some kind of review, it should be the reviews with the most helpful votes that influence users’ review behavior, because users very likely see only those. We assume that the first 3 reviews contribute most to a user’s reviewing behavior: a user who wants to write a review sees at most the first three reviews, because the “Write a customer review” button disappears once you scroll to the fourth existing review. With this assumption, the fourth part of the problem (question d) simplifies to: do the star ratings of the three existing reviews with the most helpful votes incite some type of review?

  • On Amazon’s website, the default ordering of reviews is by number of helpful votes, from high to low, according to Amazon’s rules.

Data Analysis & Data Cleansing

Before further analysis, invalid data must be removed. The distributed data files are tables with ‘\t’ (tab) as the separator. They contain special characters and can only be opened correctly with UTF-8 or a compatible extended encoding. A preliminary look at the data shows a certain amount of garbled text and web-page markup such as <br>, which suggests that the files were obtained by crawling and organizing Amazon web pages. To eliminate the markup and garbled text, we use regular expressions to clean up the title and body of each review. All of the garbled text was removed this way.
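A minimal Python sketch of this cleaning step is shown below. The regular expressions are assumptions, since the report does not list its exact patterns; the helper names are illustrative, and the column names `review_headline` and `review_body` are hypothetical placeholders for the title and body columns of the TSV files.

```python
import csv
import html
import re

# Assumed patterns; the report does not publish its exact regular expressions.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)  # line-break markup such as <br> or <br />
ANY_TAG = re.compile(r"<[^>]+>")                  # any other leftover HTML tag
WHITESPACE = re.compile(r"\s+")


def clean_review_text(text):
    """Remove HTML residue and undecodable characters from a review title or body."""
    text = html.unescape(text)         # turn entities such as &amp; back into characters
    text = BR_TAG.sub(" ", text)       # line breaks become spaces
    text = ANY_TAG.sub(" ", text)      # drop any remaining tags
    text = text.replace("\ufffd", "")  # U+FFFD marks bytes that failed to decode
    return WHITESPACE.sub(" ", text).strip()


def load_reviews(path, text_columns=("review_headline", "review_body")):
    """Read a tab-separated review file as UTF-8 and clean its (hypothetical) text columns."""
    with open(path, encoding="utf-8", newline="") as f:
        rows = list(csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE))
    for row in rows:
        for column in text_columns:
            if row.get(column):
                row[column] = clean_review_text(row[column])
    return rows


print(clean_review_text("Great<br />dryer &amp; very quiet"))  # Great dryer & very quiet
```

`csv.QUOTE_NONE` keeps quotation marks inside free-text reviews from being treated as field delimiters, which matters for tab-separated review dumps.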

Second, checking the table structure in Excel, we found only one structural error, in hair_dryer.tsv, which we fixed manually.

Another important step is handling irrelevant data. Because most reviews are short, we directly use word2vec word vectors for text clustering. Clustering flags only a limited number of texts, so we could also check the flagged data manually and delete the irrelevant items to obtain the cleaned data set.
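The clustering step can be sketched as follows. This is an illustration under assumptions: the report does not say which clustering algorithm it used, so plain k-means with a deterministic initialization is used here, each review is represented by the average of its word vectors, and the tiny two-dimensional `embeddings` dictionary is a hypothetical stand-in for real word2vec vectors.

```python
import math


def average_vector(tokens, embeddings, dim):
    """Mean of the word vectors of the known tokens; zero vector if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as initial centroids so runs are repeatable."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids


# Toy vectors (hypothetical) standing in for word2vec embeddings.
embeddings = {"hot": [1.0, 0.9], "powerful": [0.9, 1.0], "quiet": [1.0, 1.0], "zzz": [-1.0, -1.0]}
reviews = [["hot", "powerful"], ["quiet", "hot"], ["powerful", "quiet"], ["zzz"]]
points = [average_vector(r, embeddings, dim=2) for r in reviews]
labels, _ = kmeans(points, k=2)
print(labels)
```

With these toy vectors the three on-topic reviews share one cluster and the nonsense review ends up alone, which is the kind of outlier that would then be checked by hand.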

Glossary
Star Rating
Amazon.com explains its rating system as follows:

| Star level | General meaning |
|---|---|
| 1 🌟 | I hate it. |
| 2 🌟🌟 | I do not like it. |
| 3 🌟🌟🌟 | It is just OK. |
| 4 🌟🌟🌟🌟 | I like it. |
| 5 🌟🌟🌟🌟🌟 | I love it. |

This indicates that the star level directly measures a customer’s attitude toward the product. Simple as it is, we assume that the star rating describes a customer’s real feeling as given in the “General meaning” column of the table above.

We first look at the distribution of ratings for the three products. The graph shows:

  • 5 stars is the most common rating (58%, 67% and 41% for the hair dryer, pacifier and microwave, respectively)
  • The microwave gets an especially high proportion of 1-star ratings (25%)
  • 2-star and 3-star ratings account for a relatively small proportion for all products (13%, 15% and 15%)

| | Hair dryer | Pacifier | Microwave |
|---|---|---|---|
| Mean | 4.12 | 4.3 | 3.44 |
| Median | 5 | 5 | 4 |
| Variance | 1.691 | 1.190 | 1.645 |

This raises some questions:
  • Is there anything special about the text of 1-star reviews and of 2- or 3-star reviews?
  • Are there specific features of the microwave that trigger 1-star ratings?

The star ratings of all three products pass a normality test at the 95% confidence level, so it is appropriate for us to focus on the mean and variance of the star rating.

Vine and Verified Purchase

Amazon Vine reflects the trust that reviewers have earned in the Amazon community by writing accurate and insightful reviews. Amazon gives Vine members free copies of products that vendors have submitted to the program. Two possible effects need to be verified:

  • Vine users may give higher star ratings because they get the product for free and have relatively low expectations of it.
  • Vine users’ reviews tend to be of higher quality, as measured by “helpful votes” and the “helpful votes / total votes” ratio.

We ran t-tests for both hypotheses. For the first, both the hair dryer and the microwave pass Levene’s test for equality of variances at the 95% confidence level when grouped by vine status (Y or N). The t-test then rejects, at the 95% confidence level, the null hypothesis that the Vine group has the same average star rating as the non-Vine group. This indicates that Vine users of the microwave and hair dryer tend to give higher star ratings, with lower variance, but this does not hold for the pacifier. One possible explanation is that a pacifier is useful only to parents of babies, so it may be of little use to Vine users who receive it for free. This needs to be checked in the review text analysis.

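| Star rating | Vine | N | Mean | Std. deviation |
|---|---|---|---|---|
| Hair dryer | Y | 179 | 4.44 | 0.757 |
| Hair dryer | N | 11291 | 4.11 | 1.306 |
| Microwave | Y | 19 | 4.37 | 0.684 |
| Microwave | N | 1589 | 3.43 | 1.651 |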
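To make the test concrete, the sketch below recomputes the t statistics from the summary statistics in the table. The report does not state which t-test variant it used, so both Student's pooled-variance form (the usual choice when Levene's test supports equal variances) and Welch's unequal-variance form are shown; the function names are illustrative, and p-values are omitted to stay within the standard library.

```python
import math


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Student's two-sample t statistic with pooled variance (equal variances assumed)."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and its Welch-Satterthwaite degrees of freedom."""
    a, b = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# (mean, std. deviation, N) from the table above: Vine group first, then non-Vine.
groups = {
    "hair dryer": ((4.44, 0.757, 179), (4.11, 1.306, 11291)),
    "microwave": ((4.37, 0.684, 19), (3.43, 1.651, 1589)),
}
for product, (vine, non_vine) in groups.items():
    t_pooled = pooled_t(*vine, *non_vine)
    t_welch, df = welch_t(*vine, *non_vine)
    print(f"{product}: pooled t = {t_pooled:.2f}, Welch t = {t_welch:.2f} (df = {df:.1f})")
```

All four statistics exceed the two-sided 5% critical value, which is below 2.1 for every degrees-of-freedom value here, so both variants agree with the conclusion about star ratings.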

For the second effect, only the microwave data reject the null hypothesis of the t-test, indicating that microwave Vine users tend to receive more “helpful votes” and “total votes” than non-Vine users. They also receive a higher rate of helpful votes (RoH): 94.2%, compared with 83.0% for non-Vine users.

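| Microwave | Vine | N | Mean | Std. deviation |
|---|---|---|---|---|
| Helpful votes | Y | 19 | 66.32 | 188.062 |
| Helpful votes | N | 1589 | 4.92 | 18.394 |
| Total votes | Y | 19 | 70.37 | 188.062 |
| Total votes | N | 1589 | 5.93 | 18.394 |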
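The report does not give a formula for RoH. Taking it as the ratio of mean helpful votes to mean total votes is an assumption, but it reproduces both figures above; because each pair of means shares the same N, it also equals total helpful votes divided by total votes. The function name is illustrative.

```python
def rate_of_helpful(mean_helpful_votes, mean_total_votes):
    """Share of all votes that were 'helpful', computed from group means (assumed definition)."""
    return mean_helpful_votes / mean_total_votes


# Group means for microwave reviews from the table above.
roh_vine = rate_of_helpful(66.32, 70.37)
roh_non_vine = rate_of_helpful(4.92, 5.93)
print(f"Vine: {roh_vine:.1%}, non-Vine: {roh_non_vine:.1%}")  # Vine: 94.2%, non-Vine: 83.0%
```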

Most Vine reviews are not verified purchases, since the user receives the product for free. This should be taken into account by treating Vine reviews as verified purchases.

Review

The three most frequently mentioned meaningful quality descriptors for the products are:

Hair dryer- powerful, hot and quiet

Microwave- broke, easy and customer service

Pacifier- cute, fit, and soft

Although “customer service” is not strictly a quality descriptor, it is a common complaint in microwave reviews. Since no other quality descriptor ranks in the top 100 words by frequency, we treat it as a descriptor that needs further analysis.
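The frequency ranking can be sketched as counting, for a list of candidate descriptors, how many reviews mention each one. The candidate list, the sample reviews and the function name are hypothetical, and the report does not describe its tokenization or stop-word handling.

```python
import re
from collections import Counter


def descriptor_counts(reviews, candidates):
    """Count how many reviews mention each candidate descriptor (single words or phrases)."""
    counts = Counter()
    for text in reviews:
        words = re.findall(r"[a-z']+", text.lower())
        joined = " " + " ".join(words) + " "  # pad so phrases only match whole words
        for phrase in candidates:
            if " " + phrase + " " in joined:
                counts[phrase] += 1
    return counts


# Toy reviews and a hypothetical candidate list.
reviews = [
    "Broke after a week and customer service never answered.",
    "Easy to use, heats fast.",
    "Arrived broke. Customer service was useless.",
]
counts = descriptor_counts(reviews, ["broke", "easy", "customer service", "quiet"])
print(counts.most_common(3))
```

Counting each review at most once per descriptor keeps a single long rant from dominating the ranking.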

Length is an important characteristic of a review. We found a linear relationship between the length of a review and its helpful votes, as shown below:

| Model | Unstandardized coefficient B | Std. error | t | Sig. |
|---|---|---|---|---|
| (Constant) | -1.014 | 0.164 | -6.167 | 0.000 |
| length | 0.059 | 0.002 | 30.883 | 0.000 |

At the 95% confidence level, helpful votes have a positive linear relationship with review length: every 100 additional words are associated with about 6 more helpful votes (0.059 × 100 ≈ 5.9).
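The regression above is ordinary least squares with one predictor. A minimal sketch (the function name is illustrative) fits the line and turns the reported slope into votes per 100 words:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Coefficients from the regression table: intercept -1.014, slope 0.059 helpful votes per word.
intercept, slope = -1.014, 0.059
print(f"Extra helpful votes per 100 extra words: {slope * 100:.1f}")  # 5.9
```

The negative intercept means the fitted line predicts fewer than zero votes for very short reviews, so it should only be read within the range of lengths actually observed.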

RNN-based Star-Sentiment Measure calculation

We have noticed that the topics and content of a product’s reviews give buyers an effective reference, which in turn strongly affects their purchasing decisions. We therefore need sentiment analysis to determine the emotion of each review. This helps us capture users’ feelings beyond the star rating, and helps the company recognize the advantages and disadvantages of its products objectively. Since deep learning methods are well suited to natural language processing, we decided to use a Long Short-Term Memory (LSTM) network, a special type of RNN, to perform sentiment analysis on product reviews. Our aim is a measure of emotional tendency that is more informative than the star rating alone.
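To make the LSTM's role concrete, the sketch below implements the standard gated update of a single LSTM cell in plain Python and adds a sigmoid output head that maps the last hidden state to a score between 0 and 1. The weights are untrained random numbers and the sizes are hypothetical; this illustrates the technique only and is not the authors' Keras model, whose layer configuration is shown only in the figure.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Plain-list matrix-vector product."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def lstm_step(x, h, c, params):
    """One LSTM time step. params maps each gate name to (W, U, b)."""
    def gate(name, activation):
        W, U, b = params[name]
        return [activation(p + q + bk) for p, q, bk in zip(matvec(W, x), matvec(U, h), b)]

    i = gate("input", sigmoid)        # how much new information to write
    f = gate("forget", sigmoid)       # how much of the old cell state to keep
    o = gate("output", sigmoid)       # how much of the cell state to expose
    g = gate("candidate", math.tanh)  # candidate values to write
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def sentiment_score(word_vectors, params, head_w, head_b):
    """Run the LSTM over a review's word vectors; a sigmoid head maps the last state to (0, 1)."""
    hidden_size = len(head_w)
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    for x in word_vectors:
        h, c = lstm_step(x, h, c, params)
    return sigmoid(sum(w * hk for w, hk in zip(head_w, h)) + head_b)


def random_params(input_size, hidden_size, seed=0):
    """Untrained random weights (hypothetical sizes), for illustration only."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        name: (matrix(hidden_size, input_size), matrix(hidden_size, hidden_size), [0.0] * hidden_size)
        for name in ("input", "forget", "output", "candidate")
    }


params = random_params(input_size=3, hidden_size=4)
review = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.5]]  # two toy 3-dimensional word vectors
score = sentiment_score(review, params, head_w=[0.5, -0.5, 0.5, -0.5], head_b=0.0)
print(score)
```

In training, the gate weights and the output head would be learned from labeled reviews; the forget gate is what lets the cell carry sentiment cues across long reviews.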

To do this, we first import the raw review texts into a Python list and split each review into separate sentences. We then use the Keras Tokenizer (with the TensorFlow framework) to process the text, and randomly sample reviews to form the training and test sets. The following figure shows the structure of the LSTM network:
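A standard-library stand-in for these two steps is sketched below. It is not the Keras `Tokenizer` API; it only mirrors the idea of indexing words by frequency and turning each review into a sequence of integer ids. The function names, the 80/20 split and the seed are illustrative assumptions, since the report does not give its split ratio.

```python
import random
from collections import Counter


def build_word_index(texts):
    """Map words to integer ids by descending frequency, starting at 1 (0 left free for padding)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    ranked = sorted(counts, key=lambda word: -counts[word])  # stable: ties keep first-seen order
    return {word: rank for rank, word in enumerate(ranked, start=1)}


def texts_to_sequences(texts, word_index):
    """Turn each text into a list of word ids, skipping unknown words."""
    return [[word_index[w] for w in text.lower().split() if w in word_index] for text in texts]


def train_test_split(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of the data with a fixed seed and cut it into training and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


reviews = ["the dryer is hot", "the pacifier is cute", "the microwave broke"]
word_index = build_word_index(reviews)
print(texts_to_sequences(["The dryer broke"], word_index))  # [[1, 3, 8]]
train, test = train_test_split(list(range(10)))
```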

Then we embed each word as a word vector (a Word2Vec-style embedding step). Here we use the pre-trained GloVe.6B vectors as the lexicon, converting each word in a review into a 100-dimensional vector, and use these data as the training set for the LSTM network, from which we retain only the output sentiment value. The following figure shows the structure of the classification model:
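A sketch of the lookup step, assuming the standard GloVe text format in which each line holds a word followed by its vector components separated by spaces. The file name `glove.6B.100d.txt` is an assumption about where the vectors are stored, the function names are illustrative, and mapping unknown words to zero vectors is one common choice rather than the report's documented behavior.

```python
def load_glove(lines):
    """Parse GloVe text-format lines ("word v1 v2 ... vd") into a dict of float vectors."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) > 1:
            vectors[parts[0]] = [float(v) for v in parts[1:]]
    return vectors


def embed_sequence(tokens, vectors, dim):
    """Replace each token by its vector; words missing from the lexicon become zero vectors."""
    return [vectors.get(token, [0.0] * dim) for token in tokens]


# With the real file (assumed local path) one would write:
# with open("glove.6B.100d.txt", encoding="utf-8") as f:
#     vectors = load_glove(f)
# Tiny inline sample in the same format, with 3 dimensions instead of 100:
sample = ["good 0.1 0.2 0.3", "bad -0.1 -0.2 -0.3"]
vectors = load_glove(sample)
sequence = embed_sequence(["good", "microwave"], vectors, dim=3)
print(sequence)
```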

We calculated the correlation coefficient between the sentiment rating and the star rating. The result is:

0.976 for the hairdryer, 0.851 for the microwave and 0.976 for the pacifier. This indicates that the sentiment rating has a strong positive relationship with the star rating. Since the star rating is discrete, we add the star rating and the sentiment rating to create the Star-Sentiment Measure (SSM), a star-based measure refined by the review text. Because the sentiment rating lies between 0 and 1, the SSM ranges from 1 to 6, from most negative to most positive.

Examples of microwave reviews with their sentiment ratings and SSM:

| Review body | Star rating | Sentiment rating | SSM |
|---|---|---|---|
| It came slightly damaged but the main problem is that it does not heat at all in any functions. I have tried to heat a cup of water with Microwave Express, it looks like it is cooking but the water is at room temperature after 30 seconds. Then I have tried all the functions and none of them work. The surface light and the vent fan works fine. I really should try it before I install it. Now I have all the trouble to remove it and install another one. I ordered it from Amazon and contacted both the seller Home Care Company. Since it is Saturday I will have to wait up to two working days to get a response from the Home Care Company. I am not sure what will happen but I do not recommend buying this product or buying it online. | 1 | 0.163 | 1.16 |
| I never used this because my wife insisted we buy a new microwave after I bought this. I did contact GE and they gave me a 20$ credit and said not to worry about paint peeling off. Tell that to your wife! That was OK because I read some reviews which made it sound a little difficult to use. | 3 | 0.386 | 3.39 |
| Excellent microwave and excellent convection oven combined in one compact unit. | 5 | 0.915 | 5.92 |

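The SSM is simple to compute once each review has a sentiment rating. The sketch below, with illustrative function names, adds the two ratings and computes a Pearson correlation coefficient; three reviews are far too few to reproduce the reported coefficients, so the correlation call only demonstrates usage. Treating the sentiment rating as a number in [0, 1] is inferred from the stated 1 to 6 range of the SSM.

```python
import math


def ssm(star_rating, sentiment_rating):
    """Star-Sentiment Measure: star rating (1-5) plus sentiment rating (0-1), so SSM lies in [1, 6]."""
    return star_rating + sentiment_rating


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# The three microwave examples from the table above: (star rating, sentiment rating).
examples = [(1, 0.163), (3, 0.386), (5, 0.915)]
print([ssm(star, sentiment) for star, sentiment in examples])
print(pearson([s for s, _ in examples], [p for _, p in examples]))
```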
Most informative data measures for Sunshine Company to track

From the analysis above, we recommend that Sunshine Company track the following measures:

SSM

-The SSM of each review gives a comprehensive view of the customer’s satisfaction, combining both rating and text information.

The number of reviews within a period of time, and their share of the market

-The number of reviews reflects sales scale, while the share of the market indicates our product’s rank in the market. We can also track how this measure changes over time.

The number of Vine users’ reviews

-Vine users tend to give higher star ratings and receive more helpful votes. With more Vine reviews, our displayed star rating should rise and our product’s review page should become more attractive because its reviews are more helpful. To achieve this, we can join Amazon’s Vine Program and send free products to Vine users.

Reputation trend forecast based on time series

In this part, we analyze how reputation changes over time and construct a set of time-series methods and patterns to forecast future reputation.

First, we define reputation within a certain time period as a combination of star-rating-based and review-text-based measures, based on the SSM. We set the time period to 30 days.

Because different users perceive the product differently, we consider all ratings and reviews within a fixed period of time T together, and take their average rating (as measured by SSM) as the reputation of that period.

To account for the influence of time on the current reputation, we construct the following functions to weight reviews posted at different times, combine all previous reviews and evaluate the reputation at the current time:

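$$\mathrm{Weight}_{\mathrm{time}}(i) = (T_{\mathrm{now}} - T_i)^{a} + C, \quad a \le 0$$

$$\mathrm{Weight}_{\mathrm{vote}}(i) = \frac{K P_0 e^{r(\mathrm{vote}_i - Q)}}{K + P_0\left(e^{r(\mathrm{vote}_i - Q)} - 1\right)}, \quad P_0 \ge 0,\ Q \ge 0$$

$$\mathrm{Reputation} = \frac{\sum_{T_i < T_{\mathrm{now}}} \mathrm{Weight}_{\mathrm{time}}(i)\,\mathrm{Weight}_{\mathrm{vote}}(i)\,\mathrm{Reputation}_i}{\sum_{T_i < T_{\mathrm{now}}} \mathrm{Weight}_{\mathrm{time}}(i)\,\mathrm{Weight}_{\mathrm{vote}}(i)}$$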

To reflect how events at different times influence the current reputation, we need an adjustable time weight: the more recent a review, the greater its weight. In addition, considering the helpful votes of different reviews, we think that more popular or more helpful reviews should contribute more to the reputation, so we also weight each review by its helpful votes when computing the reputation.
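The three formulas above can be sketched as follows. The parameter values (a = -0.5, C = 0.1, K = 10, P0 = 1, r = 0.3, Q = 0) are hypothetical placeholders, since the report does not publish its fitted values; time is measured in days, each review's own reputation value is assumed to be its SSM, and the function names are illustrative.

```python
import math


def weight_time(age_days, a=-0.5, C=0.1):
    """Recency weight (T_now - T_i)^a + C; with a <= 0, older reviews get smaller weights."""
    return age_days ** a + C


def weight_vote(votes, K=10.0, P0=1.0, r=0.3, Q=0.0):
    """Logistic helpful-vote weight K*P0*e^{r(v-Q)} / (K + P0*(e^{r(v-Q)} - 1))."""
    e = math.exp(r * (votes - Q))
    return K * P0 * e / (K + P0 * (e - 1))


def reputation(reviews, t_now):
    """Weighted average of per-review reputation over reviews posted strictly before t_now.

    reviews: iterable of (t_i in days, helpful_votes, reputation_i), e.g. reputation_i = SSM.
    """
    numerator = denominator = 0.0
    for t_i, votes, rep_i in reviews:
        if t_i < t_now:  # strict inequality keeps the age positive, so age ** a is finite
            w = weight_time(t_now - t_i) * weight_vote(votes)
            numerator += w * rep_i
            denominator += w
    return numerator / denominator if denominator else float("nan")


# An old, unvoted review with SSM 2.0 and a recent one with SSM 5.5 and 5 helpful votes (toy data).
print(round(reputation([(1, 0, 2.0), (28, 5, 5.5)], t_now=30), 2))
```

The recent, well-voted review dominates the weighted average, which is exactly the behavior the two weight functions are meant to produce.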

This gives a reputation time series. We fit the series with a logistic function, that is, we use $\mathrm{Reputation}_i, \mathrm{Reputation}_{i-1}, \ldots, \mathrm{Reputation}_1$ to predict $\mathrm{Reputation}_{i+1}$:

$$y = \frac{A - D}{1 + \left(x / C\right)^{B}} + D$$

We use the root-mean-square error (RMSE) to measure the fitting error:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(\hat{y}_t - y_t\right)^2}$$

Minimizing the error to solve the model gives the following results, measured by both RMSE and accuracy:

| | Hairdryer | Microwave | Pacifier |
|---|---|---|---|
| RMSE | 0.197 | 0.206 | 0.168 |
| Accuracy | 0.943 | 0.942 | 0.978 |

The model predicts well whether the reputation in the next time period will be higher or lower than in the current period. For example, for a given pacifier product, given its previous reviews, we can predict whether the reputation in the next period will be higher or lower than in the current period with an accuracy of 97.8%.
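A sketch of the fitting curve and the two evaluation measures. The 4-parameter logistic form is our reading of the formula above; the direction-accuracy function follows the description of predicting whether next period's reputation is higher or lower than the current one. Function names are illustrative, and parameter fitting itself (for example with a least-squares optimizer) is left out to stay within the standard library.

```python
import math


def four_param_logistic(x, A, B, C, D):
    """y = (A - D) / (1 + (x / C)^B) + D, for x >= 0 and C > 0."""
    return (A - D) / (1 + (x / C) ** B) + D


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def direction_accuracy(current, predicted_next, actual_next):
    """Share of periods where the predicted up/down move matches the actual one.

    current[i] is the current-period reputation; predicted_next[i] and actual_next[i]
    are the forecast and the realized reputation of the following period.
    """
    hits = sum((p > c) == (a > c) for c, p, a in zip(current, predicted_next, actual_next))
    return hits / len(current)


# Toy check (hypothetical numbers): at x = C the curve sits at the midpoint (A + D) / 2.
print(four_param_logistic(5.0, A=1.0, B=2.0, C=5.0, D=3.0))  # 2.0
```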

Success forecast based on machine learning

Product ratings and reviews are both important reference indicators, but they cannot easily be compared directly. We therefore quantify both kinds of information as rating values and review sentiment values, and use a series of methods to make predictions and find the best evaluation model.

We define the success of a product as reaching a certain market share within a period of time. Then, according to our assumptions, we can find “success” and “failure” cases in the existing data that meet this definition, take the combined situation of ratings and reviews over a short period before each case occurs, and regard it as a pattern of success or failure.

We can then use these cases to train a classifier with machine learning, so that it recognizes a given pattern, that is, the combined situation of ratings and reviews. To generate the data, we describe each pattern with a triple (x, y, z):

x: time from the first review to now

y: current average star rating of the product

z: current average sentiment rating of the product
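A small sketch of how one (x, y, z) pattern might be assembled from a product's review history. The input layout (day number, star rating, sentiment rating) and the function name are hypothetical; in the report the sentiment rating comes from the LSTM model.

```python
def pattern_triple(reviews, t_now):
    """Build the (x, y, z) pattern from reviews posted up to t_now.

    reviews: list of (t_days, star_rating, sentiment_rating).
    x: days from the first review to t_now; y: mean star rating; z: mean sentiment rating.
    """
    past = [r for r in reviews if r[0] <= t_now]
    if not past:
        raise ValueError("no reviews before t_now")
    x = t_now - min(t for t, _, _ in past)
    y = sum(star for _, star, _ in past) / len(past)
    z = sum(sentiment for _, _, sentiment in past) / len(past)
    return x, y, z


# Toy history: the review on day 40 lies in the future relative to t_now = 30 and is ignored.
print(pattern_triple([(10, 5, 0.9), (20, 3, 0.4), (40, 1, 0.1)], t_now=30))
```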

Next, to keep the number and proportion of success and failure cases balanced, we set the market-share benchmark for success separately for each data set (the pacifier data set is larger, so the required market share is lower; the microwave data set is smaller, so the required market share is higher). We also set different time-window lengths, including the length of time used to define failure cases, and discarded data from periods of market downturn (when the sales volume in the period was below a certain value). This produced a suitable data set.
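The labeling rule can be sketched as below. The benchmark share and the minimum window total are hypothetical parameters, set per data set as described above (a lower benchmark for the pacifier data, a higher one for the microwave data); the report does not publish the exact values, and this sketch simplifies the failure definition to "below the benchmark".

```python
def label_window(review_counts, benchmark, min_total):
    """Label each product in one time window by its share of all reviews in that window.

    review_counts: {product_id: number of reviews in the window}; by our first general
    assumption, the review count is proportional to sales volume.
    Returns None for a downturn window (total below min_total), else {product_id: success?}.
    """
    total = sum(review_counts.values())
    if total < min_total:
        return None  # market downturn: discard this window
    return {pid: count / total >= benchmark for pid, count in review_counts.items()}


print(label_window({"a": 30, "b": 10, "c": 60}, benchmark=0.25, min_total=50))
```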

We then used the Waikato Environment for Knowledge Analysis (Weka) to apply six common machine learning methods: logistic regression, a BP neural network, the J48 decision tree, a support vector machine, naive Bayes and k-nearest neighbor. Their accuracy on the three product data sets is shown in the following table:

| Method | Hair dryer | Pacifier | Microwave |
|---|---|---|---|
| Logistic regression | 75.49% | 80.84% | 57.84% |
| BP neural network | 77.39% | 83.07% | 69.60% |
| Decision tree (J48) | 73.59% | 84.13% | 62.74% |
| Support vector machine | 57.70% | 68.89% | 55.39% |
| Naive Bayes | 61.16% | 64.08% | 61.76% |
| K-nearest neighbor | 75.49% | 81.99% | 71.07% |

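Comparing the results in the table, the BP neural network has the highest average accuracy across the three products (76.7%), although k-nearest neighbor is slightly better on the microwave data and the decision tree on the pacifier data. We therefore use the BP neural network as our evaluation model to predict the success of the product.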
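The model choice can be checked directly against the table by averaging each method's accuracy over the three products; using the unweighted mean as the selection criterion is our assumption about how "best" was judged.

```python
# Accuracy (%) from the table above: (hair dryer, pacifier, microwave).
accuracy = {
    "logistic regression": (75.49, 80.84, 57.84),
    "BP neural network": (77.39, 83.07, 69.60),
    "decision tree": (73.59, 84.13, 62.74),
    "support vector machine": (57.70, 68.89, 55.39),
    "naive Bayes": (61.16, 64.08, 61.76),
    "k-nearest neighbor": (75.49, 81.99, 71.07),
}
mean_accuracy = {method: sum(scores) / len(scores) for method, scores in accuracy.items()}
best = max(mean_accuracy, key=mean_accuracy.get)
print(best, round(mean_accuracy[best], 1))  # BP neural network 76.7
```

The margin over k-nearest neighbor (76.2% on average) is small, so a different weighting of the three products could change the ranking.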

Incitation analysis based on Logistic Function

Based on the last two general assumptions, we can perform an Incitation Analysis (IA) of users’ review behavior: we reconstruct the ratings of the reviews each user saw when writing a review, and record the emotional tendency of the user’s own review.

Because the helpful-vote counts in the data set were recorded when the data was collected, they do not reflect the counts at the time a given user wrote a review, so we examine the general growth pattern of helpful votes. The following is an example growth curve, taken from the trend of “likes” on social media:

This curve closely matches a logistic function, and the relevant papers we consulted confirm this. We therefore use a logistic function to model the growth in the number of helpful votes and to maintain the current vote ranking over time:

$$\mathrm{Logistic}(t) = \frac{K P_0 e^{rt}}{K + P_0\left(e^{rt} - 1\right)}$$
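A sketch of how this curve can be used to reconstruct the "Top Reviews" order a user would have seen. Taking each review's final helpful-vote count as the ceiling K, fixing P0 = 1, and sorting by the estimated vote count at the moment of viewing are assumptions of this illustration, as are the function names and toy data; the report tunes r to fit the data.

```python
import math


def logistic_votes(t, K, P0, r):
    """Helpful votes t days after posting: K*P0*e^{rt} / (K + P0*(e^{rt} - 1))."""
    e = math.exp(r * t)
    return K * P0 * e / (K + P0 * (e - 1))


def top_reviews_at(reviews, t_view, r, P0=1.0, k=3):
    """Estimate which k earlier reviews a user saw at time t_view under 'Top Reviews' order.

    reviews: list of (review_id, t_posted, final_helpful_votes). The final vote count
    (at least P0) is used as the curve's ceiling K, an assumption of this sketch.
    """
    visible = [
        (logistic_votes(t_view - t_posted, K=max(final_votes, P0), P0=P0, r=r), rid)
        for rid, t_posted, final_votes in reviews
        if t_posted < t_view
    ]
    visible.sort(key=lambda pair: pair[0], reverse=True)
    return [rid for _, rid in visible[:k]]


# Toy data: review "c" ends with the most votes but was posted late, so early viewers rank it lower.
reviews = [("a", 0, 50), ("b", 0, 10), ("c", 5, 200), ("d", 1, 30)]
print(top_reviews_at(reviews, t_view=10, r=0.5))  # ['a', 'd', 'c']
```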

To approximate the real situation, we tune the parameter r, that is, the slope of the curve, to fit the data better. We then computed statistics using the method above and discarded unusable data (for example, products with too few reviews, or reviews written too early for a ranking to exist), obtaining pairs of the form (X, Y):

$X = (\mathrm{rating}_1, \mathrm{rating}_2, \mathrm{rating}_3)$

$Y = \mathrm{sentiment\_score}$

That is, X holds the star ratings of the top three reviews and Y is the emotional tendency of the current review. We select a subset of high-rating patterns, namely [5,5,5], [5,5,4] and [5,4,4], and compare its sentiment score distribution with that of the full set (for convenience, the pacifier results are used as the example below):

The pull of high ratings toward high ratings is not very obvious, but high ratings do seem to reduce the occurrence of low ratings. To verify this conjecture, we calculated the proportions of high and low scores in the two sets: in the high-rating subset, high scores account for 74.66% and low scores for 2.93%, while in the full set high scores account for 70.5% and low scores for 6.22%. This supports our conjecture.

Similarly, we computed statistics for the low-rating patterns ([1,1,1], [1,1,2] and [1,2,2]):

Low ratings have little effect on the share of high scores, but the share of negative reviews increases significantly: low scores account for 15.1% of the low-rating subset, compared with 6.22% in the full set, and this subset contains almost all of the negative reviews in the full set. To sum up:

I. The appearance of high-rating reviews reduces the number of bad reviews.

II. The appearance of low-rating reviews significantly increases the number of bad reviews.

III. Whether users write sentimentally positive reviews is not closely related to the reviews they have seen.
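The subset comparison can be sketched as follows: filter the (X, Y) pairs by the star pattern of the top three reviews and compare the shares of high and low sentiment scores with those of the full set. The thresholds 0.7 and 0.3 for "high" and "low", the function name and the toy pairs are hypothetical, since the report does not state the cut-offs it used.

```python
def high_low_shares(pairs, patterns=None, high=0.7, low=0.3):
    """Shares of high and low sentiment scores, optionally restricted to given top-3 star patterns.

    pairs: list of (top-3 star ratings as a tuple, sentiment score of the new review).
    high / low: hypothetical sentiment thresholds.
    """
    scores = [y for x, y in pairs if patterns is None or x in patterns]
    if not scores:
        return 0.0, 0.0
    n = len(scores)
    return sum(s >= high for s in scores) / n, sum(s <= low for s in scores) / n


HIGH_PATTERNS = {(5, 5, 5), (5, 5, 4), (5, 4, 4)}
LOW_PATTERNS = {(1, 1, 1), (1, 1, 2), (1, 2, 2)}

# Toy (X, Y) pairs; real pairs come from the reconstructed rankings.
pairs = [((5, 5, 5), 0.9), ((5, 4, 4), 0.8), ((1, 1, 1), 0.1), ((1, 2, 2), 0.2), ((3, 4, 5), 0.5)]
print("full set:", high_low_shares(pairs))
print("high-rating subset:", high_low_shares(pairs, HIGH_PATTERNS))
print("low-rating subset:", high_low_shares(pairs, LOW_PATTERNS))
```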

Do quality descriptors associate with star-rating?

With this question, we want to find out which product characteristics, as perceived by consumers, make them satisfied with the product. For especially good or bad ratings, is there a high probability that a particular descriptor appears? We use probabilities rather than correlation to answer this.

As mentioned above, the three most frequent meaningful quality descriptors for the products are:

Hair dryer- powerful, hot and quiet

Microwave- broke, easy and customer service

Pacifier- cute, fit and soft

Does this mean Sunshine Company should pay special attention to them?

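| Microwave | 1 star | 2 stars | 3 stars | 4 stars | 5 stars | Total |
|---|---|---|---|---|---|---|
| broke | 63.24% | 19.12% | 4.41% | 5.88% | 7.35% | 68 |
| easy | 3.68% | 5.52% | 4.91% | 26.38% | 59.51% | 163 |
| customer service | 73.08% | 3.85% | 7.69% | 11.54% | 3.85% | 52 |

| Hairdryer | 1 star | 2 stars | 3 stars | 4 stars | 5 stars | Total |
|---|---|---|---|---|---|---|
| powerful | 2.58% | 4.18% | 10.57% | 19.04% | 63.64% | 814 |
| hot | 10.69% | 8.18% | 9.85% | 19.96% | 51.32% | 1553 |
| quiet | 3.99% | 3.55% | 8.58% | 19.23% | 64.64% | 676 |

| Pacifier | 1 star | 2 stars | 3 stars | 4 stars | 5 stars | Total |
|---|---|---|---|---|---|---|
| cute | 1.95% | 4.28% | 8.56% | 13.77% | 71.44% | 1845 |
| fit | 4.73% | 5.15% | 8.42% | 19.67% | 62.03% | 1164 |
| soft | 2.64% | 3.75% | 6.49% | 15.52% | 71.60% | 986 |
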
For each word, the probability that a review containing it receives a given star rating is denoted:

$$P(\text{word}, \text{star rating}, 1)$$

For example,

$$P(\text{powerful}, 3, 1) = 10.57\%$$

From these three tables, we can answer the question:

Hairdryer:

  • The powerful, hot and quiet features win customers’ favor.
Microwave:
  • As the microwave is a large item, customers are troubled by shipping damage, or DOA (dead on arrival) units, which show up in reviews as “broke”.
  • Ease of use usually wins customers’ favor.
  • Customer service is a big problem for the microwave: nearly three quarters (73.08%) of the reviews that mention it give 1 star, while only 3.85% give 5 stars.
Pacifier:
  • The cute, fit and soft features win customers’ favor.

Although each quality descriptor above indicates a tendency to satisfy or dissatisfy customers, we still need to determine its importance. In other words, if most reviews containing a word get 5 stars, we must also know whether a large share of 5-star reviews contain this word. The results are below:

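| Hairdryer | 1 star | 2 stars | 3 stars | 4 stars | 5 stars |
|---|---|---|---|---|---|
| powerful | 2.03% | 5.32% | 8.61% | 7.40% | 7.73% |
| hot | 16.09% | 19.87% | 15.32% | 14.79% | 11.89% |
| quiet | 2.62% | 3.76% | 5.81% | 6.20% | 6.52% |

| Microwave | 1 star | 2 stars | 3 stars | 4 stars | 5 stars |
|---|---|---|---|---|---|
| broke | 10.70% | 11.61% | 2.24% | 1.33% | 0.75% |
| easy | 1.49% | 8.04% | 5.97% | 14.33% | 14.54% |
| customer service | 9.45% | 1.79% | 2.99% | 2.00% | 0.30% |

| Pacifier | 1 star | 2 stars | 3 stars | 4 stars | 5 stars |
|---|---|---|---|---|---|
| cute | 3.02% | 8.36% | 11.08% | 9.35% | 10.41% |
| fit | 4.61% | 6.35% | 6.87% | 8.43% | 5.70% |
| soft | 2.18% | 3.92% | 4.49% | 5.63% | 5.58% |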

For each star rating, the probability that a review with that rating contains a certain word is denoted:

$$P(\text{word}, \text{star rating}, 2)$$

For example,

$$P(\text{powerful}, 3, 2) = 8.61\%$$

We define the Significance Index (SI) of “word” at the “star” rating level as:

$$\mathrm{SI}(\text{word}, \text{star rating}) = P(\text{word}, \text{star rating}, 1) \times P(\text{word}, \text{star rating}, 2) \times 100$$
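
| Product | Word | Representative rating | SI |
|---|---|---|---|
| Hairdryer | Powerful | 5 | 4.92 |
| Hairdryer | Hot | 5 | 6.10 |
| Hairdryer | Quiet | 5 | 4.21 |
| Microwave | Broke | 1 | 6.76 |
| Microwave | Easy | 5 | 8.65 |
| Microwave | Customer Service | 1 | 6.91 |
| Pacifier | Cute | 5 | 7.44 |
| Pacifier | Fit | 5 | 3.54 |
| Pacifier | Soft | 5 | 3.99 |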

This suggests that:

  • The hair dryer vendor should take the powerful, hot and quiet features of its product seriously. Among them, hot has the first priority, followed by powerful, and lastly quiet.
  • The microwave vendor should carefully control shipping damage, listen to customers and improve customer service. When designing a new microwave model, it should also make ease of use a guiding principle, and marketing should focus on how easy the microwave is to use.
  • The pacifier vendor should make sure its product is cute, fits well and feels soft, paying special attention to cuteness. The marketing strategy is best built around the cuteness of the pacifier.

To make the approach apply to other descriptors, here is how to calculate the SI of a new quality descriptor, denoted “NEW”:
  1. Compute $P(\text{NEW}, 1, 1)$ and $P(\text{NEW}, 5, 1)$.
  2. If $P(\text{NEW}, 1, 1) > P(\text{NEW}, 5, 1)$, set the representative rating to 1; otherwise set it to 5.
  3. Calculate the SI at the representative rating.
  4. If the representative rating is 5, try to enhance the “NEW” feature of the product; if it is 1, try to avoid “NEW” so as not to upset your customers.
  5. SI indicates the priority of features. Address features with a higher SI first and pay more attention to them, because they should improve the star rating the most.
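The procedure above can be sketched in a few lines. The input layout (lower-cased review text paired with its star rating) and the function names are assumptions, and the substring test for a descriptor is a simplification of whatever matching the report used.

```python
def conditional_shares(reviews, word):
    """P(word, star, 1) and P(word, star, 2) for star = 1..5.

    P1[star]: share of reviews containing `word` that have this star rating.
    P2[star]: share of reviews with this star rating that contain `word`.
    reviews: list of (lower-cased review text, star rating).
    """
    stars_with_word = [star for text, star in reviews if word in text]  # simple substring test
    p1, p2 = {}, {}
    for star in range(1, 6):
        hits = stars_with_word.count(star)
        at_star = sum(1 for _, s in reviews if s == star)
        p1[star] = hits / len(stars_with_word) if stars_with_word else 0.0
        p2[star] = hits / at_star if at_star else 0.0
    return p1, p2


def significance_index(p1, p2):
    """Steps 2 and 3 above: choose the representative rating (1 or 5) and return (rating, SI)."""
    rating = 1 if p1[1] > p1[5] else 5
    return rating, p1[rating] * p2[rating] * 100


# Toy reviews (hypothetical) to show the calculation end to end.
reviews = [("so powerful", 5), ("powerful but loud", 4), ("weak airflow", 1),
           ("powerful and light", 5), ("just ok", 5)]
p1, p2 = conditional_shares(reviews, "powerful")
print(significance_index(p1, p2))

# Rounded table values for "powerful" (hair dryer) and "broke" (microwave).
print(significance_index({1: 0.0258, 5: 0.6364}, {1: 0.0203, 5: 0.0773}))
print(significance_index({1: 0.6324, 5: 0.0735}, {1: 0.1070, 5: 0.0075}))
```

With the rounded table percentages, "powerful" gets representative rating 5 with an SI of about 4.92, and "broke" gets rating 1 with an SI of about 6.77; the table's 6.76 presumably comes from unrounded probabilities.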

Model Conclusion

Our model addresses the requirements of Sunshine Company. It gives meaningful and accurate forecasts of future reputation and future success. We take a deep look into user behavior and identify a pattern of incitation, a bandwagon effect, among users. Finally, we extract critical product features to guide Sunshine Company in optimizing its product design and marketing decisions.

Strengths

Our model considers various aspects of star ratings and reviews as they are distributed over time, and develops comprehensive strategies from the merchant’s perspective.

For the important work of text mining and sentiment analysis, we adopted a relatively advanced LSTM network. With this deep learning tool, we completed the embedding of word vectors and emotional polarity values, and converted text information into numerical information suitable for subsequent analysis. Many research results and projects have demonstrated the effectiveness of LSTMs for this task.

For each question, we adopted a specific processing method, combining multidisciplinary knowledge and using a variety of machine learning algorithms and statistical methods to find the most appropriate answer to each problem. By comparing the results of multiple methods, we obtained effective strategies and satisfactory results, making full use of each data set provided to us.

Finally, through detailed analysis and refinement of the model results, we converted highly technical findings into effective suggestions that merchants can adopt directly, so that they can fully and accurately understand the guidance and marketing strategies we provide, which should be of real help to their production and sales.

Potential Improvement

We reduce the effect of seasonal and periodic trends by using rolling time windows. However, seasonal trends may still help explain the variation in sales volume and reviews. This needs further evaluation.

Our model is partly based on machine learning, but the data set we were given contains fewer than 20,000 items, which may be too few to fully optimize the model.

References

[1] Rain, C. (2013). Sentiment analysis in amazon reviews using probabilistic machine learning. Swarthmore College.

[2] Fang, X., & Zhan, J. (2015). Sentiment analysis using product review data. Journal of Big Data, 2(1), 5.

[3] Mudambi, S. M., & Schuff, D. (2010). Research note: What makes a helpful online review? A study of customer reviews on Amazon.com. MIS Quarterly, 185-200.

[4] Chua, A. Y., & Banerjee, S. (2016). Helpfulness of user-generated reviews as a function of review sentiment, product type and information quality. Computers in Human Behavior, 54, 547-554.

[5] Haque, T. U., Saber, N. N., & Shah, F. M. (2018, May). Sentiment analysis on large scale Amazon product reviews. In 2018 IEEE International Conference on Innovative Research and Development (ICIRD) (pp. 1-6). IEEE.

[6] Bäck, E. C. (2013). Does Amazon Vine bias reviews? Internet & Technology (blog).

[7] Kleinbaum, D. G., Dietz, K., Gail, M., Klein, M., & Klein, M. (2002). Logistic regression. New York: Springer-Verlag.

[8] Hosmer Jr, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression (Vol. 398). John Wiley & Sons.

[9] Jin, W., Li, Z. J., Wei, L. S., & Zhen, H. (2000, August). The improvements of BP neural network learning algorithm. In WCC 2000-ICSP 2000. 2000 5th International Conference on Signal Processing Proceedings. 16th World Computer Congress 2000 (Vol. 3, pp. 1647-1649). IEEE.

[10] Sadeghi, B. H. M. (2000). A BP-neural network predictor model for plastic injection molding process. Journal of Materials Processing Technology, 103(3), 411-416.

Leading Artificial Intelligence and Financial Advisor – Rebellion Research