Building a Recommendation Platform
                     For E-Commerce Businesses
       Based on Hybrid Content Based - Collaborative Filtering
                       And Web 2.0 Concepts
                                 Technical University of Crete
                    Department of Production Engineering and Management

             Andreopoulos, Marios                                           Mandourarakis, Ioannis
                  tuc@andmarios.com                                          i.mandourarakis@gmail.com
 Department of Electronic & Computer Engineering                 Department of Electronic & Computer Engineering

                                               September, 2012

                                                     Abstract
         We propose an automatic recommendation platform for e-commerce businesses based on multi-
     criteria analysis and hybrid content-based - collaborative filtering, tailored to e-shops and cross-product
     category analysis, built with Web 2.0 concepts in mind. We are going to describe the input variables
     as well as methods to obtain them without frustrating customers, possible ways to pre-process the data
     and create a common query interface, suggest the recommendations output across a variety of scenarios
     and finally discuss metrics that can be used to measure the real return of the system.



Thanks                                                          A good recommendation system can increase a business’
                                                                up-selling rate and build stronger customer relations.
We would like to thank Professor Nikolaos Matsatsinis           The latter point should not be overlooked. The system
and Marina Karabatsa for their guidance.                        will make the customer spend more time on the e-shop
                                                                and promote frequent visits. We like to think of this as
                                                                customer - business “bonding” time.

Introduction                                                    In the next section we write about our proposed data
                                                                sources - signals and how to acquire them, taking ad-
                                                                vantage of the “social” conception which dominates the
Internet commerce has seen a tremendous boost over              world wide web nowadays. Then we proceed to suggest
the last decade, so much so that for some products it           pre-processing methods for the gathered data in
rivals brick-and-mortar shops.                                  order to create a generic query interface upon which we
From a marketing perspective the web can be seen as a           can build recommendations. Recommendations are the
holy grail: access to millions of customers, many eager         main topic in our third section. Since our system supports
to give feedback, as well as vast quantities of customer        many types of filtering and can do cross-product
behaviour data gathered automatically by web servers            and cross-rating suggestions, which ones, when and how
and databases. A real life playground for marketing sci-        should we present to a user? The fourth section deals
entists to apply and test theories. But with great data         with metrics and how we can verify that the system
sets and user base, comes the need for great questions          works toward the business’ benefit. The last section
and applications.                                               explores implementation details and business models
                                                                that can be used to build and support such a platform.
Our system is designed for the new generation of cus-
tomers, people who are now in their early 30’s or
younger and for companies that plan to stay relevant
for years to come. Recommendations’ quality will grow
in parallel with the business - customer relation.

    This work was done as part of an undergraduate course
on Small Business and Entrepreneurship.


1    Data Sources - Signals                                      Most people are fine with, and even expect, giving this
                                                                 information, since we live in the age of social. Of course
The signals we have at our disposal may come from a              there should always be a clearly visible option to skip the
variety of sources:                                              process and proceed to the shop.

  • Traditional web means, like web server logs, cook-
    ies and transactions/sales journal.                          1.1.3 Products Speaking for Themselves
  • A customer’s basic profile, since in e-commerce
    customers usually have to register. It includes              Products on an e-commerce platform are entries in a
    attributes such as sex, age, geographical location.          database. By using separate standardized fields for each
                                                                 of the characteristics we can easily perform content-
  • Products’ data, like specifications and product cat-
                                                                 based suggestions. A classification system is also of
    egory.
                                                                 use, to assign products to categories and subcategories.
  • User feedback which may be ratings and reviews
    through traditional means (like forms or star rat-
    ing) or modern tools (like games and social fea-
    tures).
                                                                 1.2 User Feedback; Earn it, Shape it
                                                                     for Multi-Criteria Analysis
The 1st and 3rd data sets are considered standard. The
2nd is standard but may be extended through social               User feedback is a broad concept. It includes every
features. The 4th one we have to design ourselves                action a user may take that isn’t directly related to the
from the ground up and is the main feature of this section.      shopping process.

                                                                 The most common feedback forms are ratings and comments.
1.1 Standard Data Sources                                        These days likes on Facebook, tweets on Twitter
                                                                 and +1s on Google Plus are also common. But there
1.1.1   Web Server, Cookies & Database                           is so much more we can do to get feedback from our
                                                                 customers.
        Logs

For the virtual world to function, logs are essential.
These logs, collected and used primarily for technical           1.2.1 Generic Criteria - Appealing to
reasons, can also assist marketing strategies.                         Customers’ Value Systems
From these logs we can extract information like which
pages a certain customer visited, how much time he               Our proposal is the use of generic criteria so that we can
spent on each one, if he reached a page following a link         create a profile for the customer which doesn’t rely on
from another page or if he clicked any links on this page.       specific product attributes. Since a customer may
From our database we have access to things like which            exhibit different weights and thresholds for a criterion
products a customer has bought, his orders’ values,              according to a product’s category, we may adjust weights
when he placed his orders (time and date), what items            and thresholds according to category.
he bought together.                                              The proposed criteria are:
A smart system could learn a customer’s shopping habits
from these data and use them for suggestions.                      • price
                                                                   • quality
1.1.2   Customer’s Profile - Personalization                        • utility

In order to use an e-shop, someone usually needs an                • ease of use
account. This alone enables us to use all the other
                                                                   • coolness factor
features we write about. But also, it gives us more
data to process.                                                   • safety / trust
Typical registration data are name, address, age and
e-mail. E-mail is a handy but dangerous way to                   Each of these criteria may be given a different name for
communicate with customers. We will talk about it in the         different products but it actually refers to a customer’s
3rd section.                                                     value system.

Besides these typical data, there is more information            An example would be the “coolness factor”. An iPhone
we can ask of a customer upon his registration, such as          may be a product with high coolness factor, but also a
the product categories that interest him most and                fresh vegetable or a vitamin in our age of “fitness” may
what he considers an acceptable amount of e-mail.                be an item with high coolness factor.
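
The generic criteria of section 1.2.1, with weights and thresholds that change per product category, could be represented roughly as follows. This is only a minimal sketch: the class `CategoryPreferences`, the function `product_score`, the 1 to 5 score scale and the rule that a product missing a thresholded criterion is rejected are all assumptions made for illustration, not part of the proposed platform.

```python
from dataclasses import dataclass, field

# The six generic criteria proposed in section 1.2.1.
CRITERIA = ("price", "quality", "utility", "ease_of_use", "coolness", "safety_trust")


@dataclass
class CategoryPreferences:
    """Hypothetical per-category preferences of one customer."""
    weights: dict = field(default_factory=dict)     # criterion -> relative importance
    thresholds: dict = field(default_factory=dict)  # criterion -> minimum acceptable score


def product_score(criteria_scores, prefs):
    """Weighted average of a product's criteria scores for one customer.

    Returns None when a threshold is missed (assumption: an unrated
    criterion counts as 0) or when no weighted criterion has a score.
    """
    for criterion, minimum in prefs.thresholds.items():
        if criteria_scores.get(criterion, 0.0) < minimum:
            return None
    # Only criteria that have both a positive weight and a score contribute.
    rated = [c for c in CRITERIA
             if c in criteria_scores and prefs.weights.get(c, 0.0) > 0]
    total_weight = sum(prefs.weights[c] for c in rated)
    if total_weight == 0:
        return None
    return sum(prefs.weights[c] * criteria_scores[c] for c in rated) / total_weight


# Example: a customer who cares mostly about coolness when buying phones.
phone_prefs = CategoryPreferences(
    weights={"coolness": 3, "quality": 2, "price": 1},
    thresholds={"quality": 3},
)
phone_a = {"price": 2, "quality": 5, "coolness": 5}
phone_b = {"price": 5, "quality": 2, "coolness": 4}
print(product_score(phone_a, phone_prefs))  # 4.5
print(product_score(phone_b, phone_prefs))  # None (quality below threshold)
```

The same customer could hold a second `CategoryPreferences` object for, say, groceries, where "coolness" might carry a very different weight.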


1.3 Basics: Ratings & Reviews                                   Also a social system implemented into the store could
                                                                be of high value. For example allow user profiles, simple
Ratings and reviews are considered standard these days.         walls, public favorite products, wish lists.
Ratings use the classic star system, which usually takes
values from 1 to 5 stars, with no stars meaning no rating.      1.3.3 Gamify the Store
The rating titles the customer sees don’t have to
be the names of the criteria we described previously and        Another trend in modern web design is gamification:
may vary from product to product.                               the process of applying game techniques to your product
Reviews can be scanned using automatic text analysis            (the store in our case) to encourage users to engage
methods to extract useful information. We may even be           more.
able to omit some criteria from ratings (six rating areas       A basic example would be a personalized question.
could be too much for a customer) and try to extract            Let’s assume you need a bit more data to calculate a
them from comments.                                             weak order. You can insert a question, created
It is important to encourage ratings and reviews. The           automatically and personalized for the user, on a sidebar. As
greatest example of user feedback is Amazon.                    the user browses through your store, this simple question,
                                                                e.g. do you prefer product A or product B, appears
                                                                in an unobtrusive way. Most people would give in and
1.3.1   Incentives                                              click the answer. If another question popped up, they
                                                                might continue to play for a bit more before returning
Giving people incentives to interact with your site and         to their shopping.
give you information is crucial.
                                                                Game design is a field of its own, but it is an interesting
You can appeal to the many individuals who value                topic with many practical applications here.
status by implementing a user rating system, where
people with many useful reviews earn titles like expert
user.
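
The "product A or product B" question of section 1.3.3 produces pairwise answers, and a weak order can be derived from them in many ways. The sketch below uses one simple possibility, ranking items by wins minus losses and letting items with the same net score share a tier. The function name `weak_order` and the scoring rule are assumptions chosen for illustration; the paper does not prescribe a method.

```python
from collections import defaultdict


def weak_order(answers):
    """Build a weak order (ranked tiers) from pairwise answers.

    `answers` is a list of (preferred, other) pairs, one per sidebar
    question answered. Items are ranked by wins minus losses; items
    with equal net score are considered indifferent and share a tier.
    """
    net = defaultdict(int)
    for preferred, other in answers:
        net[preferred] += 1
        net[other] -= 1
    tiers = defaultdict(list)
    for item, score in net.items():
        tiers[score].append(item)
    # Best tier first; items inside a tier are sorted only for stable output.
    return [sorted(tiers[score]) for score in sorted(tiers, reverse=True)]


# Four sidebar answers given by one customer.
answers = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
print(weak_order(answers))  # [['A'], ['B', 'C'], ['D']]
```

Each additional answer refines the tiers, which is why even a few unobtrusive questions per visit can be useful.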
                                                                2 Pre-Processing the Data
Another incentive would be offers, such as giving 1% off
the next purchase to any customer who leaves 10                 This is the most challenging part of the process.
ratings.                                                        Although the implementation is left to the engineering
It is important though to distinguish between feedback          team, we may give some guidelines to assist a
for an item the customer has bought and an item he              successful result.
hasn’t actually bought from our store.                          Because of the diversity of possible recommendation
Other incentives may include frequent contests. For             scenarios, it is fundamental that the processing part
example a contest could be “help us find the best mobile        creates an application programming interface and/or
phone and it may become yours”. We would then create            query interface that makes it easy for the software
a short questionnaire with questions that can help our          engineers and administrators of an e-commerce business
marketing decisions. If you think about it, the cost of         to connect it to their platform.
such a feat is small compared to other known methods            Figure 1 displays the structure we propose for the
of achieving the same results. You have to remember             recommender platform and connects the previous section
though that the questionnaires should really be kept            (input) with the current one.
short. The challenge is that the customer should
complete the contest before it becomes tiresome for him.
                                                                2.1 Content-based Filtering
1.3.2   Social
                                                                In this type of filtering, we try to make suggestions
                                                                based on the characteristics of a product and/or the
The trend these days is social. Everything has to be            ratings of the individual we make recommendations
social; users expect everything to be social. The reviews       for. This approach in its simplest form uses weights
and user rating system we proposed earlier are                  to find recommendation candidates whereas more
examples of social features. But one can take it even           advanced scenarios include machine learning techniques.
further.                                                        It is a complex process as we have to teach the machine
A very simple way would be to incorporate social buttons        to search for relations across various characteristics.
such as the renowned Like button of Facebook, the
+1 of Google and the tweet button of Twitter. Every
time a user shares an item, we can give it a boost in           2.1.1 Using Characteristics
the weak order of its category. If the user comments
with the share, we can perform automatic text analysis          This is the easiest form of this type of filtering, yet it still
on the comment.                                                 needs a fair amount of work. Assuming standardized


                                                               We have defined six criteria that will be used to identify
                                                               products and explained that per-category vectors will
                                                               adjust the weights and thresholds of each criterion for a
                                                               certain product.
                                                               The problem is that, because of the nature of the data
                                                               collection process, we won’t have values for all criteria
                                                               of every product from each customer. We only get bits of
                                                               information from our customers. Our data looks like a puzzle
                                                               that statistics will help us solve.
                                                               This isn’t multi-criteria decision analysis (MCDA) as
                                                               used in marketing strategies; it is MCDA per customer
                                                               and has much greater error tolerance.
                                                               Let’s assume our store contains n products that
                                                               create the product space N. We have six criteria:
                                                               a1, a2, a3, a4, a5, a6. For each criterion we have far
                                                               fewer values than n. For example, for criterion a1 we may
    Figure 1: recommender platform and input                   have k1 values, for criterion a2, k2 values and so on.
                                                               We can calculate criteria scores adjusting for missing
                                                               values.
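
One simple way to calculate criteria scores while adjusting for missing values is to average each criterion only over the ratings that actually contain it, keeping the counts k1 ... k6 as an indication of how reliable each score is. The sketch below assumes this averaging rule and a hypothetical function name `criteria_scores`; the paper leaves the statistical method open, so treat it as one possible reading.

```python
# Criterion names follow the a1 ... a6 notation of the text.
CRITERIA = ("a1", "a2", "a3", "a4", "a5", "a6")


def criteria_scores(ratings):
    """Average each criterion over the ratings that include it.

    `ratings` is a list of dicts, each holding the criteria one customer
    filled in for one product; omitted criteria (or None) are skipped.
    Returns (scores, counts): scores[c] is None when no value exists,
    and counts[c] is the number of values available (k1, k2, ...).
    """
    sums = {c: 0.0 for c in CRITERIA}
    counts = {c: 0 for c in CRITERIA}
    for rating in ratings:
        for criterion, value in rating.items():
            if criterion in sums and value is not None:
                sums[criterion] += value
                counts[criterion] += 1
    scores = {c: (sums[c] / counts[c] if counts[c] else None) for c in CRITERIA}
    return scores, counts


# Three partial ratings of the same product by different customers.
ratings = [{"a1": 4, "a2": 5}, {"a1": 2}, {"a2": 3, "a3": None}]
scores, counts = criteria_scores(ratings)
print(scores["a1"], counts["a1"])  # 3.0 2
print(scores["a3"], counts["a3"])  # None 0
```

The same function can be applied to all ratings given by one customer to obtain his rating profile instead of a product's scores.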
fields for product characteristics, we can find and sug-       Similarly, a customer may have rated a few items
gest products of the same category with similar char-          but omitted some ratings for each. We will calculate
acteristics. An example would be a visitor looking at a        his rating profile from these values.
hard disk with 2TB of space, a SATA connection and
a cost of around 100€. The system would proceed to find        We propose that this process include a secondary weight
disks with similar attributes and suggest them.                vector for different product categories. To calculate this
                                                               we will need items that the customer has both bought
A problem with the aforementioned technique is that a          and rated.
product has many more characteristics than the ones
that are important for a customer. For example hard            MCDA also requires a weak preference order. This,
disks have attributes like rotational speed and access         as described in the previous section, can be acquired
time. There are two solutions for this problem.                by asking the user questions using various games. For
                                                               example if a user has rated two items, a mini-poll
The simplest one would be to hard-code the important           between these two could appear on a sidebar.
characteristics of a category. A human operator will
set the characteristics a customer is most likely to find      The suggestions of this module of our platform will
important. A more difficult approach would be to have          mainly be for items that match a user’s rating profile.
the system learn through visitors’ page views and orders       Also the pre-processing that happens here is shared be-
which characteristics are of importance for a category.        tween the two filtering modules as we will see in the
                                                               next paragraphs.
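
The hard disk example of section 2.1.1 can be sketched as a weighted comparison of standardized product fields, where the weights play the role of the "important characteristics" of a category. The field names (`capacity_tb`, `interface`, `price_eur`), the weights and the similarity formula below are assumptions made for illustration; a real store would use its own schema and tuning.

```python
def similarity(a, b, weights):
    """Weighted similarity in [0, 1] between two products.

    Numeric fields compare by relative difference, other fields by
    equality. Fields missing from either product contribute nothing.
    """
    total = sum(weights.values())
    score = 0.0
    for name, weight in weights.items():
        x, y = a.get(name), b.get(name)
        if x is None or y is None:
            continue
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            biggest = max(abs(x), abs(y))
            score += weight * ((1.0 - abs(x - y) / biggest) if biggest else 1.0)
        else:
            score += weight * (1.0 if x == y else 0.0)
    return score / total if total else 0.0


def similar_products(target, catalog, weights, top_n=3):
    """Most similar products from the same category as `target`."""
    candidates = [p for p in catalog
                  if p["id"] != target["id"] and p["category"] == target["category"]]
    candidates.sort(key=lambda p: similarity(target, p, weights), reverse=True)
    return candidates[:top_n]


# Hypothetical weights an operator might hard-code for hard disks.
hdd_weights = {"capacity_tb": 2, "interface": 2, "price_eur": 1}
viewed = {"id": 1, "category": "hdd", "capacity_tb": 2, "interface": "SATA", "price_eur": 100}
catalog = [
    viewed,
    {"id": 2, "category": "hdd", "capacity_tb": 2, "interface": "SATA", "price_eur": 110},
    {"id": 3, "category": "hdd", "capacity_tb": 1, "interface": "SATA", "price_eur": 60},
    {"id": 4, "category": "hdd", "capacity_tb": 2, "interface": "USB", "price_eur": 95},
    {"id": 5, "category": "ssd", "capacity_tb": 2, "interface": "SATA", "price_eur": 100},
]
print([p["id"] for p in similar_products(viewed, catalog, hdd_weights)])  # [2, 3, 4]
```

Learning the weights from page views and orders, globally or per customer, would replace the hard-coded `hdd_weights` dictionary without changing the rest of the sketch.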
This can be done on a global level, so that we find the
characteristics that are most likely to be of value to any
customer. We would expect a fairly good experience for         2.2 Collaborative Filtering
most customers with this approach.
Another way would be to make it on a personal and              This type of filtering uses customer data to make recom-
temporal level. We make the first suggestion to the             mendations. It tries to find persons with similar tastes
customer using global data and based on his choice, we         as the current visitor and then proceeds to suggest items
adapt our weights to make suggestions based on similar         that they (people with similar taste) like.
characteristics of the last viewed items. This technique       This is easier than content based filtering in the sense
would be better for people who like to do more research        that the system doesn’t need to understand the under-
before they buy, so there could be a trigger for such a        lying data, like product characteristics, but only to find
technique based on the customers’ page views profile.           persons with similar taste. Also it works well for cross-
                                                               category suggestions. On the other hand it requires
                                                               better hardware as it operates on big datasets, it needs
2.1.2   Ratings and Multi-Criteria Analysis                    a large amount of data before it can deliver good sug-
                                                               gestions and requires customers with a decent history
Ratings are the center of content based filtering. They         of purchases and/or ratings.
can be used to suggest goods that a customer isn’t
                                                               Generally we can divide the process into two distinct
searching for.
                                                               phases. First we have to find a group of users with
Multi-Criteria Analysis can be used as a processing            similar taste. This may be accomplished through clus-
method for this part of the Content-based Filtering.           tering methods or nearest neighbor search. Then we


have to make suggestions based on the choices of the           2.3.1 No-Knobs Approach
group. For a traditional approach (known as memory
based) we could use the rating data of each user in the        A no-knobs approach to the Query Interface could be
group applying weights for user taste compatibility. A         further studied. In this approach the platform acts
more complex approach (model based) would require              in an automated fashion, auto-tuning its results. We
data mining and machine learning techniques to find             would advise against it as a primary goal. The platform
patterns in the underlying data.                               should be implemented with manual tuning. Then, as
                                                               it gathers more data, one could try to run in parallel
                                                               a no-knobs implementation of the Query Interface and
2.2.1   Multi-Criteria Analysis                                compare the results.

Multi-Criteria Decision Analysis was described before
but in Collaborative Filtering it can give even better         3 Recommendations System
results. Besides, in marketing, MCDA is used not only
to profile products but to find market segments.
                                                               Recommendation (or recommender) systems have
MCDA actually follows the workflow we described for            become increasingly popular over the last 4 years (2008
generic Collaborative Filtering. It has four phases:           to 2012). This is mainly due to the huge penetration
                                                               that mobile devices (smart phones, tablets) and
 • Data acquisition, which we described in section 1.          internet services (social media) have achieved in the
                                                               consumer market. Some of the best-known
 • User modeling, which we described in Content                recommendations so far are the product suggestions of
   Based Filtering.                                            amazon.com, the song suggestions of Bang & Olufsen,
                                                               iTunes and Pandora Radio, the movie and video
 • Clustering, the 1st part of Collaborative Filtering.        suggestions of Netflix and YouTube, the search suggestions
                                                               of Google Trends, the photo suggestions of Pinterest,
 • Recommendation Phase, the 2nd part of                       the network connection suggestions of Facebook and
   Collaborative Filtering.                                    LinkedIn.

2.3 Building an API - Query Interface                          As expected, recommendation systems are slowly
                                                               gaining ground in all current online activities that can
                                                               both directly and indirectly pose a market share
                                                               opportunity.
A good recommendation platform is tailored to each
product or, one level up, to each product category of a        There are numerous algorithms that have been
certain store. Thus the people who build and maintain          implemented in the design of recommendation systems.
the electronic commerce store should adjust the                One of the most commonly used algorithms in
recommendation platform to their needs. Hence the need for     recommender systems is the k-nearest neighbors (k-NN)
an application programming interface including a query         approach. k-NN is a method for classifying objects
interface, around which it is easy to build a GUI.             based on the properties of their closest neighbors in the
                                                               feature space. In k-NN, an object is classified through
The API should offer a handy way to feed data to the            a majority vote of its neighbors, with the object being
system, perform queries and accept results from it. It         assigned to the class most common amongst its k near-
should be well defined and easily accessible from current       est neighbors (k is a positive integer, typically small).
web technologies. It is an important design task for the       If k = 1, then the object is simply assigned to the class
engineering team.                                              of its nearest neighbor.
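
The k-NN majority vote described in section 3 can be sketched in a few lines. The feature vectors, the labels and the function name `knn_classify` are hypothetical; Euclidean distance is assumed as the distance measure, which the text does not specify.

```python
import math
from collections import Counter


def knn_classify(query, labelled_points, k=3):
    """Assign `query` the most common label among its k nearest neighbors.

    `labelled_points` is a list of (feature_vector, label) pairs.
    Distance is Euclidean (an assumption). When votes are tied, the
    label whose neighbor is closest to the query wins, because Counter
    keeps insertion order among equal counts.
    """
    nearest = sorted(labelled_points, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Customers described by two made-up features, e.g. (avg. order value, visits per month).
points = [
    ((1.0, 1.0), "budget"),
    ((1.0, 2.0), "budget"),
    ((5.0, 5.0), "premium"),
    ((6.0, 5.0), "premium"),
    ((5.0, 6.0), "premium"),
]
print(knn_classify((1.5, 1.5), points, k=3))  # budget
print(knn_classify((5.2, 5.1), points, k=1))  # premium
```

With k = 1 the second call simply returns the class of the single nearest neighbor, as noted in the text.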
The Query Interface is a means to abstract the                 Another popular algorithm is the Pearson Correlation,
complexity of the platform from the people who run the         which can prove especially useful, as we will see in
e-commerce store. An example of a query interface would        section 4 of this paper. Pearson Correlation is a
be the ability to instruct the platform to return 5            measure of the (linear) dependence between two variables
suggestions, of which 3 will come from the content based       X and Y, giving a value in the range from −1 to +1
module and 2 from the collaborative module through a           inclusive. In a social network, a particular user’s
simple query. So in essence it is a standardized input         neighborhood with similar taste or interest can be found by
of short queries that translate to a subset (albeit the        calculating this coefficient. The user’s preference can
most important ones) of the total queries our platform         be predicted by collecting the preference data of the
can do. The Query Interface is part of the API.                top-N nearest neighbors of a particular user (weighted
                                                               by similarity).
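
The Pearson-based neighborhood and the similarity-weighted prediction described in section 3 can be sketched as below. The user names, ratings, the function names `pearson` and `predict`, and the choice to ignore neighbors with non-positive correlation are assumptions for illustration.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated.

    Returns 0.0 when fewer than two items are shared or when either
    user's shared ratings have no variance.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    xs = [ratings_a[i] for i in common]
    ys = [ratings_b[i] for i in common]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def predict(user, item, all_ratings, top_n=2):
    """Predict a rating as the similarity-weighted mean of the top-N neighbors."""
    neighbours = []
    for other, ratings in all_ratings.items():
        if other == user or item not in ratings:
            continue
        sim = pearson(all_ratings[user], ratings)
        if sim > 0:  # assumption: only positively correlated users count
            neighbours.append((sim, ratings[item]))
    top = sorted(neighbours, reverse=True)[:top_n]
    if not top:
        return None
    return sum(sim * rating for sim, rating in top) / sum(sim for sim, _ in top)


all_ratings = {
    "ann": {"a": 5, "b": 3, "c": 1},
    "bob": {"a": 5, "b": 3, "c": 1, "d": 4},
    "cat": {"a": 1, "b": 3, "c": 5, "d": 1},
    "dan": {"a": 4, "b": 3, "c": 2, "d": 5},
}
print(pearson(all_ratings["ann"], all_ratings["cat"]))  # -1.0
print(predict("ann", "d", all_ratings))                 # 4.5
```

This corresponds to the memory-based approach of section 2.2; a model-based variant would replace `predict` with a trained model.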
If the recommender system is built in-house, then the
API can be omitted but it is advisable not to, since the       One last well-known method, which can also prove
implementation team will probably differ from the team          useful in section 4, is the Rocchio Classification. This
maintaining the store. It can be of a less advanced form       is a method of relevance feedback dating back to the
though.                                                        1970s. Rocchio makes use of the Vector Space Model


and is based on the assumption that most users have             The key question is what happens to the relationships
a general conception of which items should be denoted           between smaller groups of people, on the order
as relevant or non-relevant. User feedback is used to           of a few hundred? Do these relationships exist
refine a search query by emphasizing or deemphasizing           and to what extent do these people influence each other?
certain terms (similar to how Pandora refines its               Eric T. Bradlow et al. (2005) showed that this mapping
user recommendations). Through feedback, the user’s             can be achieved and can also produce some very satisfying
search query is revised to include an arbitrary percentage      results. Around the same period many studies were
of relevant and non-relevant terms as a means of                based on the same concept. They verified that the spatial
increasing the search engine’s recall, and possibly the         models can be successfully applied and also generate
precision as well. The number of relevant and non-              some very satisfying estimations about the consuming
relevant terms allowed to enter a query is dictated by a        actions of such groups.
series of weights in the central equation.
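
The paper does not reproduce the central equation, so the sketch below uses the standard textbook form of the Rocchio update: the new query is alpha times the old query, plus beta times the centroid of relevant vectors, minus gamma times the centroid of non-relevant vectors. The default weights (1.0, 0.75, 0.15), the sparse dictionary representation and the clipping of negative weights are conventional choices assumed here, not values taken from the paper.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio relevance feedback on sparse term-weight vectors (dicts).

    new_query = alpha * query
              + beta  * centroid(relevant)
              - gamma * centroid(non_relevant)
    Terms whose weight ends up non-positive are dropped.
    """
    terms = set(query)
    for doc in relevant + non_relevant:
        terms.update(doc)

    def centroid(docs, term):
        return sum(d.get(term, 0.0) for d in docs) / len(docs) if docs else 0.0

    updated = {}
    for term in terms:
        weight = (alpha * query.get(term, 0.0)
                  + beta * centroid(relevant, term)
                  - gamma * centroid(non_relevant, term))
        if weight > 0:
            updated[term] = weight
    return updated


# The user searched for "laptop", liked an SSD model and rejected a refurbished one.
query = {"laptop": 1.0}
liked = [{"laptop": 1.0, "ssd": 1.0}]
disliked = [{"laptop": 1.0, "refurbished": 1.0}]
new_query = rocchio(query, liked, disliked)
print(sorted(new_query))  # ['laptop', 'ssd']
```

In a store, the "documents" could be product description vectors and the feedback could come from ratings or from the A-or-B questions of section 1.3.3.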

                                                                3.2.2 Spatial as in ‘type’
3.1 Hybrid Filtering
Hybrid filtering combines collaborative and                     Similar studies continue to this day, producing
content-based filtering, which depending on the case            some really intriguing results about the application of
can prove more effective than either alone. A quick             spatial models in social media. Just recently online
reminder: content based filtering is produced from              services like Myspace, Facebook, Twitter, LinkedIn
information that logs the customer’s profile and taste,         and Google+ came into play and advertisers advanced
while collaborative filtering is produced by matching           by making the most of this knowledge.
the habits shared by users who seem to                          Nowadays interactive banners, tag clouds and
belong to the same group.                                       infographics produce beautiful aesthetics which lure
                                                                the individual ‘to blend in’, ‘to belong’ and ’to follow’
Hybrid approaches can be implemented in several                 the trends that his friends, colleagues and associates are
ways: by making content-based and collaborative-based           up to.
predictions separately and then combining them; by
adding content-based capabilities to a collaborative-           In advertising, spatial and temporal placement can
based approach (and vice versa); or by unifying the             actually become very complex. The research field of each
approaches into one model for a complete review of              could make up a completely separate sector in the science
recommender systems. Several studies empirically compare        of marketing. This is because there are a lot of
the performance of the hybrid with the pure collaborative       parameters to be considered, for example the spatial
and content-based methods and demonstrate that                  mapping of each individual banner (or recommendation)
the hybrid methods can provide more accurate                    according to geographic, demographic and psychometric
recommendations than pure approaches. These methods can         criteria. Another concern is the analysis of the
also be used to overcome some of the common problems            spatial drift phenomenon which has to do with the ex-
in recommender systems such as cold start (not enough           ponential decrement of an action’s effect (i.e. gener-
gathered information to generalize) and the sparsity            ation of a specific recommendation) around a ’neigh-
problem (having too few ratings and hence too few cor-          bourhood’ whether this is considered to be in space or
relations between users).                                       time.


3.2 Recommendations spatial place- 3.2.3 Spatial as in ‘surface’
    ment
                                                                As far as it concerns the spatial placement of the web-
3.2.1   Spatial as in ‘scale’                                   store’s layout, the corresponding ‘social media plug-ins’
                                                                are most commonly being placed around the page’s
The key assumption in the traditional marketing liter-
                                                                edges. This is due to the fact that they can easily
ature is that the consuming behavior of an individual
                                                                fill in the space between two seemingly irrelevant ad-
is conditionally independent of the consuming behav-
                                                                vertisement banners and attract the attention of the
ior of another individual. This means that the decision
                                                                reader to the surrounding area. The recommendation
making process of a sole customer is considered to be
                                                                systems usually present information that is being au-
unaffected by the decision making process of another
                                                                tomatically refreshed but always semantically attached
customer, let alone a group of customers.
                                                                to the consuming preferences of the customer’s network
We know that the latter simplification can be very use-          connections. This way they attempt to maximize the
ful in some special cases but this requires extra caution       chances of him indulging or interacting with the adver-
when applied in general. So, when researches study              tisement. The same plug-ins are usually implemented
large groups of people they tend to adopt theories that         in a way that they can learn and improve their sugges-
stand to the concept of ‘sole identity’ assigning a per-        tions according to the customers behavior and use this
sonal human-like behavior to a crowd.                           information as a feedback to his connections too.
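
As a rough illustration of the first hybrid strategy listed in section 3.1 (separate content-based and collaborative predictions that are combined afterwards), the sketch below blends two score dictionaries with a single weight. The names `hybrid_scores` and `top_n`, the weight `alpha` and the sample scores are our own assumptions and not part of the proposed platform. Falling back to the only available score when an item is missing from one source is one simple way to soften the cold start problem.

```python
def hybrid_scores(content, collaborative, alpha=0.5):
    """Blend content-based and collaborative scores per item.

    alpha is the (hypothetical) weight given to the collaborative score.
    Items known to only one source keep that source's score unchanged.
    """
    blended = {}
    for item in set(content) | set(collaborative):
        if item in content and item in collaborative:
            blended[item] = alpha * collaborative[item] + (1 - alpha) * content[item]
        elif item in content:
            blended[item] = content[item]
        else:
            blended[item] = collaborative[item]
    return blended


def top_n(scores, n=3):
    """Return the n item ids with the highest blended score."""
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]


# Invented scores in [0, 1], for illustration only.
content = {"hdd": 0.9, "ssd": 0.6, "usb_stick": 0.4}
collaborative = {"hdd": 0.5, "ssd": 1.0}

scores = hybrid_scores(content, collaborative, alpha=0.5)
recommended = top_n(scores, n=2)
```

A single global `alpha` is the simplest choice; a real system would probably tune it per customer or per product category.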


3.3 Recommendations temporal placement

Whatever we do, however we do it, regardless of how important it may seem at the moment, it will be quite useless or insignificant if attempted at the wrong time. So, timing is of the essence, and this applies to every action we take in life. Likewise, every temporal placement of a campaign, any recommendation technique, if properly designed, can produce great results, but in any other case may have no effect or, even worse, the opposite of the one that we desire and expect.

Time, unlike space, is a dimension that imposes the same limits on every marketer. Expertise is always considered an asset, but in the field of time all ‘players’ share about the same chances of success. No one can establish an advantage in the field of time, regardless of how big or small a brand name he claims to own. The same goes for recommendation systems too. Conquering the field of time demands good strategy skills and an intuitive ability to foresee the market trends.

Researchers have proposed a lot of versatile ideas about the formulation of relevant mathematical models which predict such fashions, and to some extent they generate some good results. But machine learning techniques, artificial intelligence, fuzzy logic and neural networks, even when combined all together or used extensively in simple cases, cannot (until now, that is) reproduce the results that a talented visionary human may have, just because successfully forming the future will always be more rewarding than managing to predict it.

3.3.1 Placing Strategy

By taking into consideration all of the above we conclude that an automated recommendation system must:

  • focus on small time frames and make quick but small steps rather than slow and big ones, in order to keep all options open until the very last minute of commission (pivoting),

  • prefer to abort if the recommendation has a high risk of being unwelcome at the designated time of prompting (because you can never be too late, as trends cycle over time, but you can be too quick!) and

  • try to lead the market by shaping the consumers’ opinion (branding) rather than passively awaiting feedback about what they are about to wish for or like. A recommendation system can achieve that by enriching the conventional content with injected information (impression) that will be intentionally kept out of focus.

So, the subject will hopefully be the recipient of two messages: a straightforward one, which will literally encourage him to try a special product or a service, and an indirect one, which will attempt to persuade him about the company’s current concern or its vision about the future.

Nowadays similar techniques are being used by many firms and provide results which depend solely on the fashion making abilities of their respective departments. Big brand names which hold a solid corporate identity often produce content like this to promote their innovative services and products and, at the same time, to keep a competitor from impinging on the same ones. Smaller brands do the same in order to assiduously make their way into a niche market share.

3.3.2 Measure Validation

Marketing measures must efficiently estimate the performance of a particular strategy. The indexes used in such cases may or may not have some physical meaning; it doesn’t really matter as long as marketers are able to understand how they work and why. Two common scientific criteria for the meaning of measures are reliability and validity.

Reliability can be verified by testing the correlation between the measure taken at different times (retest stability), with equivalent forms, or with split halves (internal consistency).

Validity, on the other hand, takes many forms. There is face or consensus validity, which exists when a measure looks as if it should indicate a particular variable or concept. Using this form is not safe because studies have shown that scores on recognition measures can be influenced by irrelevant response sets.

Another, more objective form of validity is predictive or concurrent validity. Predictive validation procedures consist of determining the extent to which particular measures predict other ‘criterion’ measures, so it has much pragmatic meaning in marketing. This measure validation exhibits some pros and cons.

When the predictions based on the latter form turn out too fuzzy, the certainty of the measurement results can be enhanced by measure validation, which consists of two parts: convergent and discriminant validation. The first is synonymous with predictive or concurrent validation, which means that a measure can adequately represent a variable if it correlates or ‘converges’ with other supposed measures. Discriminant validation is absolutely necessary to really pin down the meaning of measures. This is because a measure may converge with measures of other variables in addition to the one of interest.

Finally, construct validation can only be considered after measure validation is established. This validation actually provides a proof of concept, as it checks whether a hypothetical construct, composed of several similar variables, actually operates in the scientifically expected way.
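
The two reliability checks mentioned in section 3.3.2 can be illustrated with plain correlations. The sketch below is only an example under our own assumptions: the sample numbers are invented, `pearson`, `retest_stability` and `split_half_consistency` are hypothetical helper names, and splitting a questionnaire into its odd and even items is just one common way of forming the two halves.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def retest_stability(first_run, second_run):
    """Correlate the same measure taken at two different times."""
    return pearson(first_run, second_run)


def split_half_consistency(responses):
    """Correlate per-respondent totals of the two halves of a questionnaire.

    responses holds one list of item scores per respondent; the halves are
    the items at positions 1, 3, ... and the items at positions 2, 4, ...
    """
    first_half = [sum(r[0::2]) for r in responses]
    second_half = [sum(r[1::2]) for r in responses]
    return pearson(first_half, second_half)


# Invented data: a measure taken twice for five campaigns, and a
# four-item questionnaire answered by four respondents.
week_1 = [10, 12, 15, 11, 18]
week_2 = [11, 13, 14, 12, 19]
answers = [[4, 5, 4, 5], [2, 1, 2, 2], [3, 3, 4, 3], [5, 4, 5, 5]]

print(round(retest_stability(week_1, week_2), 3))
print(round(split_half_consistency(answers), 3))
```

Values close to 1 suggest a reliable measure; what counts as "close enough" is a judgment the marketer has to make for each measure.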


3.4 Recommendations outside the store

Internet and mobile platforms offer many ways to reach your customers outside of your e-commerce site. Let’s explore two of the most popular.

3.4.1 E-mail

E-mail is a dangerous field. It is absolutely a love or hate factor in your customer relations. Do it wrong and you will damage your image; do it right and you will see frequent visits to your business. A middle ground doesn’t exist.

We suggest that e-mail should be used for sending recommendations to the customers, but with the utmost care. Recommendations sent through e-mail should contain a mix of suggestions relevant to items the customer saw during his last visits and suggestions that are to his taste.

The frequency of e-mails should be related to the frequency of visits the customer pays to the shop. A less interested customer will probably be offended by frequent e-mail. Expected etiquette for communication between a business and an individual suggests that every contact should have a reason. Contact the customer only if the business has offers on items he may be interested in, or if there are new arrivals of items he may be highly interested in.

3.5 Mobile - Suggestions on the Road

Every day smart-phones and other web-enabled devices make their way into customers’ pockets by the millions. Eric Schmidt of Google announced on September 5th, 2012, that Android devices alone have over 1.3 million activations per day!

A significant part of e-commerce will be transferred to mobile in the coming years and a whole new set of possibilities will open. There are already applications that take advantage of mobile, though they are neither widely deployed nor widely accepted yet.

E-commerce businesses may use mobile through mobile web sites and applications. But this use doesn’t differ much from the typical web use, so everything we have said until now applies.

The game-changing feature of smart-phones is that people are constantly connected, have their device always at hand and are frequently tracked. For example, a customer’s location can cue us as to when to send recommendations to his device. If we know his location at block level, we can detect when he is out shopping and in some cases what he is shopping for. If we can pinpoint his location to a few meters, we may even know which shop he is at. Such techniques may bring criticism and you should always ask for a customer’s approval before using them.

But mobile apps can also be used to gather feedback from customers. Smart-phone games are popular these days; why not build one that also does a bit of market research?

As good as all this seems though, the mobile sector is managed mainly by advertisers, so you may have to work with them.

4 Metrics; Quantify the Success of the System

Data mining techniques can prove to be of great value if they are used properly in systems like our proposed marketing platform. They have the power to reveal well-hidden social habits, market trends, customer profile characteristics and other natural patterns which in the general case tend to be ignored or undervalued. They can also make use of seemingly arbitrary data given as input and produce some very interesting estimations in their output.

But the output itself needs further study by an expert, usually a human, in order to be interpreted and accompanied by logical reasoning. Also, even if the data are processed automatically, the expert must provide a well-pointed definition of their application domain in order to lessen the fuzziness that usually dominates the theoretical models of a respective AI expert-decision system.

So, the intervention of a human expert in the process, although undesirable, is usually encouraged, because the solution of an automated artificial intelligence machine cannot always guarantee a satisfying result. This is due to the fact that nature’s patterns, let alone laws, are formed in a non-deterministic and non-linear way and the solutions produced are only a fair estimation of the truth.

Scientists know that the complexity of such problems can be reduced by linearization techniques. In our proposed platform we do the same by modeling a good linear substitute, which, in other words, means that we build a system which uses its linear inputs to ‘sense’ some quantities (here the market’s trends) and learns how to adapt its outputs accordingly (here the customers’ recommendations) to maximize the profit.

To accurately identify the dominant factors that rule the market’s trends and quantify their importance in each scenario, we use special indexes called metrics or KPI (key performance indicators). Enumerating the existing KPI is a difficult job, since it is almost impossible to gather information on already working systems, but most companies tend to use some of the metrics already made known by the international bibliography. Companies tend to gather a lot of data (e.g. by market research using questionnaires) or use many
different KPI in order to successfully detect the pulse of the market at any desired time (e.g. in order to apply a promising marketing campaign, change their brand orientation, use pivoting, etc).

But neither the gathering of a high volume of data nor the application of a well-guessed selection of the corresponding KPI can guarantee the success of the applied decision. A high volume of data is resource consuming and impractically optimistic, while the random selection of KPI itself defies the very nature of non-determinism in non-linear and complex problems. This is because an observer can only look for logical and simple explanations of why things work in a specific way, lacking the ability, or even worse neglecting the effort, to realize and understand the existing correlations between all the factors that balance a phenomenon to a given state.

Also, statistics in general are doomed to become useless if the policies around data aggregation and processing cannot generate coherent results that will lead to a straightforward and cost efficient strategy. Unfortunately, the techniques usually adopted by medium-scale companies are efficient, but there is no safe way to ensure that they are also the optimum, so they can sometimes exhibit significant shortcomings.

If we may address some of the main reasons for this, we believe that it is due to the fact that a suitable KPI is applied in the wrong context or used to the wrong extent, and because the system is forced to process and interpret high-volume data which bring back fuzzy and thus untrustworthy results.

To counter that, our proposed scheme tries to check which metrics (KPI) are highly correlated with the desired outcome (overall profit) and assigns special ‘weighting factors’ to each one of them in proportion to their estimated correlation. The idea is to use the collective information of many different metrics in a versatile way to make them form the optimized ‘success quantifiers’ that best suit the e-commerce business in question. The very first time the ‘weighting factors’ get random values, but after successive passes the system will tend to settle to a specific state.

We achieve that by introducing a dedicated feedback system (an example would be a system based on genetic algorithms). This measures the system’s efficiency and adjusts the relative importance of each metric accordingly. Each weighting factor represents the metric’s significance towards the desirable effect, which is to increase the profit without compromising (not a lot at least) the weighting effect of the rest of the metrics.

If the metric in question proves to be important then it makes sense to focus more effort on its fine-tuning, so it is highly rated. In any other case the metric’s results must be considered stochastic and thus its influence must be regarded as untrustworthy, so in every pass the system assigns a small value to it, and subsequently this becomes small enough to be ignored.

This feedback strategy ensures that the system will eventually blend in with the market needs by inheriting its trends. This method is similar to an automated multi-criteria analysis and promises to produce fast and traceable results with increasing marketing efficiency.

In the subsections to follow we present some of the most promising KPI which are frequently used by companies and organizations.

4.1 OCR - Order Conversion Rate

Order conversion rate is one of the most important KPI. Generally a conversion rate refers to the percentage of events leading to another event. In e-commerce it usually refers to the percentage of visits that convert into orders. Most web analysts define the conversion rate as the percentage of site visitors who do something that the company wants them to do, like submit an order, sign up for an e-mail, send a share link and so on.

The order conversion rate is an important metric, but it only shows that a minority did something, not why the majority did nothing. This is because buying something is not always the main aim of visitors who go to a homepage. Customers also intend to do research, look for information, read a blog or just get advice. So OCR and similar conversion rates are useful, but only if combined with other KPI too.

4.2 RpV - Revenue per Visit

Average revenue per visit is a well-known marketing acquisition indicator. It is defined simply as the sum of revenue generated / number of visitors. This metric rises when more valuable customers are attracted to the e-commerce store.

4.3 Page Visits

Page views is a metric mainly used to quantify the effect that a marketing campaign has on a specific target group, and it can produce some nice conclusions, especially when studying the temporal and spatial allocation of the channels that the customer traffic passed through until it arrived at a specific page on the web-store. Page visits are a very important index for the system we propose. They can show the extent to which our visitors are motivated to visit the web-store by some social media campaign, quiz, etc, and when combined with the RpV they can show the clout that each social medium holds over our visitors over time.
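
The correlation-based ‘weighting factors’ described in the introduction of section 4 can be sketched as follows. Note that this is a simplification under our own assumptions: instead of the feedback system (e.g. genetic algorithms) that the text proposes, the sketch computes each weight directly from the absolute Pearson correlation between a KPI’s history and the profit. The helper names `pearson` and `kpi_weights` and all sample figures are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def kpi_weights(kpi_history, profit_history):
    """Weight each KPI by the absolute correlation of its history with profit.

    Weights are normalised to sum to 1, so a KPI that barely correlates
    with profit ends up with a weight small enough to be ignored.
    """
    strengths = {name: abs(pearson(series, profit_history))
                 for name, series in kpi_history.items()}
    total = sum(strengths.values())
    if total == 0:
        # No KPI correlates with profit at all: fall back to equal weights.
        return {name: 1.0 / len(strengths) for name in strengths}
    return {name: s / total for name, s in strengths.items()}


# Invented weekly observations, for illustration only.
kpi_history = {
    "OCR": [0.020, 0.025, 0.030, 0.028, 0.035],
    "RpV": [1.10, 1.30, 1.60, 1.50, 1.90],
    "page_visits": [900, 400, 700, 650, 800],
}
profit_history = [1000, 1250, 1500, 1400, 1800]

weights = kpi_weights(kpi_history, profit_history)
```

With these sample figures the `page_visits` series hardly follows the profit, so it receives a much smaller weight than `OCR` and `RpV`.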


4.4 AOV - Average Order Value

Average order value is considered a key performance indicator when combined with revenue per visit and order conversion rate. The basic calculation is Average Order Value = Sum of Revenue Generated / Number of Orders Taken.

Usually experts work diligently on both OCR and AOV in order to improve both at the same time, by segmenting visitors and marketing campaigns into high, medium and low AOV groups. This can help identify the special marketing approach that the company should apply to each group in order to get the best collective outcome.

Managing the balance between OCR and AOV can be tricky. A small increase in one could mean a drop in the other, and since revenue per visit is roughly the product of the two (orders per visit times revenue per order), a drop of great magnitude compared to the rise may strongly impact the revenue per visit.

4.5 Customer return

This is a very important metric because it signifies the amount of dedication that the target group shows to the business. This group is the one that consciously supports the company in both direct (buys) and indirect (personal recommendations to friends) ways.

Again, this KPI can provide valuable feedback to the expert decision system, especially when combined with others.

4.6 Time spent on store

Time spent on a store can have a lot of different meanings. More time could mean more chances for a customer to indulge himself in an order, or it could mean a bad design that makes it hard for the customer to find what he needs. In the second case it is almost guaranteed that the ‘customer return’ KPI will be affected and become smaller. So this is where data-mining techniques come into play and prove to be very handy. If the volume of the collected data is sufficient (and not extremely large) then this metric can reveal very useful information about the quality of the site’s design and/or the applied marketing technique at hand.

4.7 Cart abandonment

If used properly, the metric of ‘cart abandonment’ can be a very helpful asset in determining which products and categories are driving the most abandonment and under which circumstances. But it should be used with extra caution because it can easily become misleading. The problem does not always lie in the shipping costs or the billing method.

For example, one could use the cart just to get a picture of the ‘total cost’ of his wish list, although he never meant to buy the products at that moment. Also, a customer may need to use the cart just to collect some of his favorite stuff in one place, due to the lack of a ‘favorite’ or ‘share’ button. He may also do the same thing if he wishes to keep the links of two or more similar products he wants to compare but the site lacks such a service. A low percentage of visitors who purchase an item after adding it to the cart could also mean that your shipping rates are high or that your checkout template is confusing.

So, the ‘cart abandonment’ results must always be analyzed according to criteria that are stripped of such scenarios or take them into consideration through proper normalizing factors. One of the aims of the proposed platform is to achieve this normalization efficiently and automatically.

4.8 Up-selling rate

The up-selling term indicates the percentage of successful ‘hits’ on a recommended order of a product similar to the one that the customer just ordered or intended to order. Up-selling is considered to be successful if it leads to a more profitable sale without jeopardizing the good relationship that the organization has built with the customer. So, to produce a feeling of comfort in the customer, the offer must be very well-pointed, providing a recommendation that enhances the value he gains from the bargain. An example would be to propose the purchase of one more product, such as a warranty extension on the hard disk drive that the customer is willing to buy.

The similarity between the two products (i.e. warranty extension and hard disk drive) will be modeled by a real number called ‘relation-strength’. The ‘relation-strength’ can be defined by an expert at first, by setting the appropriate values in a product-to-product matrix. When the system becomes operational, its algorithms fine-tune this matrix by assigning new values to it, as indicated by the real market’s feedback. As the careful reader may notice, this idea is based on a principle similar to the one we have already described for the correlation weights between a KPI and the overall profit.

4.9 Cross-selling rate

The cross-selling term is very similar to the up-selling one and there is only one small distinctive difference. It indicates the percentage of successful ‘hits’ on a recommended order of a product seemingly irrelevant to the one that the customer just ordered or is about to order. A valid example could be the proposal of a car charger for a mobile phone to someone who bought a memory card for his laptop. It seems irrelevant but the system can guess the customer’s financial status as he
is aged enough, he just bought a laptop, he is male and       lower abstraction computer languages that tend to per-
he has declared a mobile number so he probably owns           form better. To glue all the parts together, we would
a car too. The system could also know nothing about           use higher abstraction, easier to handle languages. We
this customer but could extrapolate information from          would make our APIs compatible with industry standard
the profile patterns of other customers that fit in the         technologies, easy to use and with good documentation,
same target group.                                            to help clients connect their e-commerce businesses to
                                                              our platform.
As far as it concerns the implementation of the process-
ing of this profiling information, the only thing needed
is a medium-scale database (i.e. over 1000 records) and       5.1 Making a profit
the algorithmic strategy already stated in the previous
section.                                                      Implementing the platform won’t come cheap. The pri-
                                                              mary cost will be the salaries of the engineers needed.
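
The product-to-product ‘relation-strength’ matrix of section 4.8 could be fine-tuned from market feedback along the following lines. This is only a sketch under our own assumptions: the update rule (moving the stored value a small step towards 1 when a recommendation is accepted and towards 0 when it is ignored), the function name `update_relation_strength` and the step size `rate` are hypothetical and not prescribed by the platform.

```python
def update_relation_strength(matrix, bought, recommended, accepted, rate=0.1):
    """Nudge the relation strength between two products toward the feedback.

    matrix is a nested dict: matrix[bought][recommended] -> strength in [0, 1].
    Pairs that the expert never rated start from 0.0.
    """
    row = matrix.setdefault(bought, {})
    current = row.get(recommended, 0.0)
    target = 1.0 if accepted else 0.0
    row[recommended] = current + rate * (target - current)
    return row[recommended]


# Expert-defined starting value (invented for illustration).
relation_strength = {"hard_disk_drive": {"warranty_extension": 0.5}}

update_relation_strength(relation_strength, "hard_disk_drive",
                         "warranty_extension", accepted=True)
update_relation_strength(relation_strength, "hard_disk_drive",
                         "car_charger", accepted=False)
```

Because each step only moves a fraction of the remaining distance, the stored values always stay between 0 and 1, and a single unusual customer cannot overturn the expert’s initial estimate.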

4.10 RoI - Return on Investment
                                                              5.1.1 Software Product
All the metrics mentioned above could be considered
as variables (inputs) which are set according                 The common approach is to release the platform as
to the results that are generated by RoI. So, RoI value       a software product which businesses can buy and
can apparently be considered as a proposed output and         then achieve higher revenue through technical support
its value a relevant estimator of the system’s efficiency.    and/or software upgrades. For such an approach to
                                                              work, you have to make your software as compatible
In general, return on investment is the amount of profit a
                                                              and easy to setup and use as possible. This leads to
certain policy can produce in relation to the cost of this
                                                              much higher costs and may be a limiting factor to the
policy’s adoption. In the proposed multi-criteria anal-
                                                              number of features you can implement. Also in this ap-
ysis this metric can reflect the company’s total profit
                                                              proach you should generally try to release a product as
change rate in relation to the absolute value of the
                                                              perfect as possible, since fixing mistakes on customers’
change of a respective KPI. So RoIOCRt1 will be the
                                                              side isn’t viable. This will make your time schedules
value that RoI acquires at the moment t1 when OCR,
                                                              longer.
and only OCR, changes. In the special case where
RoIOCRt1 appears to be monotonous in the range we
are interested in we can safely make future predictions       5.1.2 In-House
according to the graph’s envelope or the projections of
RoIOCRt1 trend-line.                                          For a large e-commerce business, it would make sense to
                                                              hire a team to implement the platform in-house. This
                                                              approach has the benefits of a tailored solution. It can
5    Implementation                                           be tweaked and adjusted to the company’s needs.
                                                              But for it to make sense, it would have to give the sales
Our proposed recommender platform is a challenge to           a boost much bigger than a ready made solution. We
implement. It has high complexity and a big volume of         would say that the expected profits’ increase on top of
necessary features.                                           the profits’ increase a ready made solution would bring,
                                                              should be such that the implementation should pay for
A team of software engineers with a strong background         it self in two to three years.
in math would be able to bring it to life. Alternatively
a team of software engineers with structured thinking
and a mathematician with a background in statistics           5.1.3 Software as a Service (SaaS)
and algebra would also be able to pull it off.
                                                              There is much hype around this concept these days.
The system should first be carefully designed and then
                                                              A company would implement and deploy the platform
implemented. Because of its complexity and many pos-
                                                              to its own servers and then customers would pay for
sible points of failure, it would be best to partition the
                                                              access to it (usually in monthly or annually plans). This
platform to many small modules with distinct purposes
                                                              approach from a financial standpoint has the advantage
and write some test cases.
                                                              of a steady revenue. Also, because of the savings on the
The technologies that are going to be used should be          customers’ side due to lower maintenance costs, he will
chosen by the design team. The business model that            accept to pay more for our product.
will be used may be of importance in this decision.
                                                              In the implementation front, this approach increases
Ourselves, we prefer the SaaS approach (explained             the running costs due to the need for servers and an
later) and we would proceed in an implementation on           administration team. Also it increases your responsi-
the cloud that would be highly scalable. Parts of the         bility since part of your customers’ business runs on
design that are CPU bound we would implement them             your servers. On the other hand, the ability to perform
using high performance DBMS and/or in-house code in           maintenance and upgrades at any time to our software,


                                                             11
facilitates the software engineering part and lets us make incremental enhancements, test
features and do rollbacks when needed, thus allowing less demanding time schedules and
resulting in a more robust product from the client’s perspective.

5.1.4 Open Source

Many software products choose the path of open source, commonly known as community editions
of the product. This doesn’t stop a company from selling, renting (SaaS) or charging for
technical support for its platform.

The main advantage is that programmers from all over the world can help you build, test and
maintain your product. Most often you will find that these people are your customers, who
decide to implement a certain functionality themselves.

Also, administration teams (which will connect their company’s e-shop with your platform)
tend to feel safer with open source approaches, as they know they can tailor a solution to
their needs if necessary and that your product’s viability doesn’t depend on your
business’s viability.
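To make the RoI definition of Section 4.10 more concrete, the following is a minimal sketch in Python, under our own assumptions rather than a definitive implementation. It reads the per-KPI RoI as the ratio of the profit change rate to the absolute change of a single KPI (here OCR), checks whether the observed series is monotonic, and only then projects a straight trend-line. The function names, the least-squares line and the sample numbers are all hypothetical and chosen for illustration only.

```python
from typing import Optional, Sequence


def roi(profit: float, cost: float) -> float:
    """Profit produced by a policy relative to the cost of adopting it."""
    if cost == 0:
        raise ValueError("cost must be non-zero")
    return profit / cost


def kpi_roi(profit_change_rate: float, kpi_change: float) -> float:
    """Profit change rate relative to |change of one KPI|.

    Assumption: only this KPI (e.g. OCR) changed in the observed interval.
    """
    if kpi_change == 0:
        raise ValueError("the KPI did not change; the ratio is undefined")
    return profit_change_rate / abs(kpi_change)


def is_monotonic(values: Sequence[float]) -> bool:
    """True if the series never decreases or never increases."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


def trend_line_forecast(times: Sequence[float], values: Sequence[float],
                        t_next: float) -> float:
    """Fit a least-squares straight line and evaluate it at t_next."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (time, value) samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("all samples share the same time")
    slope = sum((t - mean_t) * (v - mean_v)
                for t, v in zip(times, values)) / var_t
    return mean_v + slope * (t_next - mean_t)


# Hypothetical observations: (profit change rate in %, OCR change in points).
times = [1.0, 2.0, 3.0, 4.0]
observations = [(4.0, 2.0), (5.0, -2.0), (6.0, 2.0), (7.0, -2.0)]
roi_ocr = [kpi_roi(rate, delta) for rate, delta in observations]

# Extrapolate only when the series is monotonic in the range of interest.
forecast: Optional[float] = (
    trend_line_forecast(times, roi_ocr, 5.0) if is_monotonic(roi_ocr) else None
)
print(forecast)  # prints 4.0 for the sample numbers above
```

The monotonicity check acts as a guard: when the series changes direction inside the range of interest, the sketch returns no forecast instead of extrapolating a line that the data does not support.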




Contents

Introduction

1 Data Sources - Signals
  1.1 Standard Data Sources
      1.1.1 Web Server, Cookies & Database Logs
      1.1.2 Customer’s Profile - Personalization
      1.1.3 Products Speaking for Themselves
  1.2 User Feedback; Earn it, Shape it for Multi-Criteria Analysis
      1.2.1 Generic Criteria - Appealing to Customers’ Values’ System
  1.3 Basics: Ratings & Reviews
      1.3.1 Incentives
      1.3.2 Social
      1.3.3 Gamify the Store

2 Pre-Processing the Data
  2.1 Content-based Filtering
      2.1.1 Using Characteristics
      2.1.2 Ratings and Multi-Criteria Analysis
  2.2 Collaborative Filtering
      2.2.1 Multi-Criteria Analysis
  2.3 Building an API - Query Interface
      2.3.1 No-Knobs Approach

3 Recommendations System
  3.1 Hybrid Filtering
  3.2 Recommendations spatial placement
      3.2.1 Spatial as in ‘scale’
      3.2.2 Spatial as in ‘type’
      3.2.3 Spatial as in ‘surface’
  3.3 Recommendations temporal placement
      3.3.1 Placing Strategy
      3.3.2 Measure Validation
  3.4 Recommendations outside the store
      3.4.1 E-mail
  3.5 Mobile - Suggestions on the Road

4 Metrics; Quantify the Success of the System
  4.1 OCR - Order Conversion Rate
  4.2 RpV - Revenue per Visit
  4.3 Page Visits
  4.4 AOV - Average Order Value
  4.5 Customer return
  4.6 Time spent on store
  4.7 Cart abandonment
  4.8 Up-selling rate
  4.9 Cross-selling rate
  4.10 RoI - Return on Investment

5 Implementation
  5.1 Making a profit
      5.1.1 Software Product
      5.1.2 In-House
      5.1.3 Software as a Service (SaaS)
      5.1.4 Open Source

Building a Recommendation Platform For E-Commerce Businesses Based on Hybrid Content Based - Collaborative Filtering And Web 2.0 Concepts

  • 1. Building a Recommendation Platform For E-Commerce Businesses Based on Hybrid Content Based - Collaborative Filtering And Web 2.0 Concepts Techical University of Crete Department of Production Engineering and Management Andreopoulos, Marios Mandourarakis, Ioannis [email protected] [email protected] Department of Electronic & Computer Engineering Department of Electronic & Computer Engineering September, 2012 Abstract We propose an automatic recommendation platform for e-commerce businesses based on multi- criteria analysis and hybrid content-based - collaborative filtering, tailored to e-shops and cross-product category analysis, build with Web 2.0 concepts in mind. We are going to describe the input variables as well as methods to obtain them without frustrating customers, possible ways to pre-process the data and create a common query interface, suggest the recommendations output across a variety of scenarios and finally discuss metrics that can be used to measure the real return of the system. Thanks A good recommendation system can increase a business’ up-selling rate and build stronger customer relations. We would like to thank Professor Nikolaos Matsatsinis The latter point should not be overlooked. The system and Marina Karabatsa for their guidance. will make the customer spend more time on the e-shop and promote frequent visits. We like to think of this as customer - business “bonding” time. Introduction In the next section we write about our proposed data sources - signals and how to acquire them, taking ad- vantage of the “social” conception which dominates the Internet commerce has seen a tremendous boost over world wide web nowadays. Then we proceed to sug- the last decade. So much that for some products it gest pre-processing methods for the gathered data in rivals brick and mortar shops. order to create a generic query interface upon which we From marketing’s perspective the web can be seen as can build recommendations. 
Recommendations are the holy grail; access to millions of customers, many eager main topic in our third section. Since our system sup- to give feedback, as well as vast quantities of customer ports many types of filtering and can do cross-product behaviour data gathered automatically by web servers and cross-rating suggestions, which ones, when and how and databases. A real life playground for marketing sci- should we present to a user? The fourth section deals entists to apply and test theories. But with great data with metrics and how we can verify that the system sets and user base, comes the need for great questions works toward the business’ benefit. The last section and applications. explores implementation’s details and business models that can be used to build and support such a platform. Our system is designed for the new generation of cus- tomers, people who are now in their early 30’s or younger and for companies that plan to stay relevant for years to come. Recommendations’ quality will grow in parallel with the business - customer relation. This work was done as part of an undergraduate course on Small Business and Entrepreneurship. 1
  • 2. 1 Data Sources - Signals Most people are ok and even expect to give this infor- mation since we live in the age of social. Of course The signals we have in our disposal may come from a there should always be a well visible option to skip the variety of sources: process and proceed to the shop. • Traditional web means, like web server logs, cook- ies and transactions/sales journal. 1.1.3 Products Speaking for Themselves • A customer’s basic profile since in e-commerce anyone has to register a profile. These include at- Products for an e-commerce platform are entries to a tributes such as sex, age, geographical location. database. By using separate standardized fields for each of the characteristics we can easily perform content- • Products’ data, like specifications and product cat- based suggestions. A classification system is also of egory. use, to assign products to categories and subcategories. • User feedback which may be ratings and reviews through traditional means (like forms or star rat- ing) or modern tools (like games and social fea- tures). 1.2 User Feedback; Earn it, Shape it for Multi-Criteria Analysis The 1st and 3rd data sets are considered standard. The 2nd is standard but may be extended through social User feedback is a broad concept. It includes every features. The 4th one we have to design it ourselves action a user may do that isn’t directly related to the from ground up and is the main feature of this section. shopping process. The most common feedback forms are ratings and com- 1.1 Standard Data Sources ments. These days likes on facebook, tweets on twitter and +1s on Google Plus are also common. But there 1.1.1 Web Server, Cookies & Database are so much more we can do to get feedback from our customers. Logs For the virtual world to function, logs are essential. These logs, collected and used primarily for technical 1.2.1 Generic Criteria - Appealing to Cus- reasons, can also be of assist to marketing strategies. 
tomers’ Values’ System From these logs we can extract information like which pages a certain customer visited, how much time he Our proposal is the use of generic criteria so that we can spend on each one, if he reached a page following a link create a profile for the customer which doesn’t rely on from another page or if he clicked any links on this page. specific product attributes. Since a customer may ex- From our database we have access to things like which pose different weights and thresholds for a criterion ac- products a customer has bought, his orders’ values, cording to a products’ category, we may adjust weights when he placed his orders (time and date), what items and thresholds according to category. he bought together. The proposed criteria are: A smart system could learn from these data a cus- tomer’s shopping habbits and use it for suggestions. • price • quality 1.1.2 Customer’s Profile - Personalization • utility In order to use an e-shop, someone usually needs an • ease of use account. This alone enables us to use all the other • coolness factor features we write about. But also, it gives us more data to process. • safety / trust Typical registration data are name, address, age and e-mail. E-mail is a handy but dangerous way to com- Each of these criteria may be given a different name for municate with customers. We will talk about it at the different products but it actually refers to a customer’s 3rd section. values’ system. Besides these typical data, there are more information An example would be the “coolness factor”. An iPhone we can ask from a customer upon his registration, like may be a product with high coolness factor, but also a to choose product categories that interest him most and fresh vegetable or a vitamin in our age of “fitness” may what he thinks as an acceptable amount of e-mail. be an item with high coolness factor. 2
  • 3. 1.3 Basics: Ratings & Reviews Also a social system implemented into the store could be of high value. For example allow user profiles, simple Ratings and review are considered standard these days. walls, public favorite products, wish lists. Ratings is the classic star system which usually takes values from 1 to 5 stars and no stars equal to no rating. 1.3.3 Gamify the Store The ratings’ titles the customer will see, don’t have to be the names of the criteria we described previously and Another hype of modern web design is gamification. may vary from product to product. The process of applying game techniques to your prod- Reviews can be scanned using automatic text analysis uct (store in our case) to encourage users to engage methods to extract useful information. We may be even more. able to omit some criteria from ratings (six rating areas A basic example would be a personalized question. could be too much for customer) and try to extract Let’s assume you need a bit more data to calculate a them from comments. weak order. You can insert a question created auto- It is important to encourage ratings and reviews. The matically and personalized for the user on a sidebar. As most great example of user feedback is Amazon. the user browse through your store, this simple ques- tion, e.g. do you prefer product A or product B, appears in an unobtrusive way. Most people would give in and 1.3.1 Incentives click the answer. If another answer popped up, they may continued to play for a bit more before returning Giving people incentives to interact with your site and to their shopping. give you information is crucial. Game design is a field of its own, but is an interesting You can appeal to many individuals who value being topic with many practical applications here. of statue by implementing a user rating system, where people with many useful reviews earn titles like expert user. 2 Pre-Processing the Data Another incentive would be offers. 
Like giving a 1% off of the next purchase for any customer that leaves 10 This is the most challenging part of the process. Al- ratings. though the implementation is left for the engineering It is important though to distinguish between feedback team, we may give some guidelines to assist to a suc- for an item the customer has bought and an item he cessful result. hasn’t actually bought from our store. Because of the diversity of possible recommendation Other incentives may include frequent contests. For ex- scenarios, it is fundamental that the processing part cre- ample a contest could be help us find the best mobile ates such an application programming interface and/or phone and it may become yours. We would then create query interface that it will make it easy for the software a short questionnaire with questions that can help our engineers and administrators of an e-commerce business marketing decisions. If you think about it, the cost of to connect it to their platform. such a feat is small compared to other known methods Figure 1 displays the structure we propose for the rec- of achieving the same results. You have to remember ommender platform and connects the previous section though that the questionnaires should really be kept (input) with the current one. sort. The challenge is that the customer should com- plete the contest before it become tiresome for him. 2.1 Content-based Filtering 1.3.2 Social In this type of filtering, we try to make suggestions based on the characteristics of a product and/or the The hype these days is social. Everything has to be ratings of the individual we make recommendations social, users expect everything to be social. The re- for. This approach at its simplest form uses weights views and user ratings system we proposed earlier are to find recommendation candidates whereas more ad- examples of social features. But one can move it even vanced scenarios include machine learning techniques. further. 
It is a complex process as we have to teach the machine A very simple way would be to incorporate social but- to search for relations across various characteristics. tons such as the renown Like button of Facebook, the +1 of Google and the tweet button of Twitter. Every time a user shares an item, we can give it a leverage in 2.1.1 Using Characteristics the weak order of its category. If the user comments with the share, we can perform automatic text analysis This is the easier form for this type of filtering yet still to the comment. needs a fair amount of work. Assuming standardized 3
fields for product characteristics, we can find and suggest products of the same category with similar characteristics. An example would be a visitor looking at a hard disk with 2TB of space, a SATA connection and a cost of around 100€. The system would proceed to find disks with similar attributes and suggest them.

Figure 1: recommender platform and input

A problem with the aforementioned technique is that a product has many more characteristics than the ones that are important for a customer. For example, hard disks have attributes like rotational speed and access time. There are two solutions to this problem.

The simplest one would be to hard-code the important characteristics of a category. A human operator will set the characteristics a customer is more likely to find important. A more difficult approach would be to have the system learn through visitors’ page views and orders which characteristics are of importance for a category.

This can be done on a global level, so we find the characteristics that are more likely to be of value for any customer. We would expect a fairly good experience for most customers with this approach.

Another way would be to do it on a personal and temporal level. We make the first suggestion to the customer using global data and, based on his choice, we adapt our weights to make suggestions based on similar characteristics of the last viewed items. This technique would be better for people who like to do more research before they buy, so there could be a trigger for such a technique based on the customer’s page views profile.

2.1.2 Ratings and Multi-Criteria Analysis

Ratings are the center of content based filtering. They can be used to suggest goods that a customer isn’t searching for.

Multi-Criteria Analysis can be used as a processing method for this part of the Content-based Filtering.

We have defined six criteria that will be used to identify products and explained that per-category vectors will adjust the weight and thresholds of each criterion for a certain product.

The problem is that, because of the nature of the data collection process, we won’t have all criteria for every product from each customer. We only get bits of information from our customers. Our data looks like a puzzle that statistics will help us solve. This isn’t multi-criteria decision analysis (MCDA) as used in marketing strategies; it is MCDA per customer and has much greater error tolerance.

Let’s assume our store contains n products that create the product space N. We have six criteria: a1, a2, a3, a4, a5, a6. For each criterion we have much less data than n. For example, for criterion a1 we may have k1 values, for criterion a2, k2 values and so on. We can calculate criteria scores adjusting for missing values.

Similarly, a customer may have rated a few items but omitted some ratings for each. We will calculate his rating profile from these values.

We propose that this process include a secondary weight vector for different product categories. To calculate this we will need items that the customer has both bought and rated.

MCDA also requires a weak preference order. This, as described in the previous section, can be acquired by asking the user questions using various games. For example, if a user has rated two items, a mini-poll between these two could appear on a sidebar.

The suggestions of this module of our platform will mainly be for items that match a user’s rating profile. Also, the pre-processing that happens here is shared between the two filtering modules, as we will see in the next paragraphs.

2.2 Collaborative Filtering

This type of filtering uses customer data to make recommendations. It tries to find persons with similar tastes to the current visitor and then proceeds to suggest items that they (people with similar taste) like.

This is easier than content based filtering in the sense that the system doesn’t need to understand the underlying data, like product characteristics, but only to find persons with similar taste. Also, it works well for cross-category suggestions. On the other hand, it requires better hardware as it operates on big datasets, it needs a large amount of data before it can deliver good suggestions, and it requires customers with a decent history of purchases and/or ratings.

Generally we can divide the process into two distinct phases. First we have to find a group of users with similar taste. This may be accomplished through clustering methods or nearest neighbor search. Then we
have to make suggestions based on the choices of the group. For a traditional approach (known as memory based) we could use the rating data of each user in the group, applying weights for user taste compatibility. A more complex approach (model based) would require data mining and machine learning techniques to find patterns in the underlying data.

2.2.1 Multi-Criteria Analysis

Multi-Criteria Decision Analysis was described before, but in Collaborative Filtering it can give even better results. Besides, in marketing, MCDA is used not only to profile products but also to find market segments.

MCDA actually follows the work-flow we described for generic Collaborative Filtering. It has four phases:

• Data acquisition, which we described in section 1.
• User modeling, which we described in Content Based Filtering.
• Clustering, the 1st part of Collaborative Filtering.
• Recommendation Phase, the 2nd part of Collaborative Filtering.

2.3 Building an API - Query Interface

A good recommendation platform is tailored to each product, or one level up, to each product category, for a certain store. Thus the people who build and maintain the electronic commerce store should adjust the recommendation platform to their needs. Hence the need for an application programming interface that includes a query interface and is easy to build a GUI around.

The API should offer a handy way to feed data to the system, perform queries and accept results from it. It should be well defined and easily accessible from current web technologies. It is an important design task for the engineering team.

The Query Interface is a means to abstract the complexity of the platform from the people who run the e-commerce store. An example of a query interface would be the ability to instruct the platform, through a simple query, to return 5 suggestions, of which 3 will come from the content based module and 2 from the collaborative module. So in essence it is a standardized input of short queries that translate to a subset (albeit the most important ones) of the total queries our platform can do. The Query Interface is part of the API.

If the recommender system is built in-house, then the API can be omitted, but it is advisable not to, since the implementation team will probably differ from the team maintaining the store. It can be of a less advanced form though.

2.3.1 No-Knobs Approach

A no-knobs approach to the Query Interface could be further studied. In this approach the platform acts in an automated fashion, auto-tuning its results. We would advise against it as a primary goal. The platform should be implemented with manual tuning. Then, as it gathers more data, one could try to run in parallel a no-knobs implementation of the Query Interface and compare the results.

3 Recommendations System

Recommendation (or recommender) systems have become increasingly popular over the last 4 years (2008 to 2012). This is mainly due to the huge penetration of mobile devices (smart phones, tablets) and internet services (social media) into the consumer market. Some of the best-known recommendations so far are the product suggestions of amazon.com, the song suggestions of Bang & Olufsen, iTunes and Pandora Radio, the movie and video suggestions of Netflix and YouTube, the search suggestions of Google Trends, the photo suggestions of Pinterest, and the network connection suggestions of Facebook and LinkedIn.

As expected, recommendation systems are slowly gaining ground in all current online activities that can, directly or indirectly, pose an opportunity for market share.

There are numerous algorithms that have been implemented in the design of recommendation systems. One of the most commonly used algorithms in recommender systems is the k-nearest neighbors (k-NN) approach. k-NN is a method for classifying objects based on the properties of their closest neighbors in the feature space. In k-NN, an object is classified through a majority vote of its neighbors, with the object being assigned to the class most common amongst its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of its nearest neighbor.

Another popular technique is the Pearson Correlation, which can prove especially useful, as we can see in section 4 of this paper. Pearson Correlation is a measure of the (linear) dependence between two variables X and Y, giving a value between −1 and +1 inclusive. In a social network, a particular user’s neighborhood with similar taste or interest can be found by calculating this coefficient. The user’s preference can be predicted by collecting the preference data of the top-N nearest neighbors of a particular user (weighted by similarity).

One last well-known method, which can also prove useful in section 4, is the Rocchio Classification. This is a method of relevance feedback dating back to the 1970s. Rocchio makes use of the Vector Space Model
and is based on the assumption that most users have a general conception of which items should be denoted as relevant or non-relevant. User feedback is used to refine a search query by emphasizing or deemphasizing certain terms (similar to how Pandora refines its user recommendations). Through feedback, the user’s search query is revised to include an arbitrary percentage of relevant and non-relevant terms as a means of increasing the search engine’s recall, and possibly the precision as well. The number of relevant and non-relevant terms allowed to enter a query is dictated by a series of weights in the central equation.

3.1 Hybrid Filtering

Hybrid filtering combines collaborative and content-based filtering, especially where the combination proves to be more effective, depending on the case. A quick reminder: content based filtering is produced from information that logs the customer’s profile and taste, while collaborative filtering is produced by matching the habits that arise between users that seem to belong to the same group.

Hybrid approaches can be implemented in several ways: by making content-based and collaborative-based predictions separately and then combining them; by adding content-based capabilities to a collaborative-based approach (and vice versa); or by unifying the approaches into one model. Several studies empirically compare the performance of the hybrid with the pure collaborative and content-based methods and demonstrate that the hybrid methods can provide more accurate recommendations than pure approaches. These methods can also be used to overcome some of the common problems in recommender systems such as cold start (not enough gathered information to generalize) and the sparsity problem (having too few ratings and hence too few correlations between users).

3.2 Recommendations spatial placement

3.2.1 Spatial as in ‘scale’

The key assumption in the traditional marketing literature is that the consuming behavior of an individual is conditionally independent of the consuming behavior of another individual. This means that the decision making process of a sole customer is considered to be unaffected by the decision making process of another customer, let alone a group of customers.

We know that the latter simplification can be very useful in some special cases, but it requires extra caution when applied in general. So, when researchers study large groups of people, they tend to adopt theories that rest on the concept of ‘sole identity’, assigning a personal human-like behavior to a crowd.

The key question is what happens to the relationships between smaller groups of people, on the order of a few hundred. Do these relationships exist, and to what extent do these people influence each other? Eric T. Bradlow et al. (2005) showed that this mapping can be achieved and can also produce some very satisfying results. Around the same period many studies were based on the same concept. They verified that spatial models can be successfully applied and also generate some very satisfying estimations about the consuming actions of such groups.

3.2.2 Spatial as in ‘type’

Similar studies carry on until today, progressing with some really intriguing results about the application of spatial models in social media. Just recently online services like Myspace, Facebook, Twitter, LinkedIn and Google+ came into play, and advertisers advanced by making the best out of this knowledge. Nowadays interactive banners, tag clouds and infographics produce beautiful aesthetics which lure the individual ‘to blend in’, ‘to belong’ and ‘to follow’ the trends that his friends, colleagues and associates are up to.

In advertising, spatial and temporal placement can actually become very complex. The research field of each can make up a completely separate sector in the science of marketing. This is because there are a lot of parameters to be considered, for example the spatial mapping of each individual banner (or recommendation) according to geographic, demographic and psychometric criteria. Another concern is the analysis of the spatial drift phenomenon, which has to do with the exponential decay of an action’s effect (e.g. the generation of a specific recommendation) around a ‘neighbourhood’, whether this is considered to be in space or time.

3.2.3 Spatial as in ‘surface’

As far as the spatial placement within the web-store’s layout is concerned, the corresponding ‘social media plug-ins’ are most commonly placed around the page’s edges. This is because they can easily fill in the space between two seemingly unrelated advertisement banners and attract the attention of the reader to the surrounding area. Recommendation systems usually present information that is automatically refreshed but always semantically attached to the consuming preferences of the customer’s network connections. This way they attempt to maximize the chances of him engaging or interacting with the advertisement. The same plug-ins are usually implemented in a way that lets them learn and improve their suggestions according to the customer’s behavior and use this information as feedback to his connections too.
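The first hybrid strategy from section 3.1 (separate content-based and collaborative predictions that are combined afterwards) can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the paper: both modules are assumed to return per-item scores between 0 and 1, the names blend_scores, top_suggestions and content_weight are hypothetical, and a plain weighted sum stands in for whatever combination rule a real deployment would tune.

```python
from typing import Dict, List


def blend_scores(
    content_scores: Dict[str, float],
    collaborative_scores: Dict[str, float],
    content_weight: float = 0.6,
) -> Dict[str, float]:
    """Weighted sum of two per-item score tables (scores assumed in [0, 1]).

    An item missing from one module contributes 0 from that module, so an
    item known to only one module can still be suggested.
    """
    if not 0.0 <= content_weight <= 1.0:
        raise ValueError("content_weight must be between 0 and 1")
    items = set(content_scores) | set(collaborative_scores)
    return {
        item: content_weight * content_scores.get(item, 0.0)
        + (1.0 - content_weight) * collaborative_scores.get(item, 0.0)
        for item in items
    }


def top_suggestions(scores: Dict[str, float], limit: int) -> List[str]:
    """Return the `limit` best items, highest score first (ties by name)."""
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:limit]]


# Made-up scores for a visitor looking at hard disks.
content = {"disk_a": 0.9, "disk_b": 0.4, "cable": 0.2}
collaborative = {"disk_b": 0.8, "case": 0.7}

blended = blend_scores(content, collaborative, content_weight=0.5)
print(top_suggestions(blended, limit=3))
```

Keeping the weight as an explicit parameter fits the manual tuning the paper recommends for the Query Interface: the store operator can shift the balance between the two modules without touching either of them.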
3.3 Recommendations temporal placement

Whatever we do, however we do it, regardless of how important it may seem at the moment, it will actually be quite useless or insignificant if attempted at the wrong time. So, timing is of the essence and this applies to every action we take in life. Likewise, every temporal placement of a campaign, any recommendation technique, if properly designed, can produce great results, but in any other case may have no effect or, even worse, the opposite of the one that we desire and expect.

Time, unlike space, is a dimension that imposes the same limits on every marketer. Expertise is always considered an asset, but in the field of time, all ‘players’ share about the same chances of success. No one can establish an advantage in the field of time regardless of how big or small a brand name he claims to own. And the same goes for recommendation systems too. Conquering the field of time demands good strategy skills and an intuitive ability to foresee the market trends.

Researchers have proposed a lot of versatile ideas about the formulation of relevant mathematical models which predict such fashions, and to some extent they generate some good results. But machine learning techniques, artificial intelligence, fuzzy logic and neural networks, even when combined all together or used extensively in simple cases, cannot (until now, that is) reproduce the results that a talented visionary human may achieve, just because successfully forming the future will always be more rewarding than managing to predict it.

3.3.1 Placing Strategy

By taking into consideration all of the above we conclude that an automated recommendation system must:

• focus on small time frames and make quick but small steps rather than slow and big ones, in order to keep all options open until the very last minute of commission (pivoting),
• exhibit preference in aborting if the recommendation has a high risk of being unwelcome at the designated time of prompting (because you can never be too late, as trends cycle over time, but you can be too quick!) and
• try to lead the market by shaping the consumers’ opinion (branding) rather than passively waiting for feedback on what they are about to wish for or like. A recommendation system can achieve that by enriching the conventional content with injected information (impression) that will be intentionally kept out of focus.

So, the subject will hopefully be the recipient of two messages: a straightforward one, which will literally encourage him to try a special product or a service, and an indirect one, which will attempt to persuade him about the company’s current concern or its vision about the future.

Nowadays similar techniques are being used by many firms and provide results which solely depend on the fashion making abilities of their respective departments. Big brand names which hold a solid corporate identity often produce content like this to promote their innovative services and products and at the same time to avert a competitor from impinging on the same one. Smaller brands do the same in order to assiduously make their way in a niche market share.

3.3.2 Measure Validation

Marketing measures must efficiently estimate the performance of a particular strategy. The indexes used in such cases may or may not have some physical meaning; it doesn’t really matter as long as marketers are able to understand how they work and why. Two common scientific criteria for the meaning of measures are reliability and validity.

Reliability can be verified by testing the correlation between the measure taken at different times (retest stability), with equivalent forms, or with split halves (internal consistency).

Validity, on the other hand, comes in many forms. There is face or consensus validity, which exists when a measure looks as if it should indicate a particular variable or concept. Using this form is not safe because studies have shown that scores on recognition measures can be influenced by irrelevant response sets.

Another, more objective form of validity is predictive or concurrent validity. Predictive validation procedures consist of determining the extent to which particular measures predict other ‘criterion’ measures, so it has much pragmatic meaning in marketing. This measure validation exhibits some pros and cons.

When the predictions based on the latter form behave too fuzzily, the certainty of the measuring results can be enhanced by using measure validation, which consists of two parts: convergent and discriminant validation. The first is synonymous with predictive or concurrent validation, which means that a measure can adequately represent a variable if it correlates or ‘converges’ with other supposed measures. Discriminant validation is absolutely necessary to really pin down the meaning of measures. This is because a measure may converge with measures of other variables in addition to the one of interest.

Finally, construct validation can only be considered after measure validation is established. This validation actually provides a proof of concept, as it checks whether a hypothetical construct, composed of several similar variables, actually operates in the scientifically expected way.
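The correlation checks behind retest stability and convergent or discriminant validation can be illustrated with a short Python sketch. It assumes the Pearson coefficient from section 3 as the correlation measure, which the paper does not prescribe for this step; the function name and all sample figures are made up for illustration.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical satisfaction scores of five customers, measured twice.
first_wave = [3, 4, 2, 5, 4]
second_wave = [3, 5, 2, 4, 4]

# Another supposed measure of the same variable (convergent check) and a
# measure of an unrelated variable (discriminant check); both invented.
same_variable = [2, 4, 1, 5, 3]
other_variable = [6, 7, 6, 6, 5]

print(f"retest stability: {pearson(first_wave, second_wave):.2f}")
print(f"convergent:       {pearson(first_wave, same_variable):.2f}")
print(f"discriminant:     {pearson(first_wave, other_variable):.2f}")
```

A measure that behaves well would show a high coefficient in the first two checks and a coefficient near zero in the third. Where to draw the line between "high" and "near zero" is a judgment the sketch deliberately leaves open.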
3.4 Recommendations outside the store

Internet and mobile platforms offer many ways to reach your customers outside of your e-commerce site. Let’s explore two of the most popular.

3.4.1 E-mail

E-mail is a dangerous field. It is absolutely a love or hate factor in your customer relations. Do it wrong and you will damage your image; do it right and you will see frequent visits to your business. A middle ground doesn’t exist.

We suggest that email should be used for sending recommendations to the customers but with the utmost care. Recommendations sent through email should contain a mix of suggestions relevant to items the customer saw during his last visits and suggestions that match his taste.

The frequency of e-mails should be related to the frequency of visits the customer pays to the shop. A less interested customer will probably be offended by frequent email. The expected etiquette for communication between a business and an individual is to be reasonable. Contact the customer only if the business has some offers on items he may be interested in or there are new arrivals of items he may be highly interested in.

3.5 Mobile - Suggestions on the Road

Every day smart-phones and other web-enabled devices make their way into customers’ pockets by the millions. Eric Schmidt of Google announced on September 5th, 2012 that Android devices alone have over 1.3 million activations per day!

A significant part of e-commerce will be transferred to mobile in the coming years and a whole new set of possibilities will open. There are already applications that take advantage of mobile, though they are not yet widely deployed or accepted.

E-commerce businesses may use mobile through mobile web sites and applications. But this use doesn’t differ much from the typical web use, so everything we said until now applies.

The game-changing feature of smart-phones is that people are constantly connected, have their device always at hand and are frequently tracked. For example, a customer’s location can cue us as to when to send recommendations to his device. If we know his location at block level, we can detect when he is out shopping and in some cases what he is shopping for. If we can pinpoint his location to a few meters, we may even know which shop he is in. Such techniques may bring criticism and you should always ask for a customer’s approval before using them.

But mobile apps can also be used to gather feedback from customers. Smart-phone games are popular these days; why not build one that also does a bit of market research?

As good as all these seem though, the mobile sector is managed mainly by advertisers, so you may have to work with them.

4 Metrics; Quantify the Success of the System

Data mining techniques can prove very valuable if they are used properly in systems like our proposed marketing platform. They have the power to reveal well-hidden social habits, market trends, customer profile characteristics and other natural patterns which in the general case tend to be ignored or undervalued. They can also make use of seemingly arbitrary data given as input and produce some very interesting estimations as output.

But the output itself needs further study by an expert, usually a human, in order to be interpreted and accompanied by logical reasoning. Also, even if the data are processed automatically, the expert must provide a well-defined description of their application domain in order to lessen the fuzziness that usually surrounds the theoretical models of the respective AI expert-decision system.

So, the intervention of a human expert in the process, although undesirable, is usually encouraged because the solution of an automated artificial intelligence machine cannot always guarantee a satisfying result. This is due to the fact that nature’s patterns, let alone laws, are formed in a non-deterministic and non-linear way and the solutions produced are only a fair estimation of the truth.

Scientists know that the complexity of such problems can be reduced by linearization techniques. In our proposed platform we do the same by modeling a good linear substitute, which, in other words, means that we build a system which uses its linear inputs to ‘sense’ some quantities (here the market’s trends) and learns how to adapt its outputs accordingly (here the customers’ recommendations) to maximize the profit.

To accurately identify the dominant factors that rule the market’s trends and quantify their importance in each scenario, we use special indexes called metrics or KPI (key performance indicators). The enumeration of the existing KPI is a difficult job since it is almost impossible to gather information on already working systems, but most companies tend to use some of the metrics already made known by the international bibliography. Companies tend to gather a lot of data (e.g. by market research using questionnaires) or use many
different KPI in order to successfully detect the pulse of the market at any desired time (e.g. in order to apply a promising marketing campaign, change their brand orientation, use pivoting, etc.).

But neither the gathering of a high volume of data nor the application of a well-guessed selection of the corresponding KPI can guarantee the success of the applied decision. A high volume of data is resource consuming and impractically optimistic, while the random selection of KPI itself defies the very nature of non-determinism in non-linear and complex problems. This is because an observer can only look for logical and simple explanations of why things work in a specific way, lacking the ability, or even worse neglecting, to realize and/or understand the existing correlations between all the factors that balance a phenomenon to a given state.

Also, statistics in general are doomed to become useless if the policies around data aggregation and processing cannot generate coherent results that will lead to a straightforward and cost efficient strategy. Unfortunately, the techniques usually adopted by medium-scale companies are efficient, but there is no safe way to ensure that they are also optimal, so they can sometimes exhibit significant shortcomings.

If we may address some of the main reasons for this, we believe that it is due to the fact that a suitable KPI is applied in the wrong context or used to the wrong extent, and because the system is forced to process and interpret high-volume data which bring back fuzzy and thus untrustworthy results.

To counter that, our proposed scheme tries to check which metrics (KPI) are highly correlated with the desired outcome (overall profit) and assigns special ‘weighting factors’ to each one of them according to their estimated correlation. The idea is to use the collective information of many different metrics in a versatile way to make them form the optimized ‘success quantifiers’ that best suit the e-commerce business in question. The very first time the ‘weighting factors’ get random values, but after successive passes the system will tend to settle to a specific state.

We achieve that by introducing a dedicated feedback system (an example would be a system based on genetic algorithms). This measures the system’s efficiency and adjusts the relative importance of each metric accordingly. Each weighting factor represents the metric’s significance towards the desirable effect, which is to increase the profit without compromising (not a lot at least) the weighting effect of the rest of the metrics.

If the metric in question proves to be important, then it makes sense to focus more effort on its fine-tuning, so it is highly rated. In any other case the metric’s results must be considered stochastic and thus its influence must be regarded as untrustworthy, so in every pass the system assigns a small value to it, and subsequently this becomes small enough to be ignored.

This feedback strategy ensures that the system will eventually and concurrently blend in with the market needs by inheriting its trends. This method is similar to an automated multi-criteria analysis and promises to produce fast and traceable results with increasing marketing efficiency.

In the subsections to follow we present some of the most promising KPI which are frequently used by companies and organizations.

4.1 OCR - Order Conversion Rate

Order conversion rate happens to be one of the most important KPI. Generally a conversion rate refers to the percentage of events leading to another event. In e-commerce it usually refers to the percentage of visits that convert into orders. Most web analysts define the conversion rate as the percentage of site visitors who do something that the company wants them to do, like submit an order, sign up for an email, send a share link and so on.

The order conversion rate is an important metric, but it just shows that the minority did something, not why the majority did not do anything. This is because buying something is not always the main aim of visitors who go to a homepage. Customers also intend to do research, look for information, read a blog or just get advice. So OCR and similar conversion rates (CR) are useful, but only if combined with other KPIs too.

4.2 RpV - Revenue per Visit

Average revenue per visit is a well-known marketing acquisition indicator. It is defined simply as the sum of revenue generated / number of visitors. This metric rises when more valuable customers are attracted to the e-commerce store.

4.3 Page Visits

Page views is a metric mainly used to quantify the effect that a marketing campaign has on a specific target group, and it can produce some nice conclusions, especially when studying the temporal and spatial allocation of the channels that the customer traffic passed through until it arrived at a specific page on the web-store. Page visits are a very important index for the system we propose. They can show the extent to which our visitors are motivated to visit the web-store by some social media campaign, quiz, etc. and, when combined with the RpV, they can show the clout that each social medium holds over our visitors over time.
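As a rough illustration of the two KPI definitions above and of the idea of weighting each KPI by its correlation with profit, consider the following Python sketch. Several things here are assumptions rather than the paper's method: the weights are computed in one shot as normalised absolute Pearson correlations, which stands in for the iterative feedback system (for example one based on genetic algorithms) that the text describes; revenue per visit is computed per visit, treating "visits" and "visitors" as interchangeable; and all function names and figures are hypothetical.

```python
from math import sqrt
from typing import Dict, Sequence


def order_conversion_rate(orders: int, visits: int) -> float:
    """OCR: share of visits that convert into orders (0.0 if no visits)."""
    return orders / visits if visits else 0.0


def revenue_per_visit(revenue: float, visits: int) -> float:
    """RpV: revenue generated divided by the number of visits."""
    return revenue / visits if visits else 0.0


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; a constant series is treated as uncorrelated."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_weights(
    kpi_history: Dict[str, Sequence[float]], profit: Sequence[float]
) -> Dict[str, float]:
    """Weight each KPI by |correlation with profit|, normalised to sum to 1."""
    strengths = {
        name: abs(pearson(series, profit)) for name, series in kpi_history.items()
    }
    total = sum(strengths.values())
    if total == 0:
        return {name: 0.0 for name in strengths}
    return {name: value / total for name, value in strengths.items()}


# Hypothetical weekly figures for a small store.
visits = [1000, 1200, 900, 1500]
orders = [20, 30, 18, 45]
revenue = [800.0, 1260.0, 700.0, 2000.0]
profit = [100.0, 180.0, 90.0, 300.0]

history = {
    "ocr": [order_conversion_rate(o, v) for o, v in zip(orders, visits)],
    "rpv": [revenue_per_visit(r, v) for r, v in zip(revenue, visits)],
    "page_views": [5000, 5000, 5000, 5000],  # constant, so uninformative here
}
for name, weight in sorted(correlation_weights(history, profit).items()):
    print(f"{name}: {weight:.2f}")
```

A metric that never moves with profit ends up with a weight of zero, which mirrors the paper's remark that an unimportant metric's influence shrinks until it can be ignored.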
4.4 AOV - Average Order Value

Average order value is considered a key performance indicator when combined with revenue per visit and order conversion rate. The basic calculation is Average Order Value = Sum of Revenue Generated / Number of Orders Taken.

Usually experts work diligently on both OCR and AOV in order to improve both at the same time by segmenting visitors and marketing campaigns into high, medium and low AOV groups. This can help identify the special marketing approach that the company should apply to each group in order to get the best collective outcome.

Managing the balance between OCR and AOV can be tricky. A small increase in one could mean a drop in the other, and if this drop is of great magnitude compared to the rise then the revenue per visitor may be strongly impacted.

4.5 Customer return

This is one very important metric because it signifies the amount of dedication that the target group shows to the business. This group is the one that consciously supports the company in both direct (buys) and indirect (personal recommendations to friends) ways.

Again, this KPI can provide valuable feedback to the expert decision system, especially when combined with others.

4.6 Time spent on store

Time spent on a store can have a lot of different meanings. More time could mean more chances for a customer to indulge himself in an order, or it could mean a bad design that makes it hard for the customer to find what he needs. In the second case it is almost guaranteed that the ‘customer return’ KPI will be affected and become smaller. So this is where data-mining techniques come into play and prove to be very handy. If the volume of the collected data is sufficient (and not extremely large) then this metric can reveal very useful information about the quality of the site’s design and/or the applied marketing technique at hand.

4.7 Cart abandonment

If used properly, the metric of ‘cart abandonment’ can be a very helpful asset in determining what products and categories are driving the most abandonment and under which circumstances. But it should be used with extra caution because it can easily become misleading. The problem does not always lie in the shipping costs or the billing method.

For example, one could use the cart just to get a picture of the ‘total cost’ of his wish list, although he never meant to buy the products at that moment. Also, a customer may need to use the cart just to collect some of his favorite items in one place, due to the lack of a ‘favorite’ or ‘share’ button. He may also do the same thing if he wishes to use the links of two or more similar products he wishes to compare but the site lacks such a service. A low percentage of visitors who purchase an item after adding it to the cart could also mean that your shipping rates are high or that your checkout template is confusing.

So, the ‘cart abandonment’ results must always be analyzed according to criteria that are stripped of such scenarios or take them into consideration through proper normalizing factors. And one of the aims of the proposed platform is to achieve this normalizing efficiently and automatically.

4.8 Up-selling rate

The up-selling term indicates the percentage of successful ‘hits’ on a recommended order of a product similar to the one that the customer just ordered or intended to order. Up-selling is considered to be successful if it leads to a more profitable sale without jeopardizing the good relationship that the organization has built with the customer. So, to make the customer feel comfortable, the offer must be very well targeted, providing a recommendation that enhances the value he gains from the bargain. An example would be to propose the purchase of one more product which is a warranty extension on the hard disk drive that the customer is willing to buy.

The similarity between the two products (i.e. warranty extension and hard disk drive) will be modeled by a real number called ‘relation strength’. The ‘relation strength’ can be defined by an expert at first by setting the appropriate values in a product-to-product matrix. When the system becomes operational, its algorithms fine-tune this matrix by reassigning new values to it, as indicated by the real market’s feedback. As the careful reader may notice, this idea is based on a principle similar to the one we have already described about the correlation weights between a KPI and the overall profit.

4.9 Cross-selling rate

The cross-selling term is very similar to the up-selling one and there is only one small distinctive difference. It indicates the percentage of successful ‘hits’ on a recommended order of a product seemingly irrelevant to the one that the customer just ordered or is about to order. A valid example could be the proposal of a car charger for a mobile phone to someone who bought a memory card for his laptop. It seems irrelevant but the system can guess the customer’s financial status as he
is aged enough, he just bought a laptop, he is male and he has declared a mobile number, so he probably owns a car too. The system could also know nothing about this customer but could extrapolate information from the profile patterns of other customers that fit in the same target group.

As far as the processing of this profiling information is concerned, the only things needed are a medium-scale database (i.e. over 1000 records) and the algorithmic strategy already stated in the previous section.

4.10 RoI - Return on Investment

All the metrics mentioned above can be considered as variables (inputs) which are set according to the results that RoI generates. So, RoI can be considered as a proposed output and its value a relevant estimator of the system’s efficiency.

In general, return on investment is the amount of profit a certain policy can produce in relation to the cost of this policy’s adoption. In the proposed multi-criteria analysis this metric can reflect the company’s total profit change rate in relation to the absolute value of the change of a respective KPI. So $RoI_{OCR,t_1}$ will be the value that RoI acquires at the moment $t_1$ when OCR, and only OCR, changes. In the special case where $RoI_{OCR,t_1}$ appears to be monotonic in the range we are interested in, we can safely make future predictions according to the graph’s envelope or the projections of the $RoI_{OCR,t_1}$ trend-line.

5 Implementation

Our proposed recommender platform is a challenge to implement. It has high complexity and a large number of necessary features.

A team of software engineers with a strong background in math would be able to bring it to life. Alternatively, a team of software engineers with structured thinking and a mathematician with a background in statistics and algebra would also be able to pull it off.

The system should first be carefully designed and then implemented. Because of its complexity and many possible points of failure, it would be best to partition the platform into many small modules with distinct purposes and write test cases for them.

The technologies that are going to be used should be chosen by the design team. The business model that will be used may be of importance in this decision.

We ourselves prefer the SaaS approach (explained later) and we would proceed with a highly scalable implementation on the cloud. The CPU-bound parts of the design we would implement using a high-performance DBMS and/or in-house code in lower-abstraction computer languages that tend to perform better. To glue all the parts together, we would use higher-abstraction, easier to handle languages. We would make our APIs compatible with industry-standard technologies, easy to use and well documented, to help clients connect their e-commerce businesses to our platform.

5.1 Making a profit

Implementing the platform won’t come cheap. The primary cost will be the salaries of the engineers needed.

5.1.1 Software Product

The common approach is to release the platform as a software product which businesses can buy, and then achieve higher revenue through technical support and/or software upgrades. For such an approach to work, you have to make your software as compatible and as easy to set up and use as possible. This leads to much higher costs and may limit the number of features you can implement. Also, in this approach you should generally try to release a product that is as perfect as possible, since fixing mistakes on the customers’ side isn’t viable. This will make your time schedules longer.

5.1.2 In-House

For a large e-commerce business, it would make sense to hire a team to implement the platform in-house. This approach has the benefits of a tailored solution. It can be tweaked and adjusted to the company’s needs. But for it to make sense, it would have to boost sales much more than a ready-made solution. We would say that the expected increase in profits, on top of the increase a ready-made solution would bring, should be such that the implementation pays for itself in two to three years.

5.1.3 Software as a Service (SaaS)

There is much hype around this concept these days. A company would implement and deploy the platform on its own servers and then customers would pay for access to it (usually on monthly or annual plans). From a financial standpoint this approach has the advantage of a steady revenue. Also, because of the savings on the customers’ side due to lower maintenance costs, customers will be willing to pay more for our product.

On the implementation front, this approach increases the running costs due to the need for servers and an administration team. It also increases your responsibility, since part of your customers’ business runs on your servers. On the other hand, the ability to perform maintenance and upgrades to our software at any time
facilitates the software engineering part and lets us do incremental enhancements, test features and do rollbacks when needed, thus allowing less demanding time schedules and resulting in a more robust product from the client’s perspective.

5.1.4 Open Source

Many software products choose the path of open source, commonly known as community editions of the product. This doesn’t stop a company from selling, renting (SaaS) or charging for technical support for its platform.

The main advantage is that programmers from all over the world will help you build, test and maintain your product. Most often you will find that these people are your customers who decide to implement a certain functionality themselves.

Also, administration teams (which will connect their company’s e-shop with your platform) tend to feel safer with open source approaches, as they know they can tailor a solution to their needs if necessary and that your product’s viability doesn’t depend on your business’s viability.

References

[1] Nikolaos F. Matsatsinis and Yannis Siskos. MARKEX: An intelligent decision support system for product development decisions. European Journal of Operational Research, Vol. 113, No. 2 (1999), pp. 336-354. Article Stable URL: http://dx.doi.org/10.1016/S0377-2217(98)00220-3

[2] Lakiotaki, Kleanthi and Delias, Pavlos and Sakkalis, Vangelis and Matsatsinis, Nikolaos. User profiling based on multi-criteria analysis: the role of utility functions. Technical University of Crete, Decision Support Systems Laboratory, University Campus, 73100 Chania, Greece. Operational Research, Springer Berlin & Heidelberg, Vol. 9, No. 1 (2009), pp. 3-16. Article Stable URL: http://dx.doi.org/10.1007/s12351-008-0024-4

[3] Lakiotaki, K. and Matsatsinis, N.F. and Tsoukiàs, A. Multicriteria User Modeling in Recommender Systems. Intelligent Systems, IEEE, Vol. 26, No. 2 (Apr., 2011), pp. 64-76. Article Stable URL: http://doi.acm.org/10.1109/MIS.2011.33

[4] C. Moghrabi and M.S. Eid. Modeling users through an expert system and a neural network. Computers & Industrial Engineering, Selected Papers from the 22nd ICC and IE Conference, Vol. 35, No. 3-4 (1998), pp. 583-586. Article Stable URL: http://doi.acm.org/10.1016/S0360-8352(98)00164-8

[5] Liu, Liwei and Mehandjiev, Nikolay and Xu, Dong-Ling. Multi-criteria service recommendation based on user criteria preferences. Proceedings of the fifth ACM conference on Recommender systems - RecSys (2011), pp. 77-84. Article Stable URL: http://doi.acm.org/10.1145/2043932.2043950

[6] Mehdi Dastani and Nico Jacobs and Catholijn M. Jonker and Jan Treur. Modelling User Preferences and Mediating Agents in Electronic Commerce. 1999.

[7] Beliakov, Gleb and Calvo, Tomasa and James, Simon. Aggregation of Preferences in Recommender Systems. School of Information Technology, Deakin University, 221 Burwood Hwy, Burwood, 3125 Australia. Springer US (2011), pp. 705-734. Article Stable URL: http://dx.doi.org/10.1007/978-0-387-85820-3_22

[8] Fabian Abel, Silvia M. Baldiris, Nicola Henze. User Modeling, Adaptation and Personalization. Adjunct Proceedings of the 19th International Conference on UMAP, Poster and Demo Track (Jul., 2011).

[9] Roger M. Heeler and Michael L. Ray. Measure Validation in Marketing. Journal of Marketing Research, Vol. 9, No. 4 (Nov., 1972), pp. 361-370. Article Stable URL: http://www.jstor.org/stable/3149297
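
As a supplementary illustration of the Section 4.4 calculation and of the OCR/AOV balance discussed there, the following is a minimal Python sketch. The `PeriodTotals` type, its field names and the sample figures are hypothetical and not part of the proposed platform. OCR is assumed here to be orders per visit and revenue per visit to be revenue divided by visits, so that revenue per visit equals OCR times AOV.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodTotals:
    """Hypothetical aggregate figures for one reporting period."""
    visits: int
    orders: int
    revenue: float


def average_order_value(t: PeriodTotals) -> float:
    # AOV = sum of revenue generated / number of orders taken (Section 4.4).
    return t.revenue / t.orders if t.orders else 0.0


def order_conversion_rate(t: PeriodTotals) -> float:
    # Assumed definition: orders per visit.
    return t.orders / t.visits if t.visits else 0.0


def revenue_per_visit(t: PeriodTotals) -> float:
    # Assumed definition: revenue per visit, which equals OCR * AOV.
    return t.revenue / t.visits if t.visits else 0.0


# Made-up numbers: a campaign lifts conversions but attracts smaller baskets.
before = PeriodTotals(visits=10_000, orders=200, revenue=10_000.0)
after = PeriodTotals(visits=10_000, orders=220, revenue=9_680.0)

for label, totals in (("before", before), ("after", after)):
    print(
        label,
        f"OCR={order_conversion_rate(totals):.4f}",
        f"AOV={average_order_value(totals):.2f}",
        f"RpV={revenue_per_visit(totals):.4f}",
    )
```

With these made-up figures the conversion rate rises while the average order value falls by a larger proportion, so revenue per visit ends up lower than before. This is the kind of imbalance the text warns about.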
Contents

Introduction

1 Data Sources - Signals
  1.1 Standard Data Sources
    1.1.1 Web Server, Cookies & Database Logs
    1.1.2 Customer’s Profile - Personalization
    1.1.3 Products Speaking for Themselves
  1.2 User Feedback; Earn it, Shape it for Multi-Criteria Analysis
    1.2.1 Generic Criteria - Appealing to Customers’ Values’ System
  1.3 Basics: Ratings & Reviews
    1.3.1 Incentives
    1.3.2 Social
    1.3.3 Gamify the Store

2 Pre-Processing the Data
  2.1 Content-based Filtering
    2.1.1 Using Characteristics
    2.1.2 Ratings and Multi-Criteria Analysis
  2.2 Collaborative Filtering
    2.2.1 Multi-Criteria Analysis
  2.3 Building an API - Query Interface
    2.3.1 No-Knobs Approach

3 Recommendations System
  3.1 Hybrid Filtering
  3.2 Recommendations spatial placement
    3.2.1 Spatial as in ‘scale’
    3.2.2 Spatial as in ‘type’
    3.2.3 Spatial as in ‘surface’
  3.3 Recommendations temporal placement
    3.3.1 Placing Strategy
    3.3.2 Measure Validation
  3.4 Recommendations outside the store
    3.4.1 E-mail
  3.5 Mobile - Suggestions on the Road

4 Metrics; Quantify the Success of the System
  4.1 OCR - Order Conversion Rate
  4.2 RpV - Revenue per Visit
  4.3 Page Visits
  4.4 AOV - Average Order Value
  4.5 Customer return
  4.6 Time spent on store
  4.7 Cart abandonment
  4.8 Up-selling rate
  4.9 Cross-selling rate
  4.10 RoI - Return on Investment

5 Implementation
  5.1 Making a profit
    5.1.1 Software Product
    5.1.2 In-House
    5.1.3 Software as a Service (SaaS)
    5.1.4 Open Source