Building a Recommendation Platform For E-Commerce Businesses Based on Hybrid Content Based - Collaborative Filtering And Web 2.0 Concepts

Technical University of Crete
Department of Production Engineering and Management

Andreopoulos, Marios
tuc@andmarios.com
Department of Electronic & Computer Engineering

Mandourarakis, Ioannis
i.mandourarakis@gmail.com
Department of Electronic & Computer Engineering

September, 2012

Abstract
We propose an automatic recommendation platform for e-commerce businesses based on multi-criteria analysis and hybrid content-based - collaborative filtering, tailored to e-shops and cross-product category analysis, built with Web 2.0 concepts in mind. We describe the input variables as well as methods to obtain them without frustrating customers, possible ways to pre-process the data and create a common query interface, suggest the recommendations output across a variety of scenarios and finally discuss metrics that can be used to measure the real return of the system.

Thanks

We would like to thank Professor Nikolaos Matsatsinis and Marina Karabatsa for their guidance.

Introduction

Internet commerce has seen a tremendous boost over the last decade, so much so that for some products it rivals brick and mortar shops.

From marketing’s perspective the web can be seen as a holy grail: access to millions of customers, many eager to give feedback, as well as vast quantities of customer behaviour data gathered automatically by web servers and databases. It is a real life playground for marketing scientists to apply and test theories. But with great data sets and a great user base comes the need for great questions and applications.

Our system is designed for the new generation of customers, people who are now in their early 30’s or younger, and for companies that plan to stay relevant for years to come. Recommendations’ quality will grow in parallel with the business - customer relation.

A good recommendation system can increase a business’ up-selling rate and build stronger customer relations. The latter point should not be overlooked. The system will make the customer spend more time on the e-shop and promote frequent visits. We like to think of this as customer - business “bonding” time.

In the next section we write about our proposed data sources - signals and how to acquire them, taking advantage of the “social” conception which dominates the world wide web nowadays. Then we proceed to suggest pre-processing methods for the gathered data in order to create a generic query interface upon which we can build recommendations. Recommendations are the main topic in our third section. Since our system supports many types of filtering and can do cross-product and cross-rating suggestions, which ones, when and how should we present to a user? The fourth section deals with metrics and how we can verify that the system works toward the business’ benefit. The last section explores implementation details and business models that can be used to build and support such a platform.

This work was done as part of an undergraduate course on Small Business and Entrepreneurship.


1 Data Sources - Signals

The signals we have at our disposal may come from a variety of sources:

• Traditional web means, like web server logs, cookies and the transactions/sales journal.
• A customer’s basic profile, since in e-commerce anyone has to register a profile. This includes attributes such as sex, age and geographical location.
• Products’ data, like specifications and product category.
• User feedback, which may be ratings and reviews through traditional means (like forms or star rating) or modern tools (like games and social features).

The 1st and 3rd data sets are considered standard. The 2nd is standard but may be extended through social features. The 4th one we have to design ourselves from the ground up and it is the main feature of this section.

1.1 Standard Data Sources

1.1.1 Web Server, Cookies & Database Logs

For the virtual world to function, logs are essential. These logs, collected and used primarily for technical reasons, can also assist marketing strategies.

From these logs we can extract information like which pages a certain customer visited, how much time he spent on each one, whether he reached a page by following a link from another page and whether he clicked any links on this page.

From our database we have access to things like which products a customer has bought, his orders’ values, when he placed his orders (time and date) and what items he bought together.

A smart system could learn a customer’s shopping habits from these data and use them for suggestions.

1.1.2 Customer’s Profile - Personalization

In order to use an e-shop, someone usually needs an account. This alone enables us to use all the other features we write about. It also gives us more data to process.

Typical registration data are name, address, age and e-mail. E-mail is a handy but dangerous way to communicate with customers. We will talk about it in the 3rd section.

Besides these typical data, there is more information we can ask from a customer upon his registration, like which product categories interest him most and what he considers an acceptable amount of e-mail. Most people are ok with giving this information, and even expect to, since we live in the age of social. Of course there should always be a clearly visible option to skip the process and proceed to the shop.

1.1.3 Products Speaking for Themselves

Products for an e-commerce platform are entries in a database. By using separate standardized fields for each of the characteristics we can easily perform content-based suggestions. A classification system is also of use, to assign products to categories and subcategories.

1.2 User Feedback; Earn it, Shape it for Multi-Criteria Analysis

User feedback is a broad concept. It includes every action a user may take that isn’t directly related to the shopping process.

The most common feedback forms are ratings and comments. These days likes on Facebook, tweets on Twitter and +1s on Google Plus are also common. But there is so much more we can do to get feedback from our customers.

1.2.1 Generic Criteria - Appealing to Customers’ Values’ System

Our proposal is the use of generic criteria so that we can create a profile for the customer which doesn’t rely on specific product attributes. Since a customer may expose different weights and thresholds for a criterion according to a product’s category, we may adjust weights and thresholds according to category.

The proposed criteria are:

• price
• quality
• utility
• ease of use
• coolness factor
• safety / trust

Each of these criteria may be given a different name for different products but it actually refers to a customer’s values’ system.

An example would be the “coolness factor”. An iPhone may be a product with a high coolness factor, but a fresh vegetable or a vitamin may also be an item with a high coolness factor in our age of “fitness”.
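
The six generic criteria and their per-category weights could be stored along the following lines. This is only a minimal sketch under our own assumptions: ratings use the same 1 to 5 scale as star ratings, per-category weights simply override a set of equal default weights, and thresholds are left out for brevity. The names `CriteriaProfile`, `weights_for` and `score` are hypothetical and not part of any existing platform.

```python
from dataclasses import dataclass, field

# The six generic criteria proposed in the text.
CRITERIA = ("price", "quality", "utility", "ease_of_use", "coolness", "safety_trust")


@dataclass
class CriteriaProfile:
    """A customer's weights for the generic criteria (hypothetical structure)."""

    # Assumption: every criterion starts with the same weight.
    default_weights: dict = field(default_factory=lambda: {c: 1.0 for c in CRITERIA})
    # Per-category overrides, e.g. {"phones": {"coolness": 3.0}}.
    category_weights: dict = field(default_factory=dict)

    def weights_for(self, category):
        """Return the weights for one category, normalized to sum to 1."""
        merged = dict(self.default_weights)
        merged.update(self.category_weights.get(category, {}))
        total = sum(merged.values())
        return {c: w / total for c, w in merged.items()}

    def score(self, category, ratings):
        """Weighted average of the criteria that were actually rated.

        Missing criteria (absent or None) are skipped, so a partially
        rated product still gets a score. Returns None if nothing is rated.
        """
        weights = self.weights_for(category)
        rated = {c: r for c, r in ratings.items() if c in weights and r is not None}
        norm = sum(weights[c] for c in rated)
        if not rated or norm == 0:
            return None
        return sum(weights[c] * r for c, r in rated.items()) / norm


# A customer who cares three times more about coolness when buying phones.
profile = CriteriaProfile(category_weights={"phones": {"coolness": 3.0}})
print(profile.score("phones", {"coolness": 5, "price": 3}))  # 4.5
```

Keeping the overrides separate from the defaults means a category the customer has never interacted with still yields usable weights.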


1.3 Basics: Ratings & Reviews

Ratings and reviews are considered standard these days. Ratings use the classic star system, which usually takes values from 1 to 5 stars, with no stars meaning no rating. The rating titles the customer sees don’t have to be the names of the criteria we described previously and may vary from product to product.

Reviews can be scanned using automatic text analysis methods to extract useful information. We may be able to omit some criteria from ratings (six rating areas could be too many for a customer) and try to extract them from comments.

It is important to encourage ratings and reviews. The best-known example of user feedback is Amazon.

1.3.1 Incentives

Giving people incentives to interact with your site and give you information is crucial.

You can appeal to the many individuals who value status by implementing a user rating system, where people with many useful reviews earn titles like “expert user”.

Another incentive would be offers, like giving 1% off the next purchase to any customer who leaves 10 ratings.

It is important though to distinguish between feedback for an item the customer has bought and an item he hasn’t actually bought from our store.

Other incentives may include frequent contests. For example, a contest could be “help us find the best mobile phone and it may become yours”. We would then create a short questionnaire with questions that can help our marketing decisions. If you think about it, the cost of such a feat is small compared to other known methods of achieving the same results. You have to remember though that the questionnaires should really be kept short. The challenge is that the customer should complete the contest before it becomes tiresome for him.

1.3.2 Social

The hype these days is social. Everything has to be social, users expect everything to be social. The reviews and user ratings system we proposed earlier are examples of social features. But one can take it even further.

A very simple way would be to incorporate social buttons such as the renowned Like button of Facebook, the +1 of Google and the tweet button of Twitter. Every time a user shares an item, we can give it a boost in the weak order of its category. If the user comments with the share, we can perform automatic text analysis on the comment.

Also a social system implemented into the store could be of high value. For example, allow user profiles, simple walls, public favorite products and wish lists.

1.3.3 Gamify the Store

Another hype of modern web design is gamification: the process of applying game techniques to your product (the store in our case) to encourage users to engage even more.

A basic example would be a personalized question. Let’s assume you need a bit more data to calculate a weak order. You can insert a question, created automatically and personalized for the user, on a sidebar. As the user browses through your store, this simple question, e.g. “do you prefer product A or product B”, appears in an unobtrusive way. Most people would give in and click the answer. If another question popped up, they might continue to play for a bit more before returning to their shopping.

Game design is a field of its own, but it is an interesting topic with many practical applications here.

2 Pre-Processing the Data

This is the most challenging part of the process. Although the implementation is left to the engineering team, we may give some guidelines to assist a successful result.

Because of the diversity of possible recommendation scenarios, it is fundamental that the processing part exposes an application programming interface and/or query interface that makes it easy for the software engineers and administrators of an e-commerce business to connect it to their platform.

Figure 1 displays the structure we propose for the recommender platform and connects the previous section (input) with the current one.

2.1 Content-based Filtering

In this type of filtering, we try to make suggestions based on the characteristics of a product and/or the ratings of the individual we make recommendations for. This approach at its simplest form uses weights to find recommendation candidates whereas more advanced scenarios include machine learning techniques. It is a complex process as we have to teach the machine to search for relations across various characteristics.

2.1.1 Using Characteristics

This is the easier form of this type of filtering, yet it still needs a fair amount of work. Assuming standardized
fields for product characteristics, we can find and suggest products of the same category with similar characteristics. An example would be a visitor looking at a hard disk with 2TB of space, a SATA connection and a cost of around 100€. The system would proceed to find disks with similar attributes and suggest them.

Figure 1: recommender platform and input

A problem with the aforementioned technique is that a product has many more characteristics than the ones that are important for a customer. For example, hard disks have attributes like rotational speed and access time. There are two solutions to this problem.

The simplest one would be to hard-code the important characteristics of a category. A human operator will set the characteristics a customer is more likely to find important. A more difficult approach would be to have the system learn through visitors’ page views and orders which characteristics are of importance for a category.

This can be on a global level, so we find the characteristics that are most likely to be of value for any customer. We would expect a fairly good experience for most customers with this approach.

Another way would be to do it on a personal and temporal level. We make the first suggestion to the customer using global data and, based on his choice, we adapt our weights to make suggestions based on similar characteristics of the last viewed items. This technique would be better for people who like to do more research before they buy, so there could be a trigger for such a technique based on the customer’s page views profile.

2.1.2 Ratings and Multi-Criteria Analysis

Ratings are the center of content based filtering. They can be used to suggest goods that a customer isn’t searching for.

Multi-Criteria Analysis can be used as a processing method for this part of the Content-based Filtering. We have defined six criteria that will be used to identify products and explained that per category vectors will adjust the weight and thresholds of each criterion for a certain product.

The problem is that, because of the nature of the data collection process, we won’t have all criteria for every product from each customer. We only get bits of information from our customers. Our data looks like a puzzle that statistics will help us solve.

This isn’t multi-criteria decision analysis (MCDA) as used in marketing strategies; it is MCDA per customer and has much greater error tolerance.

Let’s assume our store contains n products that create the product space N. We have six criteria: a1, a2, a3, a4, a5, a6. For each criterion we have much less data than n. For example, for criterion a1 we may have k1 values, for criterion a2, k2 values and so on. We can calculate the criteria’s scores adjusting for missing values.

Respectively, a customer may have rated a few items but omitted some ratings for each. We will calculate his rating profile from these values.

We propose that this process include a secondary weight vector for different product categories. To calculate this we will need items that the customer has both bought and rated.

MCDA also requires a weak preference order. This, as described in the previous section, can be acquired by asking the user questions using various games. For example, if a user has rated two items, a mini-poll between these two could appear on a sidebar.

The suggestions of this module of our platform will mainly be for items that match a user’s rating profile. Also, the pre-processing that happens here is shared between the two filtering modules, as we will see in the next paragraphs.

2.2 Collaborative Filtering

This type of filtering uses customer data to make recommendations. It tries to find persons with tastes similar to the current visitor’s and then proceeds to suggest items that they (people with similar taste) like.

This is easier than content based filtering in the sense that the system doesn’t need to understand the underlying data, like product characteristics, but only to find persons with similar taste. It also works well for cross-category suggestions. On the other hand, it requires better hardware as it operates on big datasets, it needs a large amount of data before it can deliver good suggestions and it requires customers with a decent history of purchases and/or ratings.

Generally we can divide the process into two distinct phases. First we have to find a group of users with similar taste. This may be accomplished through clustering methods or nearest neighbor search. Then we
have to make suggestions based on the choices of the group. For a traditional approach (known as memory based) we could use the rating data of each user in the group, applying weights for user taste compatibility. A more complex approach (model based) would require data mining and machine learning techniques to find patterns in the underlying data.

2.2.1 Multi-Criteria Analysis

Multi-Criteria Decision Analysis was described before, but in Collaborative Filtering it can give even better results. Besides, in marketing, MCDA is used not only to profile products but also to find market segments.

MCDA actually follows the work-flow we described for generic Collaborative Filtering. It has four phases:

• Data acquisition, which we described in section 1.
• User modeling, which we described in Content Based Filtering.
• Clustering, the 1st part of Collaborative Filtering.
• Recommendation Phase, the 2nd part of Collaborative Filtering.

2.3 Building an API - Query Interface

A good recommendation platform is tailored to each product or, one level up, to each product category of a certain store. Thus the people who build and maintain the electronic commerce store should adjust the recommendation platform to their needs. Hence the need for an application programming interface that includes a query interface and is easy to build a GUI around.

The API should offer a handy way to feed data to the system, perform queries and accept results from it. It should be well defined and easily accessible from current web technologies. It is an important design task for the engineering team.

The Query Interface is a means of abstracting the complexity of the platform from the people who run the e-commerce store. An example of a query interface would be the ability to instruct the platform, through a simple query, to return 5 suggestions, of which 3 will come from the content based module and 2 from the collaborative module. So in essence it is a standardized input of short queries that translate to a subset (albeit the most important ones) of the total queries our platform can do. The Query Interface is part of the API.

If the recommender system is built in-house, then the API can be omitted, but it is advisable not to, since the implementation team will probably differ from the team maintaining the store. It can be of a less advanced form though.

2.3.1 No-Knobs Approach

A no-knobs approach to the Query Interface could be further studied. In this approach the platform acts in an automated fashion, auto-tuning its results. We would advise against it as a primary goal. The platform should be implemented with manual tuning. Then, as it gathers more data, one could try to run a no-knobs implementation of the Query Interface in parallel and compare the results.

3 Recommendations System

Recommendation (or recommender) systems have become increasingly popular over the last 4 years (2008 to 2012). This is mainly due to the huge penetration of mobile devices (smart phones, tablets) and internet services (social media) into the consumer market. Some of the best-known recommendations until now are the product suggestions of amazon.com, the song suggestions of Bang & Olufsen, iTunes and Pandora Radio, the movie and video suggestions of Netflix and YouTube, the search suggestions of Google Trends, the photo suggestions of Pinterest and the network connection suggestions of Facebook and LinkedIn.

As expected, recommendation systems are slowly gaining ground in all current online activities that can directly or indirectly pose a market share opportunity.

Numerous algorithms have been implemented in the design of recommendation systems. One of the most commonly used algorithms in recommender systems is the k-nearest neighbors (k-NN) approach. k-NN is a method for classifying objects based on the properties of their closest neighbors in the feature space. In k-NN, an object is classified through a majority vote of its neighbors, with the object being assigned to the class most common amongst its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of its nearest neighbor.

Another popular technique is the Pearson Correlation, which can prove especially useful, as we will see in section 4 of this paper. Pearson Correlation is a measure of the (linear) dependence between two variables X and Y, giving a value in the range from −1 to +1 inclusive. In a social network, the neighborhood of a particular user with similar taste or interest can be found by calculating this coefficient. The user’s preference can be predicted by collecting the preference data of the top-N nearest neighbors of that user (weighted by similarity).

One last well-known method, which can also prove useful in section 4, is the Rocchio Classification. This is a method of relevance feedback dating back to the 1970s. Rocchio makes use of the Vector Space Model
and is based on the assumption that most users have a general conception of which items should be denoted as relevant or non-relevant. User feedback is used to refine a search query by emphasizing or deemphasizing certain terms (similar to how Pandora refines its user recommendations). Through feedback, the user’s search query is revised to include an arbitrary percentage of relevant and non-relevant terms as a means of increasing the search engine’s recall, and possibly the precision as well. The number of relevant and non-relevant terms allowed to enter a query is dictated by a series of weights in the central equation.

3.1 Hybrid Filtering

Hybrid filtering combines collaborative and content-based filtering, in cases where the combination proves to be more effective. A quick reminder: content based filtering produces suggestions from information that logs the customer’s profile and taste, while collaborative filtering produces them by matching the habits of users that seem to belong to the same group.

Hybrid approaches can be implemented in several ways: by making content-based and collaborative-based predictions separately and then combining them; by adding content-based capabilities to a collaborative-based approach (and vice versa); or by unifying the approaches into one model. Several studies empirically compare the performance of the hybrid with the pure collaborative and content-based methods and demonstrate that the hybrid methods can provide more accurate recommendations than pure approaches. These methods can also be used to overcome some of the common problems in recommender systems such as cold start (not enough gathered information to generalize) and the sparsity problem (having too few ratings and hence too few correlations between users).

3.2 Recommendations spatial placement

3.2.1 Spatial as in ‘scale’

The key assumption in the traditional marketing literature is that the consuming behavior of an individual is conditionally independent of the consuming behavior of another individual. This means that the decision making process of a sole customer is considered to be unaffected by the decision making process of another customer, let alone a group of customers.

We know that this simplification can be very useful in some special cases, but it requires extra caution when applied in general. So, when researchers study large groups of people they tend to adopt theories that rest on the concept of ‘sole identity’, assigning a personal human-like behavior to a crowd.

The key question is what happens to the relationships between smaller groups of people, in the order of a few hundred. Do these relationships exist and to what extent do these people influence each other? Eric T. Bradlow et al. (2005) showed that this mapping can be achieved and can also produce some very satisfying results. Around the same period many studies were based on the same concept. They verified that spatial models can be successfully applied and can also generate some very satisfying estimations about the consuming actions of such groups.

3.2.2 Spatial as in ‘type’

Similar studies carry on until today, progressing with some really intriguing results about the application of spatial models in social media. Just recently online services like Myspace, Facebook, Twitter, LinkedIn and Google+ came into play and advertisers advanced by making the best of this knowledge. Nowadays interactive banners, tag clouds and infographics produce beautiful aesthetics which lure the individual ‘to blend in’, ‘to belong’ and ‘to follow’ the trends that his friends, colleagues and associates are up to.

In advertising, spatial and temporal placement can actually become very complex. The research field of each could form a completely separate sector in the science of marketing. This is because there are a lot of parameters to be considered, for example the spatial mapping of each individual banner (or recommendation) according to geographic, demographic and psychometric criteria. Another concern is the analysis of the spatial drift phenomenon, which has to do with the exponential decrement of an action’s effect (i.e. the generation of a specific recommendation) around a ‘neighbourhood’, whether this is considered to be in space or time.

3.2.3 Spatial as in ‘surface’

As far as the spatial placement within the web-store’s layout is concerned, the corresponding ‘social media plug-ins’ are most commonly placed around the page’s edges. This is due to the fact that they can easily fill in the space between two seemingly irrelevant advertisement banners and attract the attention of the reader to the surrounding area. Recommendation systems usually present information that is automatically refreshed but always semantically attached to the consuming preferences of the customer’s network connections. This way they attempt to maximize the chances of him indulging in or interacting with the advertisement. The same plug-ins are usually implemented in a way that lets them learn and improve their suggestions according to the customer’s behavior and use this information as feedback to his connections too.


3.3 Recommendations temporal placement

Whatever we do, however we do it, regardless of how important it may seem at the moment, it will actually be quite useless or insignificant if attempted at the wrong time. So, timing is of the essence and this applies to every action we take in life. Likewise, every temporal placement of a campaign, any recommendation technique, if properly designed, can produce great results, but in any other case may have no effect or, even worse, the opposite of the one that we desire and expect.

Time, unlike space, is a dimension that imposes the same limits on every marketer. Expertise is always considered an asset, but in the field of time, all ‘players’ share about the same chances of success. No-one can establish an advantage in the field of time regardless of how big or small a brand name he claims to own. And the same goes for recommendation systems too. Conquering the field of time demands good strategy skills and an intuitive ability to foresee market trends.

Researchers have proposed a lot of versatile ideas about the formulation of relevant mathematical models which predict such fashions and to some extent they generate some good results. But machine learning techniques, artificial intelligence, fuzzy logic and neural networks, even when combined all together or used extensively in simple cases, cannot (until now, that is) reproduce the results that a talented visionary human may have, just because successfully forming the future will always be more rewarding than managing to predict it.

3.3.1 Placing Strategy

By taking into consideration all of the above we conclude that an automated recommendation system must:

• focus on small time frames and make quick but small steps rather than slow and big ones in order to keep all options open until the very last minute of commission (pivoting),
• exhibit a preference for aborting if the recommendation has a high risk of being unwelcome at the designated time of prompting (because you can never be too late as trends cycle over time, but you can be too quick!) and
• try to lead the market by shaping the consumers’ opinion (branding) rather than passively awaiting feedback about what they are about to wish for or like. A recommendation system can achieve that by enriching the conventional content with injected information (impression) that will be intentionally kept out of focus.

So, the subject will hopefully be the recipient of two messages: a straightforward one, which will literally encourage him to try a special product or a service, and an indirect one, which will attempt to persuade him about the company’s current concern or its vision about the future.

Nowadays similar techniques are being used by many firms and provide results which solely depend on the fashion making abilities of their respective departments. Big brand-names which hold a solid corporate identity often produce content like this to promote their innovative services and products and at the same time to keep a competitor from impinging on the same ones. Smaller brands do the same in order to assiduously make their way into a niche market share.

3.3.2 Measure Validation

Marketing measures must efficiently estimate the performance of a particular strategy. The indexes used in such cases may or may not have some physical meaning; it doesn’t really matter as long as marketers are able to understand how they work and why. Two common scientific criteria for the meaning of measures are reliability and validity.

Reliability can be verified by testing the correlation between the measure taken at different times (retest stability), with equivalent forms or with split halves (internal consistency).

Validity, on the other hand, takes many forms. There is face or consensus validity, which exists when a measure looks as if it should indicate a particular variable or concept. Using this form is not safe because studies have shown that scores on recognition measures can be influenced by irrelevant response sets.

Another, more objective form of validity is predictive or concurrent validity. Predictive validation procedures consist of determining the extent to which particular measures predict other ‘criterion’ measures, so it has much pragmatic meaning in marketing. This measure validation exhibits some pros and cons.

When the predictions based on the latter form are too fuzzy, the certainty of the measuring results can be enhanced by using measure validation, which consists of two parts: convergent and discriminant validation. The first is synonymous with predictive or concurrent validation, which means that a measure can adequately represent a variable if it correlates or ‘converges’ with other supposed measures of that variable. Discriminant validation is absolutely necessary to really pin down the meaning of measures. This is because a measure may converge with measures of other variables in addition to the one of interest.

Finally, construct validation can only be considered after measure validation is established. This validation actually provides a proof of concept as it checks whether a hypothetical construct, composed of several similar variables, actually operates in the scientifically expected way.
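
The retest stability check described above comes down to correlating two runs of the same measure. A minimal sketch follows, assuming the Pearson correlation from section 3 is used as the correlation and that the measure is sampled for the same items in both periods. The function name `pearson` and the sample figures are hypothetical and for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 values each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: the same measure (e.g. the share of recommendations
# that were clicked) for four product categories in two different periods.
first_period = [0.10, 0.20, 0.30, 0.40]
second_period = [0.12, 0.19, 0.33, 0.41]

retest_stability = pearson(first_period, second_period)
print(f"retest stability: {retest_stability:.3f}")
```

A value close to +1 suggests the measure is stable between the two periods. The same function could be applied to split halves or, for convergent and discriminant validation, to pairs of different measures.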


3.4 Recommendations outside the store

Internet and mobile platforms offer many ways to reach your customers outside of your e-commerce site. Let’s explore two of the most popular.

3.4.1 E-mail

E-mail is a dangerous field. It is absolutely a love or hate factor in your customer relations. Do it wrong and you will damage your image, do it right and you will see frequent visits to your business. A middle ground doesn’t exist.

We suggest that e-mail should be used for sending recommendations to the customers, but with the utmost care. Recommendations sent through e-mail should contain a mix of suggestions relevant to items the customer saw during his last visits and suggestions that suit his taste.

The frequency of e-mails should be related to the frequency of visits the customer pays to the shop. A less interested customer will probably be offended by frequent e-mail. Expected etiquette for communication between a business and an individual suggests that each contact should be justified. Contact the customer only if the business has offers on items he may be interested in, or if there are new arrivals of items he may be highly interested in.

3.5 Mobile - Suggestions on the Road

Every day smart-phones and other web-enabled devices make their way into customers’ pockets by the millions. Eric Schmidt of Google announced on September 5th, 2012, that Android devices alone have over 1.3 million activations per day!

A significant part of e-commerce will be transferred to mobile in the coming years and a whole new set of possibilities will open. There are already applications that take advantage of mobile, though they are neither widely deployed nor widely accepted yet.

E-commerce businesses may use mobile through mobile web sites and applications. But this use doesn’t differ much from the typical web use, so everything we said until now applies.

The game-changing feature of smart-phones is that people are constantly connected, have their device always at hand and are frequently tracked. For example, a customer’s location can cue us as to when to send recommendations to his device. If we know his location on block level, we can detect when he is out shopping and in some cases what he is shopping for. If we can pinpoint his location to a few meters, we may even know which shop he is in. Such techniques may bring criticism and you should always ask for a customer’s approval before using them.

But mobile apps can also be used to gather feedback from customers. Smart-phone games are popular these days; why not build one that also does a bit of market research?

As good as all these seem though, the mobile sector is managed mainly by advertisers, so you may have to work with them.

4 Metrics; Quantify the Success of the System

Data mining techniques can prove to be of great value if they are used properly in systems like our proposed marketing platform. They have the power to reveal well-hidden social habits, market trends, customer profile characteristics and other natural patterns which in the general case tend to be ignored or undervalued. They can also make use of seemingly arbitrary input data and produce some very interesting estimations in their output.

But the output itself needs further study by an expert, usually a human, in order to be interpreted and accompanied by logical reasoning. Also, even if the output is processed automatically, the expert must provide a precise definition of the application domain in order to lessen the fuzziness that usually surrounds the theoretical models of a respective AI expert-decision system.

So, the intervention of a human expert in the process, although undesirable, is usually encouraged because the solution of an automated artificial intelligence machine cannot always guarantee a satisfying result. This is due to the fact that nature’s patterns, let alone laws, are formed in a non-deterministic and non-linear way and the solutions produced are only a fair estimation of the truth.

Scientists know that the complexity of such problems can be reduced by linearization techniques. In our proposed platform we do the same by modeling a good linear substitute, which, in other words, means that we build a system which uses its linear inputs to ‘sense’ some quantities (here the market’s trends) and learns how to adapt its outputs accordingly (here the customers’ recommendations) to maximize the profit.

To accurately identify the dominant factors that rule the market’s trends and quantify their importance in each scenario, we use special indexes called metrics or KPI (key performance indicators). The enumeration of the existing KPI is a difficult job since it is almost impossible to gather information on already working systems, but most companies tend to use some of the metrics already made known by the international bibliography. Companies tend to gather a lot of data (e.g. by market research using questionnaires) or use many
different KPI in order to successfully detect the pulse of the market at any desired time (e.g. in order to apply a promising marketing campaign, change their brand orientation, use pivoting, etc).

But neither the gathering of a high volume of data nor the application of a well-guessed selection of the corresponding KPI can guarantee the success of the applied decision. A high volume of data is resource consuming and impractically optimistic, while the random selection of KPI itself defies the very nature of non-determinism in non-linear and complex problems. This is because an observer can only look for logical and simple explanations of why things work in a specific way, lacking the ability, or even worse neglecting the effort, to realize and/or understand the existing correlations between all the factors that balance a phenomenon to a given state.

Also, statistics in general are doomed to become useless if the policies around data aggregation and processing cannot generate coherent results that will lead to a straightforward and cost efficient strategy. Unfortunately, the techniques usually adopted by medium-scale companies are efficient, but there is no safe way to ensure that they are also optimal, so they can sometimes exhibit significant shortcomings.

If we may address some of the main reasons for this, we believe that it is due to the fact that a suitable KPI is applied in the wrong context or to the wrong extent, and because the system is forced to process and interpret high-volume data which bring back fuzzy and thus untrustworthy results.

To counter that, our proposed scheme tries to check which metrics (KPI) are highly correlated with the desired outcome (overall profit) and assigns special ‘weighting factors’ to each one of them according to their estimated correlation. The idea is to use the collective information of many different metrics in a versatile way to make them form the optimized ‘success quantifiers’ that best suit the e-commerce business in question. The very first time the ‘weighting factors’ get random values, but after successive passes the system will tend to settle to a specific state.

We achieve that by introducing a dedicated feedback system (an example would be a system based on genetic algorithms). This measures the system’s efficiency and adjusts the relative importance of each metric accordingly. Each weighting factor represents the metric’s significance towards the desirable effect, which is to increase the profit without compromising (not a lot at least) the weighting effect of the rest of the metrics.

If the metric in question proves to be important then it makes sense to focus more effort on its fine-tuning, so it is highly rated. In any other case the metric’s results must be considered stochastic and thus its influence must be regarded as untrustworthy, so in every pass the system assigns a small value to it, which eventually becomes small enough to be ignored.

This feedback strategy ensures that the system will eventually and concurrently blend in with the market needs by inheriting its trends. This method is similar to an automated multi-criteria analysis and promises to produce fast and traceable results with increasing marketing efficiency.

In the subsections to follow we present some of the most promising KPI which are frequently used by companies and organizations.

4.1 OCR - Order Conversion Rate

Order conversion rate is one of the most important KPI. Generally a conversion rate refers to the percentage of events leading to another event. In e-commerce it usually refers to the percentage of visits that convert into orders. Most web analysts define the conversion rate as the percentage of site visitors who do something that the company wants them to do, like submit an order, sign up for an e-mail, send a share link and so on.

The order conversion rate is an important metric, but it just shows that the minority did something and not why the majority did not do anything. This is because buying something is not always the main aim of visitors who go to a homepage. Customers also intend to do research, look for information, read a blog or just get advice. So OCR and similar conversion rates (CR) are useful, but only if combined with other KPI.

4.2 RpV - Revenue per Visit

Average revenue per visit is a well-known marketing acquisition indicator. It is defined simply as the sum of revenue generated / number of visitors. This metric rises when more valuable customers are attracted to the e-commerce store.

4.3 Page Visits

Page views is a metric mainly used to quantify the effect that a marketing campaign has on a specific target group, and it can produce some nice conclusions, especially when studying the temporal and spatial allocation of the channels that the customer traffic passed through until it arrived at a specific page of the web-store. Page visits are a very important index for the system we propose. They can show the extent to which our visitors are motivated to visit the web-store by some social media campaign, quiz, etc. and, when combined with the RpV, the clout that each social medium has on our visitors over time.
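
The weighting scheme outlined earlier in this section, applied to the three metrics of sections 4.1 to 4.3, can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's implementation: it takes the absolute Pearson correlation with profit as the ‘estimated correlation’, normalizes the weights so they sum to one, and leaves out the feedback loop (for instance a genetic algorithm) that would re-tune the weights on every pass. The names `Period`, `kpis` and `correlation_weights`, the use of visits as the denominator, and all figures are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Period:
    """Raw counts for one reporting period (hypothetical schema)."""
    visits: int
    orders: int
    revenue: float
    page_views: int
    profit: float


def kpis(p: Period) -> dict:
    """OCR and RpV following sections 4.1 and 4.2, plus raw page views."""
    return {
        "OCR": p.orders / p.visits,    # share of visits that convert into orders
        "RpV": p.revenue / p.visits,   # revenue generated per visit (assumed denominator)
        "PageViews": float(p.page_views),
    }


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def correlation_weights(periods):
    """Weight each KPI by |correlation with profit|, normalized to sum to 1."""
    profit = [p.profit for p in periods]
    rows = [kpis(p) for p in periods]
    raw = {name: abs(pearson([r[name] for r in rows], profit))
           for name in rows[0]}
    total = sum(raw.values())
    return {name: (value / total if total else 0.0)
            for name, value in raw.items()}


# Made-up history: profit follows OCR and RpV, page views stay flat.
history = [
    Period(visits=1000, orders=20, revenue=1000.0, page_views=5000, profit=200.0),
    Period(visits=1000, orders=30, revenue=1500.0, page_views=5000, profit=300.0),
    Period(visits=1000, orders=40, revenue=2000.0, page_views=5000, profit=400.0),
]
weights = correlation_weights(history)
```

In this toy history the flat page-view series receives a weight of zero, which mirrors the idea that a metric unrelated to profit is eventually ignored, while OCR and RpV share the remaining weight equally.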

4.4 AOV - Average Order Value

Average order value is considered a key performance indicator when combined with revenue per visit and order conversion rate. The basic calculation is Average Order Value = Sum of Revenue Generated / Number of Orders Taken.

Usually experts work diligently on both OCR and AOV in order to improve both at the same time by segmenting visitors and marketing campaigns into high, medium and low AOV groups. This can help identify the special marketing approach that the company should apply to each group in order to get the best collective outcome.

Managing the balance between OCR and AOV can be tricky. A small increase in one could mean a drop in the other, and if this drop is of great magnitude compared to the rise then the revenue per visitor may be strongly impacted.

4.5 Customer return

This is a very important metric because it signifies the amount of dedication that the target group shows to the business. This group is the one that consciously supports the company in both direct (buys) and indirect (personal recommendations to friends) ways.

Again, this KPI can provide valuable feedback to the expert decision system, especially when combined with others.

4.6 Time spent on store

Time spent on a store can have a lot of different meanings. More time could mean more chances for a customer to indulge himself in an order, or it could mean a bad design that makes it hard for the customer to find what he needs. In the second case it is almost guaranteed that the ‘customer return’ KPI will be affected and become smaller. So this is where data-mining techniques come into play and prove to be very handy. If the volume of the collected data is sufficient (and not extremely large) then this metric can reveal very useful information about the quality of the site’s design and/or the applied marketing technique at hand.

4.7 Cart abandonment

If used properly, the metric of ‘cart abandonment’ can be a very helpful asset in determining what products and categories are driving the most abandonment and under which circumstances. But it should be used with extra caution because it can easily become misleading. The problem does not always lie in the shipping costs or the billing method.

For example, one could use the cart just to get a picture of the ‘total cost’ of his wish list, although he never meant to buy the products at that moment. Also, a customer may need to use the cart just to collect some of his favorite items in one place, due to the lack of a ‘favorite’ or ‘share’ button. He may also do the same thing if he wishes to keep the links of two or more similar products he wants to compare but the site lacks such a service. A low percentage of visitors who purchase an item after adding it to the cart could also mean that your shipping rates are high or that your checkout template is confusing.

So, the ‘cart abandonment’ results must always be analyzed according to criteria that exclude such scenarios or take them into consideration through proper normalizing factors. One of the aims of the proposed platform is to achieve this normalization efficiently and automatically.

4.8 Up-selling rate

The up-selling rate indicates the percentage of successful ‘hits’ on a recommended order of a product similar to the one that the customer just ordered or intended to order. Up-selling is considered to be successful if it leads to a more profitable sale without jeopardizing the good relationship that the organization has built with the customer. So, to make the customer feel comfortable, the offer must be very well targeted, providing a recommendation that enhances the value he gains from the bargain. An example would be to propose the purchase of one more product, such as a warranty extension on the hard disk drive that the customer is willing to buy.

The similarity between the two products (i.e. warranty extension and hard disk drive) will be modeled by a real number called ‘relation strength’. The ‘relation strength’ can be defined by an expert at first by setting the appropriate values in a product-to-product matrix. When the system becomes operational, its algorithms fine-tune this matrix by reassigning new values to it, as indicated by the real market’s feedback. As the careful reader may notice, this idea is based on a principle similar to the one we have already described for the correlation weights between a KPI and the overall profit.

4.9 Cross-selling rate

The cross-selling rate is very similar to the up-selling one, with only one small distinctive difference. It indicates the percentage of successful ‘hits’ on a recommended order of a product seemingly irrelevant to the one that the customer just ordered or is about to order. A valid example could be the proposal of a car charger for a mobile phone to someone who bought a memory card for his laptop. It seems irrelevant but the system can guess the customer’s financial status as he
is aged enough, he just bought a laptop, he is male and he has declared a mobile number so he probably owns a car too. The system could also know nothing about this customer but could extrapolate information from the profile patterns of other customers that fit in the same target group.

As far as the processing of this profiling information is concerned, the only thing needed is a medium-scale database (i.e. over 1000 records) and the algorithmic strategy already stated in the previous section.

4.10 RoI - Return of Investment

All the metrics mentioned above could be considered as variables (inputs) which are set according to the results generated by RoI. So the RoI value can be considered as a proposed output and its value a relevant estimator of the system’s efficiency.

In general, return of investment is the amount of profit a certain policy can produce in relation to the cost of this policy’s adoption. In the proposed multi-criteria analysis this metric can reflect the rate of change of the company’s total profit in relation to the absolute value of the change of a respective KPI. So $RoI_{OCR,t_1}$ will be the value that RoI acquires at the moment $t_1$ when OCR, and only OCR, changes. In the special case where $RoI_{OCR,t_1}$ appears to be monotonic in the range we are interested in, we can safely make future predictions according to the graph’s envelope or the projections of the $RoI_{OCR,t_1}$ trend-line.

5 Implementation

Our proposed recommender platform is a challenge to implement. It has high complexity and a big volume of necessary features.

A team of software engineers with a strong background in math would be able to bring it to life. Alternatively, a team of software engineers with structured thinking and a mathematician with a background in statistics and algebra would also be able to pull it off.

The system should first be carefully designed and then implemented. Because of its complexity and many possible points of failure, it would be best to partition the platform into many small modules with distinct purposes and write test cases for them.

The technologies that are going to be used should be chosen by the design team. The business model that will be used may be of importance in this decision.

Ourselves, we prefer the SaaS approach (explained later) and we would proceed with a highly scalable implementation on the cloud. The CPU-bound parts of the design we would implement using high performance DBMS and/or in-house code in lower abstraction computer languages that tend to perform better. To glue all the parts together, we would use higher abstraction, easier to handle languages. We would make our APIs compatible with industry standard technologies, easy to use and well documented, to help clients connect their e-commerce businesses to our platform.

5.1 Making a profit

Implementing the platform won’t come cheap. The primary cost will be the salaries of the engineers needed.

5.1.1 Software Product

The common approach is to release the platform as a software product which businesses can buy and then achieve higher revenue through technical support and/or software upgrades. For such an approach to work, you have to make your software as compatible and as easy to set up and use as possible. This leads to much higher costs and may be a limiting factor on the number of features you can implement. Also, in this approach you should generally try to release a product as perfect as possible, since fixing mistakes on the customers’ side isn’t viable. This will make your time schedules longer.

5.1.2 In-House

For a large e-commerce business, it would make sense to hire a team to implement the platform in-house. This approach has the benefits of a tailored solution. It can be tweaked and adjusted to the company’s needs.

But for it to make sense, it would have to give the sales a boost much bigger than a ready-made solution. We would say that the expected increase in profits, on top of the increase a ready-made solution would bring, should be such that the implementation pays for itself in two to three years.

5.1.3 Software as a Service (SaaS)

There is much hype around this concept these days. A company would implement and deploy the platform on its own servers and then customers would pay for access to it (usually in monthly or annual plans). From a financial standpoint this approach has the advantage of a steady revenue. Also, because of the savings on the customers’ side due to lower maintenance costs, they will accept paying more for our product.

On the implementation front, this approach increases the running costs due to the need for servers and an administration team. It also increases your responsibility, since part of your customers’ business runs on your servers. On the other hand, the ability to perform maintenance and upgrades at any time to our software,
facilitates the software engineering part and lets us do incremental enhancements, test features and do rollbacks when needed, thus allowing less demanding time schedules and resulting in a more robust product from the client’s perspective.

5.1.4 Open Source

Many software products choose the path of open source, commonly known as community editions of the product. This doesn’t stop a company from selling, renting (SaaS) or charging technical support for its platform.

The main advantage is that programmers from all over the world will help you build, test and maintain your product. Most often you will find that these people are your customers who decide to implement a certain functionality themselves.

Also, administration teams (which will connect their company’s e-shop with your platform) tend to feel safer with open source approaches, as they know they can tailor a solution to their needs if needed and that your product’s viability doesn’t depend on your business’s viability.

References

[1] Nikolaos F. Matsatsinis and Yannis Siskos. MARKEX: An intelligent decision support system for product development decisions. European Journal of Operational Research Vol. 113, No. 2 (1999), pp. 336-354. Article Stable URL: http://dx.doi.org/10.1016/S0377-2217(98)00220-3

[2] Lakiotaki, Kleanthi and Delias, Pavlos and Sakkalis, Vangelis and Matsatsinis, Nikolaos. User profiling based on multi-criteria analysis: the role of utility functions. Technical University of Crete, Decision Support Systems Laboratory, University Campus, 73100 Chania, Greece. Operational Research, Springer Berlin & Heidelberg, Vol. 9, No. 1 (2009), pp. 3-16. Article Stable URL: http://dx.doi.org/10.1007/s12351-008-0024-4

[3] Lakiotaki, K. and Matsatsinis, N.F. and Tsoukiàs, A. Multicriteria User Modeling in Recommender Systems. Intelligent Systems, IEEE Vol. 26, No. 2 (Apr., 2011), pp. 64-76. Article Stable URL: http://doi.acm.org/10.1109/MIS.2011.33

[4] C. Moghrabi and M.S. Eid. Modeling users through an expert system and a neural network. Computers & Industrial Engineering, Selected Papers from the 22nd ICC and IE Conference, Vol. 35, No. 3-4 (1998), pp. 583-586. Article Stable URL: http://doi.acm.org/10.1016/S0360-8352(98)00164-8

[5] Liu, Liwei and Mehandjiev, Nikolay and Xu, Dong-Ling. Multi-criteria service recommendation based on user criteria preferences. Proceedings of the fifth ACM conference on Recommender systems - RecSys (2011), pp. 77-84. Article Stable URL: http://doi.acm.org/10.1145/2043932.2043950

[6] Mehdi Dastani and Nico Jacobs and Catholijn M. Jonker and Jan Treur. Modelling User Preferences and Mediating Agents in Electronic Commerce. 1999.

[7] Beliakov, Gleb and Calvo, Tomasa and James, Simon. Aggregation of Preferences in Recommender Systems. School of Information Technology, Deakin University, 221 Burwood Hwy, Burwood, 3125 Australia. Springer US (2011), pp. 705-734. Article Stable URL: http://dx.doi.org/10.1007/978-0-387-85820-3_22

[8] Fabian Abel, Silvia M. Baldiris, Nicola Henze. User Modeling, Adaptation and Personalization. Adjunct Proceedings of the 19th International Conference on UMAP, Poster and Demo Track (Jul., 2011)

[9] Roger M. Heeler and Michael L. Ray. Measure Validation in Marketing. Journal of Marketing Research Vol. 9, No. 4 (Nov., 1972), pp. 361-370. Article Stable URL: http://www.jstor.org/stable/3149297

Contents
Introduction

1 Data Sources - Signals
1.1 Standard Data Sources
1.1.1 Web Server, Cookies & Database Logs
1.1.2 Customer’s Profile - Personalization
1.1.3 Products Speaking for Themselves
1.2 User Feedback; Earn it, Shape it for Multi-Criteria Analysis
1.2.1 Generic Criteria - Appealing to Customers’ Values’ System
1.3 Basics: Ratings & Reviews
1.3.1 Incentives
1.3.2 Social
1.3.3 Gamify the Store

2 Pre-Processing the Data
2.1 Content-based Filtering
2.1.1 Using Characteristics
2.1.2 Ratings and Multi-Criteria Analysis
2.2 Collaborative Filtering
2.2.1 Multi-Criteria Analysis
2.3 Building an API - Query Interface
2.3.1 No-Knobs Approach

3 Recommendations System
3.1 Hybrid Filtering
3.2 Recommendations spatial placement
3.2.1 Spatial as in ‘scale’
3.2.2 Spatial as in ‘type’
3.2.3 Spatial as in ‘surface’
3.3 Recommendations temporal placement
3.3.1 Placing Strategy
3.3.2 Measure Validation
3.4 Recommendations outside the store
3.4.1 E-mail
3.5 Mobile - Suggestions on the Road

4 Metrics; Quantify the Success of the System
4.1 OCR - Order Conversion Rate
4.2 RpV - Revenue per Visit
4.3 Page Visits
4.4 AOV - Average Order Value
4.5 Customer return
4.6 Time spent on store
4.7 Cart abandonment
4.8 Up-selling rate
4.9 Cross-selling rate
4.10 RoI - Return of Investment

5 Implementation
5.1 Making a profit
5.1.1 Software Product
5.1.2 In-House
5.1.3 Software as a Service (SaaS)
5.1.4 Open Source
