Slides of my 'Haystack - The search relevance conference' talk on approaches to relevance scoring based on product data. The first part gives an overview of scoring in e-commerce search; the second part explains a new approach to relevance scoring based on image recognition.
1. A picture is worth a thousand words
Approaches to relevance scoring based on product data, including image recognition
René Kriegler, @renekrie
Haystack - The Search Relevance Conference
11 April 2018
2. A picture is worth a thousand words - relevance scoring based on product data, Haystack, 11 April 2018, René Kriegler (@renekrie)
About me
More than 10 years of experience as a freelance search consultant, often in a role for OpenSource Connections
Focus:
- Search relevance optimisation
- E-commerce search
- Solr
- Coaching teams to establish search within their organisation
Organiser of MICES - Mix-Camp E-commerce Search (Berlin, 13 June, mices.co, call for talks open until 22 April)
Maintainer of Querqy (OSS query rewriting library - github.com/renekrie/querqy)
3. E-commerce search
E-commerce search as part of the ‘buying decision process’
- Search can/should be optimised towards the different stages of the buying decision process
- Purchase as one signal of a successful search
Philip Kotler, Kevin Lane: Marketing Management, 1997
Peter Morville: Ambient Search, 2005
Buying decision process: Problem recognition -> Information search -> Evaluation of alternatives -> Purchase decision -> Post-purchase behaviour
5. Relevance in e-commerce search
Unlike in other search domains, documents in e-commerce search describe a single item - each document is a ‘proxy’ for a concrete thing that we could touch/examine in a shop
Consumer interests become part of relevance criteria:
- Product specification (Does the SSD drive of that laptop have enough capacity for me?)
- Value / price
- Availability (Wait three weeks for a pair of shoes?)
- Brand reputation
- Seasonality / freshness
- Reviews / ratings
- ...
6. Relevance in e-commerce search
O. Alonso, S. Mizzaro: Relevance Criteria for E-Commerce: A Crowdsourcing-based Experimental Analysis, SIGIR ’09, 2009.
7. The seller perspective
How can search result ranking maximise profit?
- Show results most relevant to the user
- Maximise margin
- Sales, stock clearance
- Sell search result placements (see Amazon’s ‘Sponsored by ...’)
9. Ranking factors
Search result ranking factors in e-commerce search
- Topicality - identify the product (type) that the user is searching for (‘laptop’ vs ‘laptop backpack’)
- User’s relevance criteria (e-commerce/non-e-commerce)
- Seller’s interests (maximise profit)
- Personalisation & individualisation
I will focus on topicality for the rest of my talk
11. Standard scoring models
Standard scoring models evolved with enterprise search/general web search in mind:
Typically
- Long documents
- Unstructured/semi-structured
- Mixture of many, often abstract topics
Compare with e-commerce search:
Typically
- Short documents
- Fields
- About a single, concrete thing (‘proxy’)
12. Standard scoring models
Often based on a language model that tries to predict the query likelihood given document/index term distributions:
Score = f(tf, df)
(See tf*idf, BM25(F))
15. Standard scoring models
Score = f(tf, df)
Both tf and df have problems in e-commerce search:
- Unclear - often adverse - interaction of tf and df with fields
- tf often equals 1
- Document length normalisation of tf often doesn’t work. Compare two titles that each contain ‘laptop’ once:
  - Acer Aspire E5-523-962Z - Laptop 2.9GHz A9-9410 15.6" 1366 x 768pixels Black
  - Laptop
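A minimal sketch of the length-normalisation problem, assuming a Lucene-style BM25 term score with default parameters. The collection statistics (number of documents, document frequency of ‘laptop’, average title length) are made up for illustration only. Both titles contain ‘laptop’ once, yet the one-word title receives the higher score purely because it is shorter.

```python
import math

def bm25_term_score(tf, df, num_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """BM25 contribution of one query term to one document's score."""
    idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)

def title_length(title):
    """Naive whitespace tokenisation; the lone hyphen is not counted."""
    return len([token for token in title.split() if token != "-"])

long_title = 'Acer Aspire E5-523-962Z - Laptop 2.9GHz A9-9410 15.6" 1366 x 768pixels Black'
short_title = "Laptop"

# Hypothetical collection statistics, chosen only for illustration.
NUM_DOCS, DF_LAPTOP, AVG_TITLE_LEN = 10_000, 500, 8.0

# 'laptop' occurs exactly once in each title, so tf = 1 for both.
score_long = bm25_term_score(1, DF_LAPTOP, NUM_DOCS, title_length(long_title), AVG_TITLE_LEN)
score_short = bm25_term_score(1, DF_LAPTOP, NUM_DOCS, title_length(short_title), AVG_TITLE_LEN)

print(f"long title:  {score_long:.3f}")
print(f"short title: {score_short:.3f}")
```

With tf fixed at 1 and df identical, document length is the only thing left to separate the two scores.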
16. Standard scoring models
Score = f(tf, df)
Both tf and df have problems in e-commerce search:
- Unclear - often adverse - interaction of tf and df with fields
- tf often equals 1
- Document length normalisation of tf often doesn’t work
- Counter-intuitive:
  - If two documents describe a laptop, they should both have the same topicality score regardless of the distribution of the terms in their description
17. E-commerce scoring models
Few scoring models were designed specifically for e-commerce search
Predict product type and product properties from indexed product data and from the query and match at query time
- SEMKNOX search engine (based on an ontology)
- Product type prediction from query at Amazon (D. Sorokina, E. Cantú-Paz, The Joy of Ranking Products, SIGIR ’16, 2016)
=> Score tends to become binary (match vs no match)
- Great intuition (one laptop shouldn’t be more ‘laptopish’ than another)
- Less noisy input for combination with other ranking factors
18. A picture is worth a thousand words
Query: laptop
Product images shown: laptop, laptop backpack
21. A picture is worth a thousand words
Product pictures fit the ‘proxy’ metaphor nicely - they visually represent the real-world product that the document stands for
Image recognition is needed to explore product pictures for search -> model product type (and properties)
Image recognition is already being explored for e-commerce search:
- nyris.io: known-item search
- cerebel.io and Han Xiao, Zalando research (https://bit.ly/2EdQwtc): joint visual/textual search model
23. A picture is worth a thousand words
Can image recognition be used for search in a simpler way?
Maybe just for scoring?
24. Image-based relevance scoring
Inception v3 image recognition (TensorFlow)
Document: Acer Aspire E5-523-962Z - Laptop 2.9GHz A9-9410 15.6" 1366 x 768pixels Black
Recognize image
Output vector (Softmax):
x000: 0.00145
x001: 0.00030
...
x711: 0.79200 (laptop)
...
x999: 0.00801
Enrich documents with image recognition output vectors during indexing:
Acer Aspire E5-523-962Z - Laptop 2.9GHz A9-9410 15.6" 1366 x 768pixels Black + image recognition output vector [...]
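A minimal sketch of the enrichment step, assuming the raw classifier outputs (logits) are already available from an image recognition model. The field name `image_vector`, the helper `enrich` and the toy logits are hypothetical; a real Inception output vector has 1000 entries rather than 4.

```python
import math

def softmax(logits):
    """Turn raw classifier outputs into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def enrich(doc, logits):
    """Return a copy of the document with the output vector attached."""
    enriched = dict(doc)
    enriched["image_vector"] = softmax(logits)
    return enriched

# Toy example with 4 classes instead of 1000; the logits are made up.
doc = {"title": 'Acer Aspire E5-523-962Z - Laptop 2.9GHz A9-9410 15.6" 1366 x 768pixels Black'}
indexed = enrich(doc, [0.1, -1.5, 6.4, 1.8])

vector = indexed["image_vector"]
top_class = max(range(len(vector)), key=vector.__getitem__)
print(top_class, round(vector[top_class], 3))
```

The whole vector is kept rather than only the top class, since the scoring approach works on the position of the vector in the output space.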
26. Intuition for scoring
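Likelihood of query ‘notebook’ in vector subspaces
[Figure: space of indexed Inception output vectors, divided into subspaces; ‘+’ and ‘-’ mark individual documents]
Higher score for query ‘notebook’ for documents whose images fall into the subspace where 5 of 5 documents match, compared with the subspace where 2 of 4 match (5/5 vs 2/4)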
27. Towards a scoring formula
Score ~ Likelihood of query given an image recognition vector subspace
- Likelihood could be estimated but would assign too high a score to specific product subtypes (such as ‘running shoes’ for query ‘shoes’)
- Better:
Score ~ Jaccard similarity(products in vector subspace, products that match the query)
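A small sketch of the difference between the two candidate scores, using made-up product IDs. The set sizes are assumptions chosen to mirror the ‘running shoes’ vs ‘shoes’ example: a narrow subspace that contains only running shoes, and a broad subspace that covers most shoes plus a few other products.

```python
def likelihood(subspace, matches):
    """Share of products in the subspace that match the query."""
    return len(subspace & matches) / len(subspace) if subspace else 0.0

def jaccard(subspace, matches):
    """Overlap of the two sets relative to their union."""
    union = subspace | matches
    return len(subspace & matches) / len(union) if union else 0.0

# Hypothetical product IDs.
shoes_matches = set(range(100))                    # products matching query 'shoes'
running_subspace = set(range(10))                  # narrow subspace: running shoes only
shoes_subspace = set(range(95)) | {200, 201, 202, 203, 204}  # broad subspace, 5 non-shoes

for name, subspace in [("running shoes", running_subspace), ("shoes", shoes_subspace)]:
    print(name, likelihood(subspace, shoes_matches), jaccard(subspace, shoes_matches))
```

Likelihood gives the narrow subspace a perfect score because every product in it matches, while Jaccard similarity also penalises the many matching products that lie outside it.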
28. Towards a scoring formula
Defining vector subspaces
- Split space by random hyperplanes -> Random Projection Tree
- Use more than one tree to reduce impact of hyperplanes that run through a group of closely related images -> Random Projection Forest
- Per document: index a few random projections instead of the high-dimensional vector
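The idea can be sketched as follows, under simplifying assumptions: each tree uses one fixed random hyperplane through the origin per level (a full random projection tree may pick a different hyperplane per node), and each document stores one bit string per tree. The function names are hypothetical, and the zero-based `p_tree_<i>` field naming is assumed for illustration.

```python
import random

def make_forest(num_trees, num_planes, dim, seed=42):
    """Draw num_planes random hyperplane normals for each of num_trees trees."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]
        for _ in range(num_trees)
    ]

def project(vector, tree):
    """One bit per hyperplane: '1' if the vector lies on its positive side."""
    bits = []
    for normal in tree:
        dot = sum(v * w for v, w in zip(vector, normal))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)

def projection_fields(vector, forest):
    """Map a vector to one short bit string per tree, ready for indexing."""
    return {f"p_tree_{i}": project(vector, tree) for i, tree in enumerate(forest)}

# Toy setup: 4 trees of 3 hyperplanes over 5-dimensional vectors.
forest = make_forest(num_trees=4, num_planes=3, dim=5)
fields = projection_fields([0.01, 0.02, 0.90, 0.05, 0.02], forest)
print(fields)
```

Documents whose vectors share a bit string in a tree fall into the same subspace of that tree, so a handful of short strings replaces the high-dimensional vector in the index.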
32. Using random projection forests
V1 => “11” (or “3”)
V2 => “11” (or “3”)
V3 => “00” (or “0”)
V4 => “01” (or “1”)
Great video: Maciej Kula - Speeding up search with locality sensitive hashing: https://www.youtube.com/watch?v=NtAKQIrIU7w
33. Demo
Solr plugin demo
Many thanks to Profitmax (http://testit.de & http://preisvergleich.ch) for letting me use their product data for this demo
34. Using random projection forests
A forest of 16 trees with 24 hyperplanes each in Solr.
We can work with fewer trees and hyperplanes at query time (for example, use p_tree_2:010* to query 3 hyperplanes in tree p_tree_2)
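A sketch of what the prefix query does, emulated in plain Python on made-up documents: matching only the first bits of a projection selects a coarser subspace, and using more bits narrows it. The helper `in_subspace` and the 6-bit sample values are hypothetical.

```python
def in_subspace(doc, field, prefix):
    """True if the document's projection in `field` starts with `prefix`."""
    return doc.get(field, "").startswith(prefix)

# Made-up documents, each with a 6-bit projection for tree p_tree_2.
docs = [
    {"id": "a", "p_tree_2": "010110"},
    {"id": "b", "p_tree_2": "010001"},
    {"id": "c", "p_tree_2": "011110"},
    {"id": "d", "p_tree_2": "110110"},
]

# Emulates the prefix query p_tree_2:010* (first 3 hyperplanes only).
coarse = [d["id"] for d in docs if in_subspace(d, "p_tree_2", "010")]
# Using 5 hyperplanes narrows the subspace.
fine = [d["id"] for d in docs if in_subspace(d, "p_tree_2", "01011")]

print(coarse, fine)
```

Indexing more hyperplanes than needed therefore leaves room to tune the granularity of the subspaces at query time without reindexing.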
35. Search quality comparison
Experiment:
- Solr plugin to implement scoring based on image recognition
- Index product data
- Calculate search quality metrics for 100 queries, based on judgments derived from live traffic
- Compare with other scoring algorithms
A big ‘thank you’ to otto.de for letting me use their product data and search judgment data!
37. Search quality comparison
Scoring based on image recognition
- Implemented using a random projection forest of 16 trees with 5 hyperplanes each
- Scored by sum of Jaccard similarities between documents in vector subspaces and documents that match category query tokens only - no additional tf*idf scoring
=> Image-recognition-based scoring on a par with the best language-model-based scoring in the experiment
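A rough sketch of summing Jaccard similarities over the trees of a forest, not the Solr plugin itself. It assumes each document carries one projection string per tree and that the set of documents matching the query tokens is already known; the function name `score_documents`, the field names `t0`/`t1` and the toy data are hypothetical.

```python
from collections import defaultdict

def score_documents(docs, matching_ids, tree_fields):
    """Sum, over all trees, the Jaccard similarity between a document's
    subspace (documents sharing its projection) and the matching documents."""
    matching = set(matching_ids)
    scores = defaultdict(float)
    for field in tree_fields:
        # Group document IDs by their projection in this tree.
        buckets = defaultdict(set)
        for doc in docs:
            buckets[doc[field]].add(doc["id"])
        for members in buckets.values():
            union = members | matching
            similarity = len(members & matching) / len(union) if union else 0.0
            for doc_id in members:
                scores[doc_id] += similarity
    return dict(scores)

# Toy index: 4 documents, 2 trees with 2 hyperplanes each.
docs = [
    {"id": "a", "t0": "00", "t1": "01"},
    {"id": "b", "t0": "00", "t1": "01"},
    {"id": "c", "t0": "00", "t1": "11"},
    {"id": "d", "t0": "11", "t1": "11"},
]
scores = score_documents(docs, matching_ids={"a", "b"}, tree_fields=["t0", "t1"])
print(scores)
```

Note that document c receives a non-zero score although it does not match the query tokens, because it shares a subspace with matching documents in one tree; whether such documents should be returned is a separate retrieval decision.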
38. Further improvements
Future improvements/experiments:
- Use language-model-based scoring as a tie-breaker for documents that yield the same score based on image recognition
- Combine with Jaccard similarity of further query fields (beyond category)
- Retrain image recognition for product properties, combine with model for product types
- Tag documents ‘offline’: weight document terms using the same intuition (= term likelihood given the image recognition vector)
40. A picture is worth a thousand words
W. Di, N. Sundaresan, R. Piramuthu, A. Bhardwaj: Is a Picture Really Worth a Thousand Words? - On the Role of Images in E-commerce. WSDM ’14, 2014
It’s at least worth a language model! ;-)
41. Thank you!
http://www.rene-kriegler.com
@renekrie
Product images taken from Icecat open catalogue (icecat.biz) and preisvergleich.ch product data