Groupon uses deep learning techniques such as Doc2Vec for query understanding to improve recall on long-tail queries. A CNN model predicts an image propensity-to-purchase (IPP) score used in Groupon's home feed. Deal2Vec embeddings leverage user-session data to recommend similar deals beyond co-purchases. Groupon aims to replace traditional feature engineering with embedding representations and to expand deep learning applications, e.g., mobile credit card detection.
Deep Learning Applications within Search and Ranking at Groupon
1. Deep Learning @ Groupon
Applications within Relevance and Ranking
AI Summit San Francisco, September 19–20, 2018
2. The Team
Bojan Babic is a Senior Engineer at Groupon working on core
search and relevance in both personalized deals search and
deal recommendations
@bojanbabic
Joaquin A. Delgado, PhD, is currently serving as Director of
Machine Learning at Groupon, working on search and
recommender systems for local e-commerce. Previously, he
was a Director at Verizon and CTO of Lending Club and
AdBrite. He also worked at Yahoo! and Oracle.
@joaquind
5. Four Distinct Customer Journeys
● Search
● Browse
● Home Feed
● Similar Deals to Consider
Examples of DL use @ Groupon
6. Search
Using query understanding to improve the recall of long-tail queries
7. Query Similarity
● TF-IDF - bag-of-words approach
○ Sparse representation
○ Queries are never considered similar unless they share terms
○ Queries that share terms but do not share meaning become false candidates
■ “nail clippers” vs. “la clippers”
● Random walk on a bipartite graph of queries and categories (see figure below)
○ No guarantee that similar queries yield the same search results
● Doc2Vec - get the k-closest queries in the embedding space (PV-DM; sketch below)
○ Improved recall on tail queries
○ Better overall precision
○ Examples:
■ “sony playstation” -> “playstation 4”, “ps4”, “psp”
[Figure: bipartite graph linking queries (Query_1 … Query_n) to categories (Cat_1 … Cat_m)]
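A minimal sketch of the PV-DM approach using gensim's Doc2Vec; the query log, tags, and hyperparameters here are illustrative assumptions, not the production setup:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy query log: each query is treated as a tiny "document" tagged by its index.
queries = ["sony playstation", "playstation 4", "ps4", "nail clippers", "la clippers"]
corpus = [TaggedDocument(q.split(), [i]) for i, q in enumerate(queries)]

# dm=1 selects the PV-DM variant named above; vector size and epochs are illustrative.
model = Doc2Vec(corpus, dm=1, vector_size=50, window=2, min_count=1, epochs=100)

# Embed an incoming query and retrieve its k-closest queries in the embedding space.
vec = model.infer_vector("sony playstation".split())
for tag, score in model.dv.most_similar([vec], topn=3):
    print(queries[tag], round(score, 3))
```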
9. Browse
Leveraging deal classification to guide the customer through Groupon’s vast catalog
11. Learned Taxonomy
Hyperparameters
● batch size: 64
● epochs: 30
● sequence length: 200 words
● dropout: 0.2
What we tried:
● K-Means on dense vector representations of the deal description
What worked:
● CNN
● LSTM (see the sketch after this list)
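A minimal Keras sketch of an LSTM classifier wired up with the hyperparameters listed above; the vocabulary size, embedding width, and LSTM width are assumptions, while the 1,500 output classes match the taxonomy size mentioned in the notes:

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE, EMBED_DIM, SEQ_LEN, NUM_CLASSES = 50_000, 128, 200, 1500  # vocab and embed dim assumed

model = tf.keras.Sequential([
    layers.Input(shape=(SEQ_LEN,)),                   # sequence length: 200 words
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    layers.LSTM(128),                                 # output at the last time step
    layers.Dropout(0.2),                              # dropout: 0.2
    layers.Dense(NUM_CLASSES, activation="softmax"),  # one class per taxonomy node
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(x_train, y_train, batch_size=64, epochs=30)  # batch size and epochs from the slide
```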
12. Home Feed
Understanding how images influence purchases in the recommended feed
13. Image Propensity to Purchase
● Questions being asked:
○ Are certain deal images more attractive to
customers than others?
○ Do images influence purchases?
● We use a Convolutional Neural Network (CNN) to train a
model to predict an Image Propensity to Purchase (IPP)
● The target class is a binary purchase/no-purchase label
● We later use the precomputed IPP as a feature in our
proprietary learning-to-rank algorithm in the Home Feed
and other places
It is well known that a picture is worth a thousand words and, at Groupon, images play a fundamental role in the marketing of deals.
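For illustration only (the deck does not describe the production architecture), a small Keras CNN with a sigmoid head whose output can be read as an IPP-style score:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Binary purchase/no-purchase classifier over deal images; all layer sizes
# and the 224x224 input resolution are assumptions made for this sketch.
ipp_model = tf.keras.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation="sigmoid"),  # output interpreted as the IPP score
])
ipp_model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC()])
# The precomputed scores then feed the learning-to-rank model as a feature.
```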
14. Similar Deals to Consider
Recommending similar deals by leveraging user-session information
15. Recommending Similar Deals
● Content-based similarity
○ Deals are never considered similar unless they share terms
○ Deals that share terms but do not share meaning become false
candidates
● Collaborative Filtering
○ Sparse representation; requires matrix factorization
○ Captures similarity based only on what other users
purchased, without considering context.
● Deal2Vec - get k-nearest-neighbor deals in the embedding space (sketch below)
○ Build deal embeddings using customer sessions that resulted
in purchases
○ This considers context: customers’ journeys
○ Beyond co-purchases, Deal2Vec has proven to be an
important source of candidates for deal similarity
○ Can be used in several customer touch-points
■ When browsing: Similar deals to consider
■ Post-purchase: Customers who bought X have also
bought Y
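A minimal sketch of the Deal2Vec idea using gensim's Word2Vec, where each purchase-ending session is a "sentence" of deal IDs; the session data and parameters below are invented for illustration:

```python
from gensim.models import Word2Vec

# Each purchase-ending customer session is an ordered list of deal IDs,
# treated as a sentence so that deals viewed in the same context embed nearby.
sessions = [
    ["deal_spa_01", "deal_massage_03", "deal_spa_07"],
    ["deal_ps4_bundle", "deal_ps4_game", "deal_controller"],
    ["deal_spa_07", "deal_massage_03", "deal_facial_02"],
]

model = Word2Vec(sessions, vector_size=64, window=5, min_count=1, sg=1, epochs=50)

# k-nearest-neighbor deals in the embedding space -> "Similar deals to consider"
print(model.wv.most_similar("deal_spa_01", topn=2))
```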
16. Conclusions
● All-in on replacing traditional feature engineering with the corresponding embedding representations
● Expanding deep learning's reach within Groupon to other areas (e.g., mobile credit card detection)
● More work on automating feature discovery and model parameter tuning
17. References
1. Wenpeng Yin, Katharina Kann, Mo Yu, and Hinrich Schütze. 2017. Comparative Study of CNN and RNN for Natural Language Processing. IBM.
2. Scalable Semantic Matching of Queries to Ads in Sponsored Search Advertising. Yahoo!, 2016.
3. The Evolution of a Real-World Recommender System. Pinterest, 2016.
4. Deep Neural Networks for YouTube Recommendations. ACM, 2016.
5. Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. Computer 42, 8 (August 2009), 30–37.
6. Yann LeCun, Patrick Haffner, Léon Bottou, and Yoshua Bengio. 1999. Object Recognition with Gradient-Based Learning. In Shape, Contour and Grouping in Computer Vision, David A. Forsyth, Joseph L. Mundy, Vito Di Gesù, and Roberto Cipolla (Eds.). Springer-Verlag, London, UK, 319–.
A collection of techniques we have applied in a large-scale, sophisticated recommendation system.
We showcase what we tried, the lessons learned, and the evolution of our thinking.
We serve an inventory of over 1M merchants to 50M customers.
The business is organized into verticals: local, goods, tickets, and getaways, to name a few.
We have pumped $20B into local businesses.
The group we work in is building a daily habit.
Vanilla examples do not work in production.
There is a steep learning curve from basic examples to real production applications in ranking.
Convolutions work well on 1-D vectors (text) as well as on 2-D representations (images). Instead of edges, curves, and diagonals, 1-D convolutions detect n-grams. In fact, convnets can be used on any data with spatial patterns.
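As a toy sketch of that point (all sizes are illustrative), a 1-D convolution over token embeddings in which each width-3 filter acts as a learned trigram detector:

```python
import tensorflow as tf
from tensorflow.keras import layers

# 1-D analogue of an image CNN: filters slide along the token axis instead of pixels.
text_cnn = tf.keras.Sequential([
    layers.Input(shape=(200,)),                           # token IDs
    layers.Embedding(50_000, 128),
    layers.Conv1D(64, kernel_size=3, activation="relu"),  # width-3 filters ~ trigram detectors
    layers.GlobalMaxPooling1D(),                          # keep each filter's strongest response
    layers.Dense(1500, activation="softmax"),
])
```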
Feature engineering can only get you to a certain point. As much as we covered first-level features (age, gender, location) and computed features like propensity to a category or to travel, there are latent features we simply do not have access to; that is where embeddings kick in. Still, getting from word2vec to sequence-model applications in recommendation systems is not trivial. This talk has the task of bridging that gap and showcasing the set of applications we used, covering the four main areas our recommender system at Groupon cares about: search, browse, home feed, and post-purchase.
Make sure the model's performance does not deteriorate with subsequent model releases.
We need robust validation.
Intrinsic validation requires a curated list of analogies.
1-D convolutions on text vs. 2-D convolutions on images
Taxonomies are super important for recall.
The taxonomies have 1,500 nodes.
There is complexity in maintaining multiple taxonomies.
New partnerships require incorporating new orthogonal taxonomies (e.g., food-delivery restaurants, whether a place has wifi).
Bad taxonomy mappings can have a big impact on the business.
We need safety measures to suppress human errors: missed opportunities and/or misclassified deals.
The corpus considered the title, short description, highlights, and fine print.
Why not CNNs? Model accuracy varies with the length of the input sequence: CNNs focus on small regions to extract important classification features, while LSTMs consider the whole input sequence.
The embedding is calculated by taking the output of the LSTM cell at the last time step, multiplying it by another weight matrix, normalizing it, and putting it through a softmax classifier with 1,500 classes.
We tried a whole range of hyperparameters but settled on batch size 64, sequence length 200, and dropout 0.2.
At scoring time, we use the target (category) embedding to compute cosine similarity with the vector representation of the input.
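A minimal sketch of that scoring step (function and variable names are hypothetical): normalize the input's embedding and the category embeddings, then take dot products to get cosine similarities:

```python
import numpy as np

def score_against_categories(input_vec, category_vecs):
    # input_vec: (d,) embedding of the input (LSTM last-time-step output after projection)
    # category_vecs: (num_categories, d) matrix of category embeddings
    a = input_vec / np.linalg.norm(input_vec)
    b = category_vecs / np.linalg.norm(category_vecs, axis=1, keepdims=True)
    return b @ a  # (num_categories,) cosine similarity scores
```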
We use a Convolutional Neural Network (CNN) to train a model using deal views <image, target> to predict an image’s propensity to purchase (IPP)
The target class is a binary purchase/no-purchase label based on the customer’s decision made after viewing the deal
We use the precomputed IPP as a feature in our proprietary learning-to-rank algorithm used for ranking deals in the Home Feed