Get on with it! Recommender system industry challenges move towards real-world, online evaluation. Padova, March 23rd, 2016. Andreas Lommatzsch - TU Berlin, Berlin, Germany; Jonas Seiler - plista, Berlin, Germany; Daniel Kohlsdorf - XING, Hamburg, Germany. CrowdRec - www.crowdrec.eu


Page 1

Get on with it!
Recommender system industry challenges move towards real-world, online evaluation

Padova – March 23rd, 2016

Andreas Lommatzsch - TU Berlin, Berlin, Germany

Jonas Seiler - plista, Berlin, Germany

Daniel Kohlsdorf - XING, Hamburg, Germany

CrowdRec - www.crowdrec.eu

Page 2


Andreas Lommatzsch

[email protected]

http://www.dai-lab.de

Page 3


Jonas Seiler

[email protected]

http://www.plista.com

Page 4


Daniel Kohlsdorf

[email protected]

http://www.xing.com

Page 5

Where are recommender system challenges headed?

Direction 1: Use info beyond the user-item matrix.

Direction 2: Online evaluation + multiple metrics.

Moving towards real-world evaluation

Flickr credit: rodneycampbell

Page 6

Why evaluate?

<Images showing “our” use cases>

● plista

● XING

● Improve algorithm results

● Handle technical constraints

● User satisfaction

• Evaluation is crucial for the success of real-life systems

• How should we evaluate?

● Improve user satisfaction

● Increase sales, earnings

● Optimize the technical platform for providing the service

Evaluation dimensions: precision and recall, technical complexity, influence on sales, required hardware resources, business models, scalability, diversity of the presented results, user satisfaction

Page 7

Evaluation Settings

• A static collection of documents

• A set of queries

• A list of relevant documents defined by experts for each query

Traditional Evaluation in IR

The Cranfield paradigm was designed in the early 1960s when information access was via Boolean queries against manually indexed documents and there was (virtually) no text online. Cyril Cleverdon, Librarian of the College of Aeronautics, Cranfield, England, built a test collection that modeled university researchers, including abstracts of aeronautical papers, one-line queries based on questions gathered from the researchers, and complete relevance judgments for each query submitted by these users. The idea of carefully modeling some user application continued with Prof. Gerard Salton and the SMART collections, such as searching MEDLINE abstracts using real questions submitted to MEDLINE, or searching full-text TIME articles with real questions from several sources, etc. A 1969 paper by Michael Lesk and Salton used experiments on the ISPRA collection to show that relevance judgments made by a person who was not the user would still allow valid system comparison, a precursor to the paper by Ellen Voorhees in SIGIR 1998.

IR based on static collections

A set of queries; for each query there is a list of relevant documents defined by experts

Reproducible setting: all researchers have exactly the same information

“The Cranfield paradigm”

Advantages

• Reproducible setting

• All researchers have exactly the same information

• Optimized for measuring precision

<Diagram: Query0 mapped to its expert-judged relevant documents>
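
For concreteness, a minimal sketch (with hypothetical judgments) of how precision and recall are computed against such an expert-defined ground truth:

```python
def precision_recall(retrieved, relevant):
    """Compare a system's result list against expert relevance judgments."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall

# Hypothetical query: experts judged docs 1, 4, and 7 relevant.
p, r = precision_recall(retrieved=[1, 2, 4], relevant=[1, 4, 7])
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.67, recall=0.67
```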

Page 8

Traditional Evaluation in IR

Weaknesses of traditional IR evaluation

• High costs of creating datasets

• Datasets are not up-to-date

• Domain-specific documents

• The expert-defined ground truth does not consider individual user preferences

• Context-awareness is not considered

• Technical aspects are ignored

Context is everything

Page 9

Industry and recsys challenges

• Challenges benefit both industry and academic research.

• We look at how industry challenges have evolved since the Netflix Prize (2009).

Page 10

Traditional Evaluation in RecSys

Rating prediction

Cross-validation

Individual user preferences / personalization

Large datasets, sparsity

Evaluation Settings

• Rating prediction on user-item matrices

• Large, sparse dataset

• Predict personalized ratings

• Cross-validation, RMSE

Advantages

• Reproducible setting

• Personalization

• Dataset is based on real user ratings

“The Netflix paradigm”
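
A minimal sketch of the corresponding metric: RMSE over a held-out fold of ratings (the numbers below are made up):

```python
import math

def rmse(predicted, actual):
    """Root mean squared error over held-out ratings, as in the Netflix Prize."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual))
                     / len(actual))

# Hypothetical held-out fold from a cross-validation split.
print(round(rmse(predicted=[3.8, 2.1, 4.5], actual=[4.0, 2.0, 5.0]), 2))  # 0.32
```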

Page 11

Traditional Evaluation in RecSys

Weaknesses of traditional Recommender evaluation

• Static data

• Only one type of data - only user ratings

• User ratings are noisy

• Temporal aspects tend to be ignored

• Context-awareness is not considered

• Technical aspects are ignored

Static data

Context is not taken into account

Cross-validation does not match real-life settings

Why Netflix did not implement the winner:
https://www.techdirt.com/blog/innovation/articles/20120409/03412518422/why-netflix-never-implemented-algorithm-that-won-netflix-1-million-challenge.shtml

Page 12

Challenges of Developing Applications

Challenges

• Data streams - continuous changes

• Big data

• Combine knowledge from different sources

• Context-Awareness

• Users expect personally relevant results

• Heterogeneous devices

• Technical complexity, real-time requirements

Page 13

How to address these challenges in the Evaluation?

• Realistic evaluation setting

– Heterogeneous data sources

– Streams

– Dynamic user feedback

• Appropriate metrics

– Precision and User satisfaction

– Technical complexity

– Sales and Business models

• Online and Offline Evaluation

How to Set Up a Better Evaluation?

● Online Evaluation

● Consider the context

● Data streams

● Business model-oriented metrics

Page 14

Approaches for a better Evaluation

• News recommendations @ plista

• Job recommendations @ XING

Page 15

The plista Recommendation Scenario

Setting

● 250 ms response time

● 350 million ad impressions (AI) per day

● In 10 countries

Challenges

● News change continuously

● Users do not log in explicitly

● Seasonality, context-dependent user preferences

Page 16

Evaluation @ plista

Offline

• Cross-validation
– Metric Optimization Engine (MOE, https://github.com/Yelp/MOE)
– Integration into Spark

• How well does it correlate with online evaluation?

• Time complexity

Online

• A/B tests
– Limited by caching memory and computational resources
– MOE*

Page 17

Evaluation using MOE

Offline

• Mean and variance estimation of the parameter space with a Gaussian Process

• Evaluate the parameter with the highest Expected Improvement (EI), Upper Confidence Bound, ...

• REST API
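
For intuition, a minimal NumPy/SciPy sketch of the Expected Improvement acquisition over a Gaussian Process posterior; this only illustrates the idea, not MOE's actual REST API, and the numbers are hypothetical:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best_so_far):
    """EI of candidate parameters, given the GP posterior mean and stddev
    and the best objective value observed so far (maximization)."""
    sigma = np.maximum(sigma, 1e-9)            # guard against zero variance
    z = (mu - best_so_far) / sigma
    return (mu - best_so_far) * norm.cdf(z) + sigma * norm.pdf(z)

# Hypothetical GP posterior over three candidate parameter settings.
mu = np.array([0.031, 0.035, 0.028])           # predicted CTR
sigma = np.array([0.002, 0.006, 0.001])        # posterior uncertainty
ei = expected_improvement(mu, sigma, best_so_far=0.033)
print("next parameter to evaluate:", int(ei.argmax()))
```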

Page 18

Evaluation using MOE

Online

• A/B tests are expensive

• Model non-stationarity

• Integrate out non-stationarity to get mean EI

Page 19

The CLEF-NewsREEL challenge

• Provide an API enabling researchers to test their own ideas

• A challenge in CLEF (Conferences and Labs of the Evaluation Forum)

• 2 tasks: online and offline evaluation

Page 20

CLEF-NewsREEL Online Task

How does the challenge work?

• Live streams consisting of impressions, requests, and clicks; 5 publishers; approx. 6 million messages per day

• Technical requirement: 100 ms per request

• Live evaluation based on CTR
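
Participants run a service that must answer each request within the budget; ranking is by CTR. A toy sketch of both aspects (a most-popular baseline and a CTR computation; the real ORP message format is documented at orp.plista.com):

```python
import collections
import time

class MostPopularRecommender:
    """Toy baseline: recommend the most-clicked articles seen so far."""
    def __init__(self):
        self.clicks = collections.Counter()

    def feedback(self, item_id):
        self.clicks[item_id] += 1              # learn from the click stream

    def recommend(self, n=6):
        return [item for item, _ in self.clicks.most_common(n)]

rec = MostPopularRecommender()
for item in [3, 7, 3, 9, 3, 7]:                # hypothetical click events
    rec.feedback(item)

start = time.perf_counter()
suggestions = rec.recommend()
elapsed_ms = (time.perf_counter() - start) * 1000
assert elapsed_ms < 100, "response exceeded the 100 ms budget"
print(suggestions)                             # [3, 7, 9]

# CTR over an evaluation window: clicks on recommendations / impressions.
print(f"CTR = {42 / 1500:.2%}")                # hypothetical counts
```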

Page 21

CLEF-NewsREEL Offline Task

Online vs. offline evaluation

• Technical aspects can be evaluated without user feedback

• Analyze the required resources and the response time

• Simulate the online evaluation by replaying a recorded stream
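
A minimal sketch of the replay idea, reusing the MostPopularRecommender toy from the online-task sketch above; the log format here is hypothetical:

```python
import time

def replay(events, recommender, budget_ms=100):
    """Replay a recorded stream in timestamp order and report the share of
    requests answered within the time budget (no live users required)."""
    within_budget = total_requests = 0
    for event_type, payload in events:         # (type, payload) tuples
        if event_type == "click":
            recommender.feedback(payload)
        elif event_type == "request":
            start = time.perf_counter()
            recommender.recommend()
            elapsed_ms = (time.perf_counter() - start) * 1000
            total_requests += 1
            within_budget += elapsed_ms <= budget_ms
    return within_budget / max(total_requests, 1)

events = [("click", 3), ("request", None), ("click", 7), ("request", None)]
print(f"{replay(events, MostPopularRecommender()):.0%} answered in time")
```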

Page 22

CLEF-NewsREEL Offline Task

Challenge

• Realistic simulation of streams

• Reproducible setup of computing environments

Solution

• A framework simplifying the setup of the evaluation environment

• The Idomaar framework developed in the CrowdRec project

http://rf.crowdrec.eu

Page 23

CLEF-NewsREEL: More Information

• SIGIR Forum, Dec 2015 (Vol. 49, No. 2): http://sigir.org/files/forum/2015D/p129.pdf

• Evaluate your algorithm online and offline in NewsREEL

• Register for the challenge (by 22 April): http://crowdrec.eu/2015/11/clef-newsreel-2016/

• Tutorials and templates are provided at orp.plista.com

Page 24

XING - RecSys Challenge

https://recsys.xing.com/

Page 25

Job Recommendations @ XING

Page 26

XING - Evaluation based on interaction

● On XING, users can give explicit feedback on recommendations.

● The volume of explicit user feedback is far lower than that of implicit signals.

● A/B tests focus on click-through rate.

Page 27

XING - RecSys Challenge: Scoring and Space on the Page

● Predict 30 items for each user.

● Score: a weighted combination of precision at several cut-offs:

○ precisionAt(2)

○ precisionAt(4)

○ precisionAt(6)

○ precisionAt(20)

● The cut-offs reflect the space available on the page; the top 6 items are the most visible.
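
A sketch of this style of scoring, as described in the list above; the uniform weights below are placeholders, not the official challenge weights:

```python
def precision_at(k, ranked, relevant):
    """Fraction of the top-k predicted items the user actually interacted with."""
    return sum(item in relevant for item in ranked[:k]) / k

def score(ranked, relevant, weights=None):
    # Placeholder uniform weights; the official weighted combination differs.
    weights = weights or {2: 1.0, 4: 1.0, 6: 1.0, 20: 1.0}
    return sum(w * precision_at(k, ranked, relevant)
               for k, w in weights.items())

ranked = list(range(30))                       # 30 predicted job postings
relevant = {0, 3, 5, 11}                       # hypothetical positive interactions
print(round(score(ranked, relevant), 2))
```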

Page 28

XING - RecSys Challenge, User Data

• User ID

• Job Title

• Educational Degree

• Field of Study

• Location

Page 29

XING - RecSys Challenge, User Data

• Number of past jobs

• Years of Experience

• Current career level

• Current discipline

• Current industry

Page 30

XING - RecSys Challenge, Item Data

• Job title

• Desired career level

• Desired discipline

• Desired industry

Page 31

XING - RecSys Challenge, Interaction Data

• Timestamp

• User

• Job

• Type:

– Deletion

– Click

– Bookmark
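
Putting the user, item, and interaction slides together, one plausible way to represent an interaction record in code; the field names are illustrative, not the official dataset schema:

```python
from dataclasses import dataclass
from enum import Enum

class InteractionType(Enum):
    CLICK = "click"
    BOOKMARK = "bookmark"
    DELETION = "deletion"

@dataclass
class Interaction:
    timestamp: int           # Unix epoch seconds
    user_id: int
    job_id: int
    kind: InteractionType

# Hypothetical record: user 42 clicked job posting 1337.
print(Interaction(timestamp=1458000000, user_id=42, job_id=1337,
                  kind=InteractionType.CLICK))
```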

Page 32

XING - RecSys Challenge, Anonymization

Page 33

XING - RecSys Challenge, Anonymization

Page 34

XING - RecSys Challenge, Future

• Live Challenge

– Participants submit predicted future interactions

– The solution is recommended on the platform

– Participants get points for actual user clicks

Cycle: Release to Challenge → Work on Predictions → Collect Clicks → Score

Page 35

Concluding ...

How to set up a better evaluation

• Consider different quality criteria (prediction, technical, business models)

• Aggregate heterogeneous information sources

• Consider user feedback

• Use online and offline analyses to understand users and their requirements

Page 36

Concluding ...

Participate in challenges based on real-life scenarios

• NewsREEL challenge: http://orp.plista.com

• RecSys 2016 challenge: http://2016.recsyschallenge.com/

=> Organize a challenge. Focus on real-life data.

Page 37

More Information

• http://www.crowdrec.eu

• http://www.clef-newsreel.org

• http://orp.plista.com

• http://2016.recsyschallenge.com

• http://www.xing.com

Thank You