Author
yandex
View
421
Download
5
Embed Size (px)
DESCRIPTION
The ever-growing amount of data available on the Internet calls for personalization. Yet, the most effective personalization schemes, such as those based on collaborative filtering (CF), are notoriously resource greedy. We argue that scalable infrastructures should rely on P2P design to scale to that increasing number of users, data and dynamics. In this talk, I will present a scalable algorithm for collaborative filtering, which P2P flavor provides scalability by design. This scheme can been instanciated in various settings: P2P, hybrid infrastructure offloading CPU-intensive recommendation tasks to front-end client browsers, while retaining storage and orchestration tasks within back-end servers, as well as centralized infrastructures. As personalization relies on users giving away information, I will also present the potential encountered privacy issues and a range of solutions to preserve users privacy in such systems.
Scalable infrastructures for personalization
Anne-Marie Kermarrec
A.-M. Kermarrec (Inria)
Inria, France: 8 research centers, 150 research teams
June 2014
A.-M. Kermarrec (Inria) June 2014
A.-M. Kermarrec (Inria)
A cry for personalization
June 2014
A.-M. Kermarrec (Inria)
Why is personalization so difficult?
• Huge volume of data: small portion of interest
• Dynamic interests
• Interesting stuff does not come always from friends
• Classical notification systems do not filter enough or too much
Scalable personalization infrastructures
June 2014
A.-M. Kermarrec (Inria)
KNN computation over large data
Basic building block for many applications
• Similarity search
• Machine learning
• Data mining
• Image processing
• Collaborative filtering
June 2014
A.-M. Kermarrec (Inria)
KNN-based user-centric collaborative filtering
Provide each user with her k closest neighbors
(Users owns a profile, the system has its favorite similarity metric)
Use this topology for
• personalized notifications
• recommendation
AliceBob
Carl
Dave
Ellie
June 2014
A.-M. Kermarrec (Inria)
Dealing with truly big data
Want to scale? Think P2P
June 2014
A.-M. Kermarrec (Inria)
Do not look exhaustively
June 2014
A.-M. Kermarrec (Inria)
The key to scalability in KNN graph construction
Consider a partial set of candidates
Sampling-based approach
June 2014
A.-M. Kermarrec (Inria)
P2P KNN graph construction
Which nodes are close?
How to discover them?
Similarity metric
Sampling
June 2014
A.-M. Kermarrec (Inria)
Which nodes are close?
Model
U(sers) × I(tems) (items)
Profile(u) = vector of liked/shared/viewed items
Cosine similarity metric Jaccard metric
Minimal information: no tag, no user’s input, generic June 2014
A.-M. Kermarrec (Inria)
Each node maintains a set of neighbors (c entries)
Peer exchange
ShuffleP Q
How to discover them: Gossip-based computing
Result random graph
Highly resilient against churn, partition
Small diameter
[JGKVV, ACM TOCS 2007]
June 2014
A.-M. Kermarrec (Inria)
KNN construction
Similarity computation
exchange ofneighbors lists
neighborhoodoptimization1 2
Alice Bob
Carl
DaveEllie
Frank
June 2014
A.-M. Kermarrec (Inria)
Decentralized KNN selection
[FGKL Middleware 2010]
RPS layer providing random sampling
KPS clustering layer gossip-based topology clustering
Interest-based linkRandom link
AliceBob
Carl
Dave
Ellie
AliceBob
Carl
Dave
Ellie
June 2014
A.-M. Kermarrec (Inria)
Convergence
Cycles
c current neighbors versus the c closest
Biasedsampling
Random sampling
June 2014
A.-M. Kermarrec (Inria)
Applications
- Decentralized news recommendation [BFGJK, IPDPS
2013]
- Top-K [BGKL, ACM TODS 2011] [BGK, ACM TOIT 2014]
- Geo recommendation [BKKT, ICDCS 2012]
June 2014
A.-M. Kermarrec (Inria)
DECENTRALIZED NEWS RECOMMENDER
Notification is taking over
June 2014
A.-M. Kermarrec (Inria)
What’s wrong with news feed
Interest are dynamic
Wrong granularity for filtering of classical notification systems
Small portion of the available information is of interest
Interesting stuff does not come always from friends
An implicit notification system
based on collaborative filtering
June 2014
A.-M. Kermarrec (Inria)
WhatsUp in a nutshell
KNN selection
Dissemination
June 2014
A.-M. Kermarrec (Inria)
Dissemination: orientation and amplification
Orientation: to whom?
Exploit: ForwardTo friends
Explore: Forward to random users
Amplification: to how many?
Increase Fanout(Log(n))
DecreaseFanout(1)
June 2014
A.-M. Kermarrec (Inria)
WhatsUp in action on the survey (480 users)
Precision Recall F1-Score Messages
Gossip (f=4) 0.34 0.99 0.51 2.3 M
Cosine-CF 0.50 0.65 0.57 5,9k
Whatsup (f=10)
0.471 0.83 0.60 2,4k
June 2014
A.-M. Kermarrec (Inria)
Orientation (survey)
News items received through a dislike forward
Number of dislikes
0 1 2 3 4
Fraction of liked news
54% 31% 10% 3% 2%
June 2014
A.-M. Kermarrec (Inria)
WhatsUp versus Pub/Sub
Approach Precision Recall F1-Score
Pub/Sub 0.40 1.0 0.58
WhatsUp 0.47 0.83 0.60
June 2014
A.-M. Kermarrec (Inria)
WhatsUp versus cascading
Approach Precision Recall F1-ScoreCascading 0.57 0.09 0.16WhatsUp 0.56 0.57 0.57
June 2014
A.-M. Kermarrec (Inria)
PRIVACY MATTERS
June 2014
A.-M. Kermarrec (Inria)
Privacy issues
During user clustering• Exchange of profile in clear
During item dissemination• Predictive nature of the protocol
Profile Obfuscation
Randomized dissemination
June 2014
A.-M. Kermarrec (Inria)
Privacy
Obfuscation• Does not reveal the exact profile• Does not reveal the least sensitive information
Randomized dissemination• Flips the opinion with a given probability (pf)
June 2014
A.-M. Kermarrec (Inria)
Structure profiles
Private Profile
Compact profile
In clear: Full information about the interests
Aggregate signatures of liked items
June 2014
A.-M. Kermarrec (Inria)
Structure profiles
Private Profile
Compact profile
Filter profile
Item profile
Obfuscated profile
In clear: Full information about the interests
Aggregate signatures of liked items
Interests of users that like similar items
Least sensitive information about interests
Aggregate interests of users that liked it
June 2014
A.-M. Kermarrec (Inria)
Obfuscation mechanism
News item(received)
Private profile
Profiles kept locally
June 2014
A.-M. Kermarrec (Inria)
Obfuscation mechanism
News item(received)
Private profile Compact profile
News item(forwarded)
+
Profiles kept locally
Profiles exchanged with others
signatureitem
profile
June 2014
A.-M. Kermarrec (Inria)
Obfuscation mechanism
News item(received)
Private Profile Compact Profile Filter Profile
Obfuscated ProfileNews item(forwarded)
x+
Profiles kept locally
Profiles exchanged with others
signatureitem
profileitem
profile
mask ofpopularity
System parameter
June 2014
A.-M. Kermarrec (Inria)
Randomized dissemination
Flips the opinion with a given probability (pf)• Attacker could still learn from the profile• Private profile contains a field with the result of the randomized
decision
Generate Randomized compact profile• Users still use locally their non randomized profile for clustering• Differentially private protocol
June 2014
A.-M. Kermarrec (Inria)
Experimental setup
Simulations and PlanetlabAlternatives• Cleartext profile (CT); • 2DP (DP dissemination and randomized profile for clustering)Metrics• Recommendation: recall/precision• Privacy: Distance between obfuscated profile and real profile; Dataset: Real survey, 120 users on 200 news items (4 instances)
June 2014
A.-M. Kermarrec (Inria)
Impact of randomization
June 2014
A.-M. Kermarrec (Inria)
Impact of randomization
Decrease of precision with increasing pf
June 2014
A.-M. Kermarrec (Inria)
http://131.254.213.98:8080/wup/
Operational prototype
Tested on 500 users @ TrentoRise last year
TRY IT
Take away message
Personalization is needed
Decentralization is healthy
Gossip-based computing is one (the) way to go
June 2014
A.-M. Kermarrec (Inria)
For those who are afraid of P2P
June 2014
A.-M. Kermarrec (Inria)
Hybrid recommendation engine
June 2014
A.-M. Kermarrec (Inria) June 2014
A.-M. Kermarrec (Inria)
HyRec: Taking the best of both worlds
Online KNN selectionRestricted andidate set (k)No data stored at the client
HyRec client: Javascript (widget) running in the browser
June 2014
A.-M. Kermarrec (Inria) June 2014
A.-M. Kermarrec (Inria)
View similarity
June 2014
Dataset Users Items RatingsMovieLens1 943 1700 100,000
MovieLens2 6,040 4000 1,000,000
MovieLens3 69,878 10,000 10,000,000
Digg 59,167 7724 782,807
A.-M. Kermarrec (Inria)
Recommendation quality
June 2014
A.-M. Kermarrec (Inria)
HyRec versus the client load
Impact of HyRec Impact of the client load
Negligible disruption of HyRec 50% load<60ms on smartphone<10ms on laptop
June 2014
A.-M. Kermarrec (Inria)
HyRec versus a centralized recommender
Impact of the request stressImpact of the profile size
June 2014
A.-M. Kermarrec (Inria)
To take away
Personalization is crucial
P2P in a design mindset
Randomization and obfuscation provides a good tradeoff between privacy and quality
June 2014
Thank you