Antoine Cornuéjols, AgroParisTech – INRA MIA 518, [email protected]
Learning and Food: The Contribution of Digital Technology
Thesis of Sema Akkoyunlu and ANR project "SHIFT" (2019-2022)



Page 2

Where we talk about a data "avalanche"

•  Data captured in abundance whenever we go online

–  Which sites we visit

–  How long, the clicks, the durations, the purchases, …

•  Smartphones

–  Location, even when we have said no

–  Loads of very nosy apps

•  Connected wristbands

•  Means of payment (banks)

•  In-vehicle sensors (insurance companies)

•  Linky smart meters

•  "Smart" cities

2   22/05/2019 "Machine Learning and Food" (A. Cornuéjols)

Page 3

Every field is affected...

•  Bioinformatics


•  Sociology

•  E-medicine (me-data)

•  The legal field

•  Industry

•  Insurance

Page 4

Every field is affected... even

•  Digital agriculture


o  Large volumes of data

◆  Sensors

◆  Drones

◆  Social and professional networks

Page 5

All… except…

•  Food

–  The Nutrinet survey

•  ~277,000 Internet users, in theory over several years

•  But

◆  80% women

◆  From high socio-professional backgrounds

◆  Dropping out after a few days

•  Education

–  Little data on what happens in the classroom or in front of a screen


Lack of representative data

Page 6

What data are there on food?


Page 7

INCA databases

Étude Individuelle Nationale des Consommations Alimentaires (French national individual food consumption survey)

–  A snapshot of the food consumption habits of the population of metropolitan France

–  Surveys conducted every 7 years

•  INCA1 (1998-1999), INCA2 (2006-2007) and INCA3 (2014-2015)

–  Types of descriptors

•  Characteristics of the individuals

•  Food choice criteria

•  Attitudes and opinions about food

•  Lifestyle / health status

•  Daily nutritional intake for 38 nutrients

•  Detailed, quantified food consumption over each individual's week

•  Description of consumption occasions: place, duration, …


Page 8

INCA and ANSES databases

INCA2

–  4,079 individuals

–  541,526 rows

–  Table of household data

–  …

ANSES databases on food composition (supply of beneficial substances)


Page 9

Data from private companies

Connected kitchen appliances

"Tips, advice or original recipes: our apps feed your inspiration and guide you in preparing simple, tasty dishes"


•  Advantages

–  Automatic collection

•  Limitations

–  They do not collect all the information we want

–  "Agile" development => inconsistent data

Page 10

Data from private companies

Restaurant chains

–  Users who consent can have their consumption recorded


•  Advantages

–  Automatic, systematic recording

•  Limitations

–  Specific context

•  Lunchtime only

•  Limited choices

–  Biased populations

–  "Agile" development => inconsistent data

Page 11

The ideal data

Data on the consumption histories of a (large) set of individuals

–  Better if the behavior of each individual can be followed

We would like to cluster consumers into classes based on:

•  Age

•  Family size

•  Education

•  …

–  Better if the histories are long (several weeks/months)

–  Better if the details of the meals are known
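The slide does not say which clustering method would be used. As a minimal sketch, assuming plain k-means (Lloyd's algorithm) on standardized numeric features and a hypothetical toy table of consumers described by age, family size and years of education, consumer classes could be found as follows:

```python
import math
from typing import List, Sequence

Point = List[float]

def standardize(rows: Sequence[Sequence[float]]) -> List[Point]:
    """Rescale each feature to zero mean and unit variance, so that age
    (in years) does not dominate family size (in persons)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]  # `or 1.0` guards against a constant feature
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def kmeans(points: Sequence[Point], init: Sequence[int], n_iter: int = 20) -> List[int]:
    """Lloyd's k-means; `init` holds the indices of the starting centroids."""
    centroids = [list(points[i]) for i in init]
    labels: List[int] = []
    for _ in range(n_iter):
        # Assignment step: each consumer joins the nearest centroid.
        labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical consumers: (age, family size, years of education).
consumers = [[22, 1, 16], [25, 1, 17], [24, 2, 16],
             [48, 4, 12], [52, 5, 11], [45, 4, 12]]
labels = kmeans(standardize(consumers), init=[0, 3])
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

The number of classes and the starting centroids (`init`) are fixed by hand here; in practice they would have to be chosen, and categorical descriptors such as education level would need a suitable encoding.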


Page 12

Data collection: summary

1.   Individual consumption is hard to measure

–  No automatic measurement system

–  Complicated (e.g. recognizing dishes from photos)

vs. smartwatches

2.   Long histories are hard to collect

–  Costly for individuals

–  No incentive to do so

vs. personalized medicine

3.  Biased populations and contexts


Page 13

Towards personalized recommendations


Page 14

ANR project "SHIFT": the objective

1.  Identify what determines consumers' food choices

–  Seasonal factors

–  Social factors (e.g. family meals, at the workplace, parties, …)

–  The meals eaten before and after

2.  Build a personalized recommender system

–  Working online

–  Taking the meal context into account

–  Relying on causal factors

As opposed to factors that are correlated but not causal


Page 15

Tasks for the ANR project "SHIFT"

Task 1.1 Collecting a large and comprehensive enough dataset

–  Primary source of data: INCA2 (and INCA3 if available in time)

•  But the temporal depth is very limited (a few days)

–  A new source of data is to be gathered

•  A data collection campaign will be organized and performed

•  It is expected to gather data on 500 young adults for periods of at least 4 months over a year

•  Use of smartphone apps (e.g. MyFitnessPal)


Page 16

Tasks for the ANR project "SHIFT"

Task 1.2 Preprocessing the database and correcting the biases

–  Identification of the systematic biases

•  Biases that may be related to gender, age, social and cultural status, habits, …

•  Examples are almost entirely positive

•  There might be systematically missing information (e.g. food consumed outside the main meals)
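The slide does not specify how the biases would be corrected. One standard option, named here as an assumption rather than as the project's method, is post-stratification: each respondent is weighted by the ratio between the population share and the sample share of their stratum (here gender, with hypothetical target shares and a sample skewed like the Nutrinet panel).

```python
from collections import Counter
from typing import Dict, Sequence

def poststratification_weights(sample_strata: Sequence[str],
                               population_share: Dict[str, float]) -> Dict[str, float]:
    """Weight of a stratum = its population share / its share in the sample.
    Over-represented strata get weights below 1, under-represented ones above 1."""
    n = len(sample_strata)
    counts = Counter(sample_strata)
    return {s: population_share[s] / (counts[s] / n) for s in counts}

def weighted_mean(values: Sequence[float], strata: Sequence[str],
                  weights: Dict[str, float]) -> float:
    """Mean of `values` where each respondent counts with its stratum weight."""
    w = [weights[s] for s in strata]
    return sum(v * wi for v, wi in zip(values, w)) / sum(w)

# Hypothetical sample: 80% women, as in the Nutrinet panel.
strata = ["F"] * 80 + ["M"] * 20
target = {"F": 0.51, "M": 0.49}  # hypothetical population shares
weights = poststratification_weights(strata, target)

# After reweighting, the share of women is pulled back to the target share.
share_f = weighted_mean([1.0 if s == "F" else 0.0 for s in strata], strata, weights)
```

Reweighting only corrects imbalances on observed variables; it cannot recover systematically missing records such as unreported snacks.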


Page 17

Prescriptive learning

•  "Prescriptive" learning (searching for causal relationships)

1.  I observe that people who eat ice cream are often wearing swimsuits

2.  I would like to sell more ice cream

I ask people to put on swimsuits
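The ice-cream example is a case of confounding. The slide does not name the hidden common cause; the sketch below assumes it is hot weather and uses made-up probabilities. It shows that the raw observational contrast suggests a strong "effect" of swimsuits on ice-cream eating, while a back-door adjustment (stratifying on the confounder) brings it back to about zero.

```python
import random
from typing import List, Optional, Tuple

Row = Tuple[bool, bool, bool]  # (hot_day, swimsuit, ice_cream)

def simulate(n: int, seed: int = 0) -> List[Row]:
    """Hypothetical causal model: a hot day raises both the chance of wearing
    a swimsuit and of eating ice cream; the swimsuit has no effect on ice cream."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        hot = rng.random() < 0.5
        swimsuit = rng.random() < (0.8 if hot else 0.1)
        ice_cream = rng.random() < (0.7 if hot else 0.2)  # depends on `hot` only
        rows.append((hot, swimsuit, ice_cream))
    return rows

def p_ice(rows: List[Row], swimsuit: bool, hot: Optional[bool] = None) -> float:
    """Empirical P(ice_cream | swimsuit [, hot])."""
    sel = [r for r in rows if r[1] == swimsuit and (hot is None or r[0] == hot)]
    return sum(r[2] for r in sel) / len(sel)

rows = simulate(100_000)

# Observational contrast: swimsuits seem to "cause" ice-cream eating.
naive = p_ice(rows, True) - p_ice(rows, False)

# Back-door adjustment on the confounder:
# sum over h of P(hot=h) * [P(ice | swim=1, h) - P(ice | swim=0, h)]
adjusted = 0.0
for h in (True, False):
    p_h = sum(r[0] == h for r in rows) / len(rows)
    adjusted += p_h * (p_ice(rows, True, h) - p_ice(rows, False, h))

print(f"naive effect ~ {naive:.2f}, adjusted effect ~ {adjusted:.2f}")
```

With these made-up numbers the naive contrast should come out around 0.35 and the adjusted one close to 0: asking people to wear swimsuits would not sell more ice cream.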


Page 18

•  What recommendations should be made to a consumer so that they reduce their meat consumption?

•  What would be the impact of doubling the price of…?

•  What yield would I have had last year if I had planted ... instead of ...?


Page 19

Figure 7 – Dependency graph centered on the variable hunger

4.2.2 Data preparation

Computing the protein score: a score characterizing a person's appetite for protein was computed from their answers to the questions. For each food offered in the questionnaire, the CIQUAL table (https://pro.anses.fr/tableciqual/) gives its protein content in g/100 g. For each test, the desire to eat protein was estimated by computing the mean difference in protein content between the chosen dish and the rejected dish over the 18 questions asked, the last two questions being identical to the first two. Since the dishes were paired at random for comparison, the score was normalized by the mean absolute deviation over the 20 questions:

$$ Y = \frac{\sum_{i=1}^{n} \left( Q_i(\text{chosen}) - Q_i(\text{rejected}) \right)}{\sum_{i=1}^{n} \left| Q_i(\text{chosen}) - Q_i(\text{rejected}) \right|} $$

where $Q_i(\text{chosen})$ (resp. $Q_i(\text{rejected})$) is the amount of protein in the food chosen (resp. rejected) at question $i$. The score is therefore bounded between -1 and 1. This score was computed, on the one hand, using all the questions asked to the individuals during the test. On the other hand, since individuals more readily associate dishes containing animal products with the presence of protein, a second score was computed using only the questions in which at least one dish containing one or more animal products was offered.
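The score $Y$ can be computed directly from the per-question protein contents. In this sketch the protein values are hypothetical (in the study they come from the CIQUAL table), `keep` is a hypothetical mask used for the second, animal-product-only score, and returning 0 when every difference is zero is an assumption the text does not cover.

```python
from typing import Optional, Sequence

def protein_score(chosen: Sequence[float], rejected: Sequence[float],
                  keep: Optional[Sequence[bool]] = None) -> float:
    """Y = sum_i (Q_i(chosen) - Q_i(rejected)) / sum_i |Q_i(chosen) - Q_i(rejected)|.

    chosen/rejected: protein content (g/100 g) of the dish chosen/rejected at
    question i. keep: optional mask selecting the questions to use (e.g. only
    pairs with at least one animal-product dish, for the second score)."""
    if len(chosen) != len(rejected):
        raise ValueError("one chosen and one rejected dish per question")
    if keep is None:
        keep = [True] * len(chosen)
    diffs = [c - r for c, r, k in zip(chosen, rejected, keep) if k]
    denom = sum(abs(d) for d in diffs)
    if denom == 0:
        return 0.0  # assumption: no protein difference at all -> neutral score
    return sum(diffs) / denom  # in [-1, 1] by the triangle inequality

# Hypothetical answers to three questions (protein in g/100 g).
y_all = protein_score([20.0, 5.0, 12.0], [10.0, 8.0, 12.0])
y_animal = protein_score([20.0, 5.0, 12.0], [10.0, 8.0, 12.0],
                         keep=[True, False, True])
```

Here `y_all` is 7/13: the individual picked the more protein-rich dish more often than not, but not systematically.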

The causal variables must be binary so that the causal effect can be measured. Since not all the variables studied were binary, they were transformed as follows:

–  Four different questions were asked to characterize an individual's hunger, each answer being rated by the individual between 0 and 100. A mean score was first computed from the answers. Since this score was …


Searching for causal relationships

What causes the appetite for protein-rich dishes?


–  Hunger?

–  The time of day?

–  Gender?

–  Visual appearance?

–  Smell?

–  The protein content of previous meals?

–  …

Page 20

A nutritional coach

Given:

–  An individual (and data on their consumption habits)

–  Data on the eating behavior of a representative set of individuals

–  Knowledge about what determines food consumption choices

Suggest operators

–  That induce desirable changes in consumption habits

–  That are personalized

–  In real time


Page 21

Operators

Suggesting the substitution of one dish by another

–  Replace fries with rice


HealthRecSys 2017, August 2017, Como, Italy Akkoyunlu et al.

Our objective is to mine the food-pair substitutions that consumers apply when they compose their meals. Given a database of meals, we want to extract substitutability relationships based on the way people consume food. No nutritional information is used during this process. Instead, contextual information is used in order to extract meaningful substitutability relationships.

2.2 Defining Context

The notion of context is quite complex and difficult to define universally. In the field of recommender systems, the context is usually defined according to the field of application of the system.

In the nutrition field, we define two types of context: the dietary context and the food intake context. We define the dietary context of a food item x as the set of food items c with which x is consumed. For instance, in the meal {coffee, bread, jam, juice}, the dietary context of coffee is {bread, jam, juice}. We think that the dietary context is fundamental when seeking substitutability of food items because the way people compose their meals is intrinsically dependent on the relationships between the items.

The food intake context is defined as the set of all variables such as the type of meal (breakfast, lunch, dinner, snack), the location (home, workplace, restaurant) and the participants (family, friends, coworkers, alone). This corresponds to the notion of context usually used in context-aware recommender systems [2].

There are three paradigms for incorporating context in recommender systems: contextual pre-filtering, contextual post-filtering and contextual modelling [2]. Contextual pre-filtering (post-filtering) consists in splitting the dataset according to contextual variables before (after) applying the algorithms. Contextual modelling consists in incorporating contextual information into the algorithm itself. In our framework, the dietary context is used to model substitutability, whereas the food intake context is used for contextual pre-filtering.
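A minimal sketch of the two notions just defined, assuming meals are stored as frozensets of item names and each record carries its meal type as the food-intake context variable (the record layout and the tea and pizza meals are hypothetical):

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Meal = FrozenSet[str]

def dietary_context(meal: Meal, item: str) -> Meal:
    """Dietary context of `item` in `meal`: the items it is eaten with."""
    return meal - {item}

def prefilter(records: List[Tuple[str, Meal]]) -> Dict[str, List[Meal]]:
    """Contextual pre-filtering: split the meals by a food-intake context
    variable (here the meal type) before mining each part separately."""
    parts: Dict[str, List[Meal]] = defaultdict(list)
    for meal_type, meal in records:
        parts[meal_type].append(meal)
    return dict(parts)

records = [
    ("breakfast", frozenset({"coffee", "bread", "jam", "juice"})),  # paper's example
    ("breakfast", frozenset({"tea", "bread", "jam", "juice"})),     # hypothetical
    ("lunch", frozenset({"pizza", "sodas"})),                       # hypothetical
]
ctx = dietary_context(records[0][1], "coffee")  # {bread, jam, juice}
db = prefilter(records)                         # one meal list per meal type
```

Each entry of `db` would then be mined on its own, which is how separate breakfast and lunch results can be obtained.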

Our objective is to investigate substitutability among food items based on the assumption that two food items are highly substitutable if they are consumed in similar dietary contexts and in the same intake context.

Investigating all possible dietary contexts of a food item is computationally expensive because the number of possible dietary contexts is exponential in the number of food items and the length of the dietary context. The number of interesting contexts is actually limited by the characteristics of the available data. Instead of investigating all the dietary contexts of a food item, we decided to explore collections of meals that differ by only one item. We define the dietary context of a meal database, or meal context $c$, as the intersection of a set of meals $S_m$ such that:

$$ \mathrm{len}(c) = \max_{x \in S_m} \mathrm{len}(x) - 1 \qquad (1) $$

Let us define the substitutable set $S_c$ associated with a meal context $c$ as the set of food items such that the context $c$ plus one item of $S_c$ can actually be consumed together. For instance, the substitutable set of the meal context c = {bread, jam, juice} might be $S_c$ = {coffee, tea, yogurt}.

2.3 Mining substitutable items

To efficiently retrieve interesting sets of meal contexts and their substitutable sets, in this paper we propose an approach based on graph mining techniques. Let us denote the meal graph G = (V, E), where V is the set of nodes representing meals from the database and E is the set of edges such that two nodes are connected if at most one item changes between them. A meal must appear at least once in the database in order to appear as a node in the graph. Figure 1 is a simple illustration of a meal network.

Figure 1: Example of a simple meal network

Designed in this way, the meals formed by a meal context plus one item of its substitutable set are pairwise adjacent: they form a completely connected subgraph, which is called a clique in graph mining. More specifically, the nodes form a maximal clique, i.e. a clique to which no other node can be added. In our setting, discovering substitutable sets therefore amounts to mining maximal cliques in a graph. In this paper we use the Bron-Kerbosch algorithm [4] to search for maximal cliques.

Not all discovered maximal cliques are interesting for our study. We want cliques such that the intersection of their nodes is a meal context as defined above; we call these cliques substitutable cliques. However, we may encounter cliques such as the one in Figure 2. In this case, the intersection of the nodes is {A} and we cannot derive a substitutable set from this clique.

Figure 2: Example of an uninteresting clique (nodes ABD, ABC and AED)

To avoid retrieving uninteresting cliques, we apply Algorithm 1, which keeps only the substitutable cliques.

For instance, when we apply our algorithm to the example of Figure 1, we find that this graph is a maximal clique, and more precisely a substitutable clique. The context is {bread, butter} and the substitutable set associated with this context is {coffee, tea, milk, jam, nothing}. In this particular case, an item can be substituted by nothing because {bread, butter} can be consumed as such.
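The meal graph and the maximal-clique search can be sketched as follows. This is not the authors' code: edges use one reading of "at most one item changes" (one item substituted, added or removed), and the clique search is the Bron-Kerbosch algorithm with pivoting, a later refinement of the 1973 version cited as [4]. The five bread-and-butter meals are reconstructed from the description of Figure 1; the pasta meal is hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Optional, Set

Meal = FrozenSet[str]
Graph = Dict[Meal, Set[Meal]]

def one_item_apart(a: Meal, b: Meal) -> bool:
    """Assumed reading of 'at most one item changes': b is a with one item
    substituted, added or removed, i.e. they share all but one item each."""
    return a != b and len(a & b) >= max(len(a), len(b)) - 1

def meal_graph(meals: Iterable[Meal]) -> Graph:
    nodes = set(meals)  # every distinct observed meal becomes a node
    adj: Graph = {m: set() for m in nodes}
    for a, b in combinations(nodes, 2):
        if one_item_apart(a, b):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def bron_kerbosch(adj: Graph, r: FrozenSet[Meal] = frozenset(),
                  p: Optional[FrozenSet[Meal]] = None,
                  x: FrozenSet[Meal] = frozenset()) -> List[FrozenSet[Meal]]:
    """All maximal cliques (Bron-Kerbosch with pivoting).
    r: current clique, p: candidate nodes, x: nodes already processed."""
    if p is None:
        p = frozenset(adj)
    if not p and not x:
        return [r]  # r cannot be extended: it is maximal
    cliques: List[FrozenSet[Meal]] = []
    pivot = max(p | x, key=lambda u: len(adj[u] & p))
    for v in p - adj[pivot]:  # neighbors of the pivot are skipped
        cliques += bron_kerbosch(adj, r | {v}, p & adj[v], x & adj[v])
        p = p - {v}
        x = x | {v}
    return cliques

base = {"bread", "butter"}
meals = [frozenset(base | {item}) for item in ("coffee", "tea", "milk", "jam")]
meals.append(frozenset(base))                 # bread and butter alone
meals.append(frozenset({"pasta", "cheese"}))  # hypothetical unrelated meal
cliques = bron_kerbosch(meal_graph(meals))
print(sorted(len(c) for c in cliques))  # → [1, 5]
```

The five Figure 1 meals come out as one maximal clique; the unrelated meal forms a trivial clique of its own, which the substitutability filter would then discard.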

2.4 Computing a substitutability score

Substitutability is not a binary relationship because there are different degrees of substitutability. If two items are consumed together, they are less substitutable because they might be associated.


Food item | Breakfast and lunch | Breakfast | Lunch
Bread | Rusk (0.2234), Viennoiserie (0.1359), Cakes (0.0745) | Rusk (0.3716), Viennoiserie (0.2010), Cakes (0.1243) | Fruits (0.0497), Yogurt (0.0490), Potatoes (0.0468)
Coffee | Tea (0.2799), Cocoa (0.1729), Chicory (0.1486) | Tea (0.4219), Chicory (0.2550), Cocoa (0.2255) | Sodas (0.065), Yogurt (0.0642), Fruits (0.0633)
Tea | Coffee (0.2799), Cocoa (0.1721), Chicory (0.1289) | Coffee (0.4219), Chicory (0.1965), Cocoa (0.1462) | Cakes (0.0536), Viennoiserie (0.0417), Coffee (0.0412)
Cocoa | Chicory (0.2171), Coffee (0.1729), Tea (0.1289) | Chicory (0.2211), Coffee (0.2077), Tea (0.1965) | Cereal bars (0.25), Preprocessed vegetables (0.0526), Hamburgers (0.0256)
Butter | Margarine (0.2413), Honey/jam (0.0924), Chocolate spread (0.0786) | Margarine (0.4030), Chocolate spread (0.1240), Honey/jam (0.1175) | Margarine (0.0602), Fruits (0.0431), Sauces (0.0431)
Milk | Juice (0.1409), Yogurt (0.1264), Sugar (0.1089) | Yogurt (0.1815), Juice (0.1504), Tap water (0.1361) | Doughnut (0.0869), Other milk (0.0666), Milk in powder (0.0625)
Wine | Sodas (0.0814), Beer (0.0704), Tap water (0.0412) | / | Sodas (0.0860), Tap water (0.0755), Beer (0.0746)
Pizza | Sandwich baguette (0.2429), Other sandwiches (0.1729), Meals with pasta or potatoes (0.1513) | / | Sandwiches baguette (0.2810), Other sandwiches (0.2177), Meal with pasta or potatoes (0.1658)
Potatoes | Pasta (0.1111), Green beans (0.0922), Rice (0.0602) | / | Pasta (0.1142), Green beans (0.0941), Rice (0.0616)

Table 2: Top 3 substitutable items for several items for breakfast and lunch (substitutes ordered by score, score in parentheses; "/" = no result for that meal type)

for lunch, it can be substituted by sodas, yogurt and fruits. Food items are consumed differently depending on the type of meal, so the substitutability relationship differs too.

The scale of the scores also differs with the type of meal. This may be because the food items consumed at lunch are more diverse than those consumed at breakfast. A rescaling factor based on the diversity of each type of meal could be introduced.

The frequency of meals is not taken into account in the computation of the score, so atypical eating habits can affect the score. Considering the frequency would mitigate this problem. As future work, we plan to investigate this aspect and to consider other contextual variables such as location and commensals.

5 ACKNOWLEDGEMENT

This study was funded by Danone Nutricia Research.

REFERENCES

[1] Achananuparp, P., and Weber, I. Extracting food substitutes from food diary via distributional similarity. CoRR abs/1607.08807 (2016).

[2] Adomavicius, G., and Tuzhilin, A. Context-aware recommender systems. In Recommender Systems Handbook, F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, Eds. Springer US, 2011, pp. 217–253.

[3] Baltrunas, L., and Ricci, F. Context-based splitting of item ratings in collaborative filtering. In Proceedings of the Third ACM Conference on Recommender Systems (New York, NY, USA, 2009), RecSys '09, ACM, pp. 245–248.

[4] Bron, C., and Kerbosch, J. Algorithm 457: Finding all cliques of an undirected graph. Commun. ACM 16, 9 (Sept. 1973), 575–577.

[5] Freyne, J., and Berkovsky, S. Intelligent food planning: personalized recipe recommendation. In Proceedings of the 15th International Conference on Intelligent User Interfaces, IUI 2010, Hong Kong, China, February 7-10, 2010 (2010), pp. 321–324.

[6] Freyne, J., Berkovsky, S., and Smith, G. Recipe recommendation: Accuracy and reasoning. In User Modeling, Adaption and Personalization - 19th International Conference, UMAP 2011, Girona, Spain, July 11-15, 2011. Proceedings (2011), pp. 99–110.

[7] Ge, M., Elahi, M., Fernández-Tobías, I., Ricci, F., and Massimo, D. Using tags and latent factors in a food recommender system. In Proceedings of the 5th International Conference on Digital Health 2015 (New York, NY, USA, 2015), DH '15, ACM, pp. 105–112.

[8] Harvey, M., Ludwig, B., and Elsweiler, D. You are what you eat: Learning user tastes for rating prediction. In String Processing and Information Retrieval - 20th International Symposium, SPIRE 2013, Jerusalem, Israel, October 7-9, 2013, Proceedings (2013), pp. 153–164.

[9] Jaccard, P. The distribution of the flora in the alpine zone. 1. New Phytologist 11, 2 (1912), 37–50.

[10] McAuley, J. J., Pandey, R., and Leskovec, J. Inferring networks of substitutable and complementary products. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, August 10-13, 2015 (2015), pp. 785–794.

[11] Zheng, J., Wu, X., Niu, J., and Bolivar, A. Substitutes or complements: another step forward in recommendations. In Proceedings 10th ACM Conference on Electronic Commerce (EC-2009), Stanford, California, USA, July 6–10, 2009 (2009), pp. 139–146.

Substitutability score:

Investigating substitutability of food items in consumption data HealthRecSys 2017, August 2017, Como, Italy

Algorithm 1 Find substitutable clique

function isSubstitutable(clique)
    context = getContext(clique)
    lenmax = max(len(x) for x in clique)
    if lenmax - len(context) = 1 then
        return True
    else
        return False
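A runnable Python version of Algorithm 1, with `get_context` taken as the intersection of the meals in the clique (as in the definition of the meal context). The `substitutable_set` helper, which reads the substitutable set off a clique and maps a meal equal to the context to "nothing", is an illustrative addition rather than part of the paper's pseudocode.

```python
from functools import reduce
from typing import FrozenSet, Iterable, Set

Meal = FrozenSet[str]

def get_context(clique: Iterable[Meal]) -> Meal:
    """Meal context of a clique: the items shared by all its meals."""
    return reduce(lambda a, b: a & b, clique)

def is_substitutable(clique: Iterable[Meal]) -> bool:
    """Algorithm 1: keep the clique iff its context is exactly one item
    shorter than its longest meal (Eq. (1))."""
    clique = list(clique)
    context = get_context(clique)
    lenmax = max(len(m) for m in clique)
    return lenmax - len(context) == 1

def substitutable_set(clique: Iterable[Meal]) -> Set[str]:
    """Items that complete the context; a meal equal to the context
    contributes 'nothing'."""
    clique = list(clique)
    context = get_context(clique)
    out: Set[str] = set()
    for meal in clique:
        extra = meal - context
        out |= extra if extra else {"nothing"}
    return out

fig1 = [frozenset({"bread", "butter", x}) for x in ("coffee", "tea", "milk", "jam")]
fig1.append(frozenset({"bread", "butter"}))
fig2 = [frozenset("ABD"), frozenset("ABC"), frozenset("AED")]

print(is_substitutable(fig1), sorted(substitutable_set(fig1)))
# → True ['coffee', 'jam', 'milk', 'nothing', 'tea']
print(is_substitutable(fig2))  # → False
```

The Figure 2 clique is rejected because its context {A} is two items shorter than its longest meals.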

Therefore, we need a function that quantifies the substitutability relationship while accounting for the possibility of association. Our hypothesis is that two items are highly substitutable if they are consumed in similar dietary contexts.

We want to compute a substitutability score such that:

(1) Two items are highly substitutable if they are consumed in similar contexts.

(2) Two items are less substitutable if they are consumed together.

(3) Substitutability is a symmetric relationship.

Let us denote, for an item $x$, the context set $C_x$ as the set of meal contexts in which $x$ is a substitutable item. If the cardinality of $C_x$, denoted $|C_x|$, is high, then $x$ is substitutable in many meal contexts.

For two items $x$ and $y$, condition (1) is captured by the intersection of $C_x$ and $C_y$: if $|C_x \cap C_y|$ is high, then $x$ and $y$ are consumed in similar contexts.

We denote by $A_{x:y}$ the set of contexts of $x$ in which $y$ appears:

$$ A_{x:y} = \{ c \in C_x \mid y \in c \} \qquad (2) $$

The cardinality of $A_{x:y}$ measures how strongly $y$ is associated with $x$. Taking these considerations into account, we propose the following substitutability score, inspired by the Jaccard index [9]:

$$ f(x, y) = \frac{|C_x \cap C_y|}{|C_x \cup C_y| + |A_{x:y}| + |A_{y:x}|} \qquad (3) $$

The score equals 1 when $x$ and $y$ appear in exactly the same contexts and $A_{x:y} = A_{y:x} = \emptyset$. If $x$ and $y$ are never consumed in the same context, the score equals 0. The higher $|A_{x:y}| + |A_{y:x}|$, the stronger the association between $x$ and $y$ and the lower the score.
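Equations (2) and (3) translate directly into code. The sketch below assumes the mining step has produced (meal context, substitutable set) pairs; the three pairs used are hypothetical and chosen to exercise the three properties listed above.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Context = FrozenSet[str]

def context_sets(pairs: Iterable[Tuple[Context, Set[str]]]) -> Dict[str, Set[Context]]:
    """C_x: the meal contexts in which x belongs to the substitutable set."""
    c: Dict[str, Set[Context]] = {}
    for context, subs in pairs:
        for item in subs:
            c.setdefault(item, set()).add(context)
    return c

def substitutability_score(c: Dict[str, Set[Context]], x: str, y: str) -> float:
    """Eq. (3): |C_x & C_y| / (|C_x | C_y| + |A_x:y| + |A_y:x|)."""
    cx, cy = c.get(x, set()), c.get(y, set())
    a_xy = {ctx for ctx in cx if y in ctx}  # Eq. (2): contexts of x containing y
    a_yx = {ctx for ctx in cy if x in ctx}
    denom = len(cx | cy) + len(a_xy) + len(a_yx)
    return len(cx & cy) / denom if denom else 0.0

# Hypothetical mining output: (meal context, substitutable set).
pairs = [
    (frozenset({"bread", "butter"}), {"coffee", "tea", "milk"}),
    (frozenset({"bread", "jam"}), {"coffee", "tea"}),
    (frozenset({"bread", "coffee"}), {"milk", "butter"}),
]
C = context_sets(pairs)
```

For these toy pairs, coffee and tea share exactly the same contexts and are never associated, so their score is 1; milk also appears in the context {bread, coffee}, which lowers its score with coffee to 1/(3 + 1) = 0.25 in both directions.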

3 EXPERIMENTS

3.1 The INCA 2 database

The French dataset INCA 2¹ is the result of a survey conducted during 2006-2007 about individual food consumption. Individual 7-day food diaries are reported for 2624 adults and 1455 children over several months, taking into account possible seasonality in eating habits. A day is composed of three main meals: breakfast, lunch and dinner. The moments in between are denoted as snacking. For the main meals, the location (home, work, school, outdoor) and the companions (family, friends, coworkers, alone) are registered.

The 1280 food entries are organized into 44 groups and 110 subgroups of food items. We chose to consider the intermediate level of the hierarchy in order to capture both inter-group and intra-group substitution relationships.

1: https://www.data.gouv.fr/fr/datasets/donnees-de-consommations-et-habitudes-alimentaires-de-letude-inca-2-3/

Only adults are considered in this paper. All meals are gathered in a meal database DB_meals regardless of the type of meal. The database can be split according to contextual information in order to get better results [3]. We compare the results of our methodology on three datasets: DB_breakfast&lunch, DB_breakfast and DB_lunch.

3.2 Results

Applying our algorithm to DB_breakfast yields 2368 contexts. Some of them and their substitutable sets are given in Table 1. Our results are coherent. For example, bread, rusk or viennoiserie can each be consumed for breakfast with coffee, sugar, water and butter.

Context | Substitutable set
coffee, sugar, water, butter | bread, rusk, viennoiserie
tea/infusions, donuts | yogurt, sugar, jam/honey, nothing

Table 1: Results of context and substitutable set retrieval for breakfasts

We applied our algorithm to the three datasets. The results are reported in Table 2. We obtain inter-group substitutions such as {potatoes → green beans} but also intra-group substitutions such as {bread → rusk}.

The substitutions proposed are consistent with eating habits. Substitutes of drinks are also drinks: the substitutes of coffee are tea, cocoa and chicory. The semantic information that a food item is a drink is not encoded in the data, and yet taking the dietary context into account is enough to retrieve a substitution rule such as "substitute a drink with a drink". More surprisingly, we can also retrieve the rule "substitute a spreadable item with another one" in the case of the substitutes of butter for breakfast. No semantic information describing how a food item can be eaten is available in the dataset, and yet considering the dietary context helps us retrieve this kind of information.

Substitutions between food items of the same nutritional food groups are found. For instance, the substitutes for potatoes are pasta and rice. They all contain starches. The nutritional information is not used during the mining process. This shows that people can vary their source of carbohydrates.

4 DISCUSSION AND CONCLUSIONS

We proposed a substitutability score based on consumption data, with the assumption that two items are substitutable if they are consumed in similar contexts. Preliminary results on the INCA 2 dataset show that this assumption helps us retrieve substitutability relationships based on eating habits.

When we split the dataset according to the contextual variable "type of meal", the substitutes and the scores differ. Coffee can be substituted by tea, chicory and cocoa for breakfast, whereas

Page 22

A model


[Diagram: Recommender P2 and Consumer P1 in context c_k; P1's options m1 … mi … mM; P2's suggestions s1 … sj … sN; P1 accepts or refuses; next context c_k'. Captions: "P1 changing behavior", "P2 adapting to consumer".]


Page 23

A model

•  Two "players"

–  P1: the consumer

–  P2: the recommender system

•  P2 acts as a "coach" for P1

–  Suggests changes to P1's choices when appropriate

–  Adapts to the specificities of P1 (probability of following suggestions)

•  P1 may or may not follow P2's suggestions

–  If so: (slightly) modifies its rules for future choices


Page 24

The objectives

•  Goal:

–  For P1 to reach the highest possible Pandiet score given their characteristics

–  As quickly as possible

•  Meta-goal

–  Find the optimal policy for the coach (P2)

•  Personalized

•  Context-sensitive


Page 25

A model: learning phase

•  P1: speed of convergence to an optimized Pandiet score


•  P2: adaptability to player P1 and to the context

Learning performance depends on similarities

–  between contexts

–  between consumers

so as to generalize from specific experiences

Page 26

A model: learning phase

•  P1 selects m_i according to


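$$ m_i = \operatorname{ArgMax}_m \left[ Q(m, c_k) + \text{noise} \right] $$

•  P2 selects s_j according to

$$ s_j = \operatorname{ArgMax}_s \; \text{score}(s, c_k) $$

$$ \text{score}(s, c_k) = \text{Pandiet gain}\left[ s_j(m_i) - m_i \right] \times p\left( s_j(m_i) \text{ accepted by P1 in context } c_k \right) $$

where the acceptance probability is estimated online:

$$ p\left( s_j(m_i) \mid c_k \right) = \frac{\#\{ s_j(m_i) \text{ accepted in } c_k \}}{1 + \#\text{tests}} $$

•  P1 selects "accepts" according to $p\left( s_j(m_i) \mid c_k \right)$ and, when it accepts, updates

$$ Q\left( s_j(m_i), c_k \right) \leftarrow Q\left( s_j(m_i), c_k \right) + \varepsilon $$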
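The slides give only the selection and update rules. The simulation below is a toy sketch of how they could interact, with everything else hypothetical: three meals, made-up Pandiet-like scores and acceptance probabilities, a single context, the noise level, and a 10% random exploration by P2 (an assumption added because the estimate #accepted / (1 + #tests) starts at 0 for every substitute).

```python
import random
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical toy setting (none of these values come from the project):
MEALS = ["fries", "rice", "salad"]
PANDIET = {"fries": 0.2, "rice": 0.5, "salad": 0.8}   # made-up quality scores
P_ACCEPT = {"fries": 0.1, "rice": 0.6, "salad": 0.3}  # P1's hidden acceptance rates
CTX = "lunch"      # a single context c_k
EPS = 0.05         # epsilon in Q(s_j(m_i), c_k) <- Q(s_j(m_i), c_k) + epsilon
NOISE_SD = 0.1     # noise in P1's choice
EXPLORE = 0.1      # assumption: P2 picks a random candidate 10% of the time

def run(n_rounds: int, seed: int = 0):
    rng = random.Random(seed)
    q: Dict[Tuple[str, str], float] = {(m, CTX): 0.0 for m in MEALS}
    q[("fries", CTX)] = 1.0                     # P1 initially prefers fries
    accepted: Dict[Tuple[str, str], int] = defaultdict(int)
    tests: Dict[Tuple[str, str], int] = defaultdict(int)

    def p_hat(s: str) -> float:
        # P2's online estimate: #accepted in c_k / (1 + #tests)
        return accepted[(s, CTX)] / (1 + tests[(s, CTX)])

    for _ in range(n_rounds):
        # P1: m_i = ArgMax_m [Q(m, c_k) + noise]
        m_i = max(MEALS, key=lambda m: q[(m, CTX)] + rng.gauss(0.0, NOISE_SD))
        # P2 only considers substitutes that improve the (toy) Pandiet score.
        cands = [s for s in MEALS if PANDIET[s] > PANDIET[m_i]]
        if not cands:
            continue
        if rng.random() < EXPLORE:
            s_j = rng.choice(cands)
        else:  # s_j = ArgMax_s gain(s vs m_i) * p_hat(s)
            s_j = max(cands, key=lambda s: (PANDIET[s] - PANDIET[m_i]) * p_hat(s))
        tests[(s_j, CTX)] += 1
        if rng.random() < P_ACCEPT[s_j]:        # P1 accepts or refuses
            accepted[(s_j, CTX)] += 1
            q[(s_j, CTX)] += EPS                # P1's habits shift slightly
    return q, accepted, tests

q, accepted, tests = run(2_000)
```

As accepted suggestions accumulate, P1's preference for rice grows until it overtakes fries, at which point P2 starts proposing salad; how fast this happens depends entirely on the made-up parameters.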

Page 27

A planning problem (in AI)

•  Change operators can only be suggested

–  They must be acceptable

•  Their effects take time to become measurable

•  Operators can interact in various ways

–  E.g. changing the meal context can make a dish substitution impossible


But:

Page 28

How can the effectiveness of the recommender system be measured?


Page 29

Specific features

•  We want to change habits

–  Choices are renewed endlessly

–  In repeated contexts

•  Gradual modification of the consumer's decision function

–  Sometimes accept "setbacks": a non-myopic recommendation strategy

–  Suggestions that are acceptable... and varied enough

•  Choices made close together in time are not independent


Page 30

Conclusions

1.  Collecting food data is a challenge that has yet to be met

2.  Personalized food recommendation

–  Is an interesting problem

•  See also the concept of nudging

–  Requires methodological developments

–  Has a strong potential impact

•  On public health

•  On the sustainability of production systems
