google-rules-of-machine-learning

Anonymous user, November 11, 2021
Technology: Python
Category: Artificial Intelligence, Machine Learning/Deep Learning
License: MIT License

Project details

Google's 43 Rules of Machine Learning

GitHub mirror of M. Zinkevich's great "Rules of Machine Learning" style guide, with extra goodness.

You can find the terminology for this guide in terminology.md.

You can find the overview for this guide in overview.md.

Structure

- Before Machine Learning
- ML Phase 1: Your First Pipeline
- ML Phase 2: Feature Engineering
- ML Phase 3: Slow Growth, Optimization Refinement, and Complex Models
- Related Work
- Acknowledgements & Appendix

Note: Asterisk (*) footnotes are my own. Numbered footnotes are Martin's.

Before Machine Learning

Rule 1 - Don't be afraid to launch a product without machine learning.*

Machine learning is cool, but it requires data. Theoretically, you can take data from a different problem and then tweak the model for a new product, but this will likely underperform basic heuristics. If you think that machine learning will give you a 100% boost, then a heuristic will get you 50% of the way there. For instance, if you are ranking apps in an app marketplace, you could use the install rate or number of installs. If you are detecting spam, filter out publishers that have sent spam before. Don’t be afraid to use human editing either. If you need to rank contacts, rank the most recently used highest (or even rank alphabetically). If machine learning is not absolutely required for your product, don't use it until you have data.
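To make the "heuristic first" baselines concrete, here is a minimal sketch that ranks apps by install rate and contacts by recency with an alphabetical fallback. The field names (`installs`, `impressions`, `last_used`, `name`) are hypothetical, and `last_used` is assumed to be a Unix timestamp, or `None`/absent for contacts never used.

```python
def rank_apps_by_install_rate(apps):
    """Heuristic baseline: rank apps by installs per impression, highest first."""
    def install_rate(app):
        # Apps that were never shown get a rate of 0 instead of dividing by zero.
        return app["installs"] / app["impressions"] if app["impressions"] else 0.0
    return sorted(apps, key=install_rate, reverse=True)


def rank_contacts(contacts):
    """Heuristic baseline: most recently used contacts first, the rest alphabetically.

    `last_used` is a hypothetical Unix timestamp; None or missing means never used.
    """
    used = [c for c in contacts if c.get("last_used") is not None]
    unused = [c for c in contacts if c.get("last_used") is None]
    used.sort(key=lambda c: c["last_used"], reverse=True)
    unused.sort(key=lambda c: c["name"].lower())
    return used + unused
```

Baselines like these also give you the numbers a first machine-learned model has to beat.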

Google Research Blog - The 280-Year-Old Algorithm Inside Google Trips

Rule 2 - First, design and implement metrics.

Before formalizing what your machine learning system will do, track as much as possible in your current system. Do this for the following reasons:

- It is easier to gain permission from the system’s users earlier on.
- If you think that something might be a concern in the future, it is better to get historical data now.
- If you design your system with metric instrumentation in mind, things will go better for you in the future. Specifically, you don’t want to find yourself grepping for strings in logs to instrument your metrics!
- You will notice what things change and what stays the same.

For instance, suppose you want to directly optimize one-day active users. However, during your early manipulations of the system, you may notice that dramatic alterations of the user experience don’t noticeably change this metric. The Google Plus team measures expands per read, reshares per read, plus-ones per read, comments per read, comments per user, reshares per user, etc., which they use in computing the goodness of a post at serving time. Also, note that an experiment framework, where you can group users into buckets and aggregate statistics by experiment, is important. See Rule #12.
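A minimal sketch of such an experiment framework, assuming users have string IDs and are split into 100 hash buckets. The bucket count, the `treatment_buckets` range, and the `(user_id, reads, reshares)` event layout are illustrative assumptions. Hashing keeps each user in the same bucket across sessions, which is what makes per-experiment aggregation meaningful.

```python
import hashlib
from collections import defaultdict


def bucket_for(user_id: str, num_buckets: int = 100) -> int:
    """Deterministically map a user ID to one of `num_buckets` buckets."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def arm_for(user_id: str, treatment_buckets: range) -> str:
    """Users whose bucket falls in `treatment_buckets` see the experiment."""
    return "treatment" if bucket_for(user_id) in treatment_buckets else "control"


def reshares_per_read_by_arm(events, treatment_buckets: range):
    """Aggregate one metric (reshares per read) separately for each arm.

    `events` is an iterable of hypothetical (user_id, reads, reshares) tuples.
    """
    totals = defaultdict(lambda: [0, 0])  # arm -> [reads, reshares]
    for user_id, reads, reshares in events:
        arm = arm_for(user_id, treatment_buckets)
        totals[arm][0] += reads
        totals[arm][1] += reshares
    return {arm: r / n for arm, (n, r) in totals.items() if n}
```

For example, `reshares_per_read_by_arm(events, range(0, 5))` would compare a 5% treatment slice against everyone else.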

By being more liberal about gathering metrics, you can gain a broader picture of your system. Notice a problem? Add a metric to track it! Excited about some quantitative change on the last release? Add a metric to track it!

Rule 3 - Choose machine learning over a complex heuristic.

A simple heuristic can get your product out the door. A complex heuristic is unmaintainable. Once you have data and a basic idea of what you are trying to accomplish, move on to machine learning. As in most software engineering tasks, you will want to be constantly updating your approach, whether it is a heuristic or a machine-learned model, and you will find that the machine-learned model is easier to update and maintain (see Rule #16).

Your First Pipeline

Focus on your system infrastructure for your first pipeline. While it is fun to think about all the imaginative machine learning you are going to do, it will be hard to figure out what is happening if you don’t first trust your pipeline.

Rule 4 - Keep the first model simple and get the infrastructure right.

The first model provides the biggest boost to your product, so it doesn't need to be fancy. But you will run into many more infrastructure issues than you expect. Before anyone can use your fancy new machine learning system, you have to determine:

- How to get examples to your learning algorithm.
- A first cut as to what “good” and “bad” mean to your system.
- How to integrate your model into your application. You can either apply the model live, or precompute the model on examples offline and store the results in a table. For example, you might want to preclassify web pages and store the results in a table, but you might want to classify chat messages live.

Choosing simple features makes it easier to ensure that:

- The features reach your learning algorithm correctly.
- The model learns reasonable weights.
- The features reach your model in the server correctly.

Once you have a system that does these three things reliably, you have done most of the work. Your simple model provides you with baseline metrics and a baseline behavior that you can use to test more complex models. Some teams aim for a “neutral” first launch: a first launch that explicitly de-prioritizes machine learning gains, to avoid getting distracted.

Rule 5 - Test the infrastructure independently from the machine learning.

Make sure that the infrastructure is testable, and that the learning parts of the system are encapsulated so that you can test everything around it. Specifically:

Test getting data into the algorithm. Check that feature columns that should be populated are populated. Where privacy permits, manually inspect the input to your training algorithm. If possible, check statistics in your pipeline in comparison to elsewhere, such as RASTA.
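A minimal sketch of the "populated columns" check, assuming examples arrive as Python dicts keyed by feature-column name. What counts as "empty" (`None`, `""`, `[]`, `{}`) and the `min_coverage` threshold are assumptions to adapt to your own example format.

```python
def coverage_report(examples, columns):
    """Fraction of examples in which each feature column is populated.

    A column counts as populated when the key is present and its value is not
    None or an empty string/list/dict. Note that 0 counts as populated.
    """
    empty = (None, "", [], {})
    n = len(examples)
    return {
        col: (sum(1 for ex in examples if ex.get(col) not in empty) / n if n else 0.0)
        for col in columns
    }


def check_populated(examples, required_columns, min_coverage=1.0):
    """Raise if any column that should be populated falls below `min_coverage`."""
    report = coverage_report(examples, required_columns)
    failing = {col: frac for col, frac in report.items() if frac < min_coverage}
    if failing:
        raise ValueError(f"under-populated feature columns: {failing}")
    return report
```

Running `check_populated` as a unit test on a sample of real training input catches a broken join before any model is trained.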

Test getting models out of the training algorithm. Make sure that the model in your training environment gives the same score as the model in your serving environment (see Rule #37). Machine learning has an element of unpredictability, so make sure that you have tests for the code for creating examples in training and serving, and that you can load and use a fixed model during serving. Also, it is important to understand your data: see Practical Advice for Analysis of Large, Complex Data Sets.
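One way to test that training and serving agree is to score a fixed set of "golden" examples through both code paths. In this sketch both paths are plain Python callables and the model is a hypothetical logistic model over a weight dict; in a real system the two callables would wrap your training-side and serving-side model loaders.

```python
import math


def logistic_score(weights, features):
    """Reference scoring: sigmoid of the weighted sum of feature values."""
    z = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def score_mismatches(train_score, serve_score, golden_examples, tol=1e-6):
    """Score the same fixed examples with both code paths and list disagreements.

    `train_score` and `serve_score` are callables that take one example and
    return a score; `tol` is an assumed tolerance for floating-point noise.
    """
    mismatches = []
    for ex in golden_examples:
        a, b = train_score(ex), serve_score(ex)
        if abs(a - b) > tol:
            mismatches.append((ex, a, b))
    return mismatches
```

A serving path that, say, silently drops a feature shows up immediately as a non-empty mismatch list.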

Rule 6 - Be careful about dropped data when copying pipelines.

Often we create a pipeline by copying an existing pipeline (i.e. cargo cult programming), and the old pipeline drops data that we need for the new pipeline. For example, the pipeline for Google Plus What’s Hot drops older posts (because it is trying to rank fresh posts). This pipeline was copied to use for Google Plus Stream, where older posts are still meaningful, but the pipeline was still dropping old posts. Another common pattern is to only log data that was seen by the user. Thus, this data is useless if we want to model why a particular post was not seen by the user, because all the negative examples have been dropped. A similar issue occurred in Play. While working on Play Apps Home, a new pipeline was created that also contained examples from two other landing pages (Play Games Home and Play Home Home) without any feature to disambiguate where each example came from.
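Both failure modes above can be made visible cheaply. This sketch tags every merged example with a `source` feature and then counts, per source, how many examples a pipeline stage removed. The source names, the `age_days` field, and the freshness filter in the usage note are hypothetical.

```python
from collections import Counter


def merge_with_source(sources):
    """Merge examples from several landing pages, adding a `source` feature.

    `sources` maps a source name to a list of example dicts; each merged example
    is a copy, so the inputs are not mutated.
    """
    return [{**ex, "source": name} for name, examples in sources.items() for ex in examples]


def dropped_by_stage(before, after, key="source"):
    """How many examples, per source, a pipeline stage removed."""
    counts_before = Counter(ex[key] for ex in before)
    counts_after = Counter(ex[key] for ex in after)
    return {k: counts_before[k] - counts_after[k]
            for k in counts_before if counts_before[k] != counts_after[k]}
```

For instance, running `dropped_by_stage(merged, [ex for ex in merged if ex["age_days"] < 2])` makes it obvious when a freshness filter copied from another pipeline is discarding older examples you still need.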

Rule 7 - Turn heuristics into features, or handle them externally.

Usually the problems that machine learning is trying to solve are not completely new. There is an existing system for ranking, or classifying, or whatever problem you are trying to solve. This means that there are a bunch of rules and heuristics. These same heuristics can give you a lift when tweaked with machine learning. Your heuristics should be mined for whatever information they have, for two reasons. First, the transition to a machine learned system will be smoother. Second, usually those rules contain a lot of the intuition about the system you don’t want to throw away. There are four ways you can use an existing heuristic:

- Preprocess using the heuristic. If the feature is incredibly awesome, then this is an option. For example, if, in a spam filter, the sender has already been blacklisted, don’t try to relearn what “blacklisted” means. Block the message. This approach makes the most sense in binary classification tasks.
- Create a feature. Directly creating a feature from the heuristic is great. For example, if you use a heuristic to compute a relevance score for a query result, you can include the score as the value of a feature. Later on you may want to use machine learning techniques to massage the value (for example, converting the value into one of a finite set of discrete values, or combining it with other features) but start by using the raw value produced by the heuristic.
- Mine the raw inputs of the heuristic. If there is a heuristic for apps that combines the number of installs, the number of characters in the text, and the day of the week, then consider pulling these pieces apart, and feeding these inputs into the learning separately. Some techniques that apply to ensembles apply here (see Rule #40).
- Modify the label. This is an option when you feel that the heuristic captures information not currently contained in the label. For example, if you are trying to maximize the number of downloads, but you also want quality content, then maybe the solution is to multiply the label by the average number of stars the app received. There is a lot of space here for leeway. See the section on “Your First Objective”.

Do be mindful of the added complexity when using heuristics in an ML system. Using old heuristics in your new machine learning algorithm can help to create a smooth transition, but think about whether there is a simpler way to accomplish the same effect.

Monitoring

In general, practice good alerting hygiene, such as making alerts actionable and having a dashboard page.

Rule 8 - Know the freshness requirements of your system

How much does performance degrade if you have a model that is a day old? A week old? A quarter old? This information can help you to understand the priorities of your monitoring. If you lose 10% of your revenue if the model is not updated for a day, it makes sense to have an engineer watching it continuously. Most ad serving systems have new advertisements to handle every day, and must update daily. For instance, if the ML model for Google Play Search is not updated, it can have an impact on revenue in under a month. Some models for What’s Hot in Google Plus have no post identifier in their model, so they can export these models infrequently. Other models that have post identifiers are updated much more frequently. Also notice that freshness can change over time, especially when feature columns are added to or removed from your model.
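Once you know how stale each model may get, the check itself is simple. In this sketch the model names and the one-day and thirty-day budgets are hypothetical placeholders for whatever your own degradation measurements suggest; a model with no recorded export is treated as stale.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness budgets: how old each model may get before someone
# should be alerted. Ads need daily updates; a model without post IDs can wait.
FRESHNESS_BUDGETS = {
    "ads": timedelta(days=1),
    "whats_hot_no_post_ids": timedelta(days=30),
}


def stale_models(trained_at, budgets, now=None):
    """Names of models that are missing or older than their freshness budget.

    `trained_at` maps model name -> timezone-aware datetime of the last export.
    """
    now = now or datetime.now(timezone.utc)
    return sorted(
        name for name, max_age in budgets.items()
        if name not in trained_at or now - trained_at[name] > max_age
    )
```

The budget per model is the design decision; how urgently a stale model pages someone should follow from how much revenue or quality you lose per day of staleness.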

Rule 9 - Detect problems before exporting models.

Many machine learning systems have a stage where you export the model to serving. If there is an issue with an exported model, it is a user-facing issue. If there is an issue before, then it is a training issue, and users will not notice. Do sanity checks right before you export the model. Specifically, make sure that the model’s performance is reasonable on held-out data. Or, if you have lingering concerns with the data, don’t export a model. Many teams continuously deploying models check the area under the ROC curve (or AUC) before exporting. Issues about models that haven’t been exported require an e-mail alert, but issues on a user-facing model may require a page. So better to wait and be sure before impacting users.
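A minimal sketch of an AUC gate run on held-out data just before export. AUC is computed here as the probability that a random positive outscores a random negative (ties count as half), which is the standard rank-based definition; the `min_auc` floor and `max_drop` regression limit are assumed thresholds, not recommended values.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    O(P * N), fine for a sketch; use a sorting-based method on large held-out sets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def safe_to_export(labels, scores, min_auc=0.7, previous_auc=None, max_drop=0.01):
    """Gate an export: AUC must clear a floor and not regress much vs. the last export."""
    value = auc(labels, scores)
    if value < min_auc:
        return False
    if previous_auc is not None and value < previous_auc - max_drop:
        return False
    return True
```

Failing this gate should produce the cheap kind of alert (an e-mail), because no user has seen the model yet.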

Rule 10 - Watch for silent failures.

This is a problem that occurs more for machine learning systems than for other kinds of systems. Suppose that a particular table that is being joined is no longer being updated. The machine learning system will adjust, and behavior will continue to be reasonably good, decaying gradually. Sometimes tables are found that were months out of date, and a simple refresh improved performance more than any other launch that quarter! The coverage of a feature may also change due to implementation changes: for example, a feature column could be populated in 90% of the examples, and suddenly drop to 60% of the examples. Play once had a table that was stale for 6 months, and refreshing the table alone gave a boost of 2% in install rate. If you track statistics of the data, as well as manually inspect the data on occasion, you can reduce these kinds of failures.*
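Tracking statistics of the data can be as simple as comparing each feature column's coverage against a baseline. This sketch assumes coverage fractions have already been computed per column; the 20% relative-drop threshold is an assumption. The 90% to 60% drop from the text is a one-third relative drop, so it would be flagged.

```python
def coverage_alerts(baseline, current, max_relative_drop=0.2):
    """Flag feature columns whose coverage fell sharply against a baseline.

    `baseline` and `current` map column name -> fraction of examples populated.
    A column missing from `current` is treated as 0% coverage.
    """
    alerts = {}
    for col, base in baseline.items():
        now = current.get(col, 0.0)
        if base > 0 and (base - now) / base > max_relative_drop:
            alerts[col] = (base, now)
    return alerts
```

Because the model keeps producing reasonable-looking output when a column degrades, an explicit check like this is often the only thing that surfaces the failure.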

A Framework for Analysis of Data Freshness - Bouzeghoub & Peralta

Rule 11 - Give feature columns owners and documentation.

If the system is large, and there are many feature columns, know who created or is maintaining each feature column. If you find that the person who understands a feature column is leaving, make sure that someone has the information. Although many feature columns have descriptive names, it's good to have a more detailed description of what the feature is, where it came from, and how it is expected to help.

Your First Objective

You have many metrics, or measurements about the system that you care about, but your machine learning algorithm will often require a single objective, a number that your algorithm is “trying” to optimize. I distinguish here between objectives and metrics: a metric is any number that your system reports, which may or may not be important. See also Rule #2.

Rule 12 - Don't overthink which objective you choose to directly optimize.

You want to make money, make your users happy, and make the world a better place. There are tons of metrics that you care about, and you should measure them all (see Rule #2). However, early in the machine learning process, you will notice them all going up, even those that you do not directly optimize. For instance, suppose you care about number of clicks, time spent on the site, and daily active users. If you optimize for number of clicks, you are likely to see the time spent increase. So, keep it simple and don’t think too hard about balancing different metrics when you can still easily increase all the metrics. Don’t take this rule too far though: do not confuse your objective with the ultimate health of the system (see Rule #39). And, if you find yourself increasing the directly optimized metric, but deciding not to launch, some objective revision may be required.

Rule 13 - Choose a simple, observable and attributable metric for your first objective.

Often you don't know what the true objective is. You think you do, but then as you stare at the data and side-by-side analysis of your old system and new ML system, you realize you want to tweak it. Further, different team members often can't agree on the true objective. The ML objective should be something that is easy to measure and is a proxy for the “true” objective. So train on the simple ML objective, and consider having a "policy layer" on top that allows you to add additional logic (hopefully very simple logic) to do the final ranking.
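A minimal sketch of what a policy layer might look like: the model supplies a score for a simple proxy objective, and a few lines of hand-written logic produce the final ranking. The per-publisher cap is a hypothetical business rule chosen only to illustrate "very simple logic"; the tuple layout is also an assumption.

```python
def policy_layer(scored_items, max_per_publisher=2):
    """Hand-written final-ranking logic on top of the ML score.

    `scored_items` holds (item_id, publisher, ml_score) tuples. The model only
    predicts a simple proxy (e.g. click probability); this layer sorts by it and
    then applies one hypothetical business rule: cap results per publisher.
    """
    ranked = sorted(scored_items, key=lambda item: item[2], reverse=True)
    shown_per_publisher = {}
    final = []
    for item_id, publisher, _score in ranked:
        if shown_per_publisher.get(publisher, 0) < max_per_publisher:
            final.append(item_id)
            shown_per_publisher[publisher] = shown_per_publisher.get(publisher, 0) + 1
    return final
```

Keeping such rules outside the model means the team can change them without retraining, and the ML objective stays simple and attributable.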

The easiest thing to model is a user behavior that is directly observed and attributable to an action of the system:

- Was this ranked link clicked?
- Was this ranked object downloaded?
- Was this ranked object forwarded/replied to/e-mailed?
- Was this ranked object rated?
- Was this shown object marked as spam/pornography/offensive?

Avoid modeling indirect effects at first:

- Did the user visit the next day?
- How long did the user visit the site?
- What were the daily active users?

Indirect effects make great metrics, and can be used during A/B testing and during launch decisions.

Finally, don’t try to get the machine learning to figure out:

- Is the user happy using the product?
- Is the user satisfied with the experience?
- Is the product improving the user’s overall well-being?
- How will this affect the company’s overall health?

These are all important, but also incredibly hard. Instead, use proxies: if the user is happy, they will stay on the site longer. If the user is satisfied, they will visit again tomorrow. Insofar as well-being and company health are concerned, human judgement is required to connect any machine learned objective to the nature of the product you are selling and your business plan, so we don’t end up here.

Rule 14 - Starting with an interpretable model makes debugging easier.

Linear regression, logistic regression, and Poisson regression are directly motivated by a probabilistic model. Each prediction is interpretable as a probability or an expected value. This makes them easier to debug than models that use objectives (zero-one loss, various hinge losses, et cetera) that try to directly optimize classification accuracy or ranking performance. For example, if probabilities in training deviate from probabilities predicted in side-by-sides or by inspecting the production system, this deviation could reveal a problem.

For example, in linear, logistic, or Poisson regression, there are subsets of the data where the average predicted expectation equals the average label (1-moment calibrated, or just calibrated)[3]. If you have a feature which is either 1 or 0 for each example, then the set of examples where that feature is 1 is calibrated. Also, if you have a feature that is 1 for every example, then the set of all examples is calibrated.
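The calibration property can be checked numerically. This sketch fits an unregularized logistic regression by full-batch gradient descent on a tiny made-up dataset with an always-1 feature, a binary feature, and a continuous feature. At the optimum the gradient for each weight is the sum of (prediction - label) times that feature, so for the always-1 feature the mean prediction matches the mean label, and for the binary feature the same holds on the rows where it is 1. The learning rate and step count are assumptions that suffice for this small, non-separable dataset; adding L2 regularization would break the exact equality.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, steps=5000):
    """Unregularized logistic regression fit by full-batch gradient descent."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


# Made-up data. Columns: [always-1 feature, binary feature, continuous feature].
X = [[1, 0, 0.5], [1, 0, 1.5], [1, 0, 2.0], [1, 0, 0.2],
     [1, 1, 0.3], [1, 1, 1.0], [1, 1, 1.8], [1, 1, 0.7]]
y = [0, 1, 0, 1, 1, 0, 1, 1]

w = train_logistic(X, y)
preds = [sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

# Always-1 feature: the set of all examples is calibrated.
mean_pred_all = sum(preds) / len(preds)
mean_label_all = sum(y) / len(y)

# Binary feature: the rows where it equals 1 are calibrated too.
subset = [i for i, xi in enumerate(X) if xi[1] == 1]
mean_pred_subset = sum(preds[i] for i in subset) / len(subset)
mean_label_subset = sum(y[i] for i in subset) / len(subset)
print(mean_pred_all, mean_label_all, mean_pred_subset, mean_label_subset)
```

The same check against production predictions is a useful debugging signal: a calibrated subset in training that is miscalibrated in serving points at a data or pipeline problem.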

With simple models, it is easier to deal with feedback loops (see Rule #36). Often, we use these probabilistic predictions to make a decision: e.g. rank posts in decreasing expected value (i.e. probability of click/download/etc.). However, remember when it comes time to choose which model to use, the decision matters more than the likelihood of the data given the model (see Rule #27).

Rule 15 - Separate Spam Filtering and Quality Ranking in a Policy Layer.

Quality ranking is a fine art, but spam filtering is a war.* The signals that you use to determine high quality posts will become obvious to those who use your system, and they will tweak their posts to have these properties. Thus, your quality ranking should focus on ranking content that is posted in good faith. You should not discount the quality ranking learner for ranking spam highly. Similarly, “racy” content should be handled separately from Quality Ranking. Spam filtering is a different story. You have to expect that the features that you need to generate will be constantly changing. Often, there will be obvious rules that you put into the system (if a post has more than three spam votes, don’t retrieve it, et cetera). Any learned model will have to be updated daily, if not faster. The reputation of the creator of the content will play a great role.

At some level, the output of these two systems will have to be integrated. Keep in mind, filtering spam in search results should probably be more aggressive than filtering spam in email messages. Also, it is a standard practice to remove spam from the training data for the quality classifier.
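A minimal sketch of keeping the two systems apart and integrating them only in the policy layer. The "more than three spam votes" rule comes from the text; the `spam_votes` field, the `quality_score` callable, and the `is_spam` predicate are hypothetical stand-ins for your own spam and quality systems.

```python
def passes_spam_rules(post, max_spam_votes=3):
    """Spam side: an explicit rule kept outside the quality model.

    Posts with more than `max_spam_votes` spam votes are not retrieved.
    """
    return post["spam_votes"] <= max_spam_votes


def quality_training_data(examples, is_spam):
    """Remove known spam before training the quality model."""
    return [ex for ex in examples if not is_spam(ex)]


def final_ranking(posts, quality_score, max_spam_votes=3):
    """Policy layer: apply the spam rules first, then order by the quality model."""
    survivors = [p for p in posts if passes_spam_rules(p, max_spam_votes)]
    return sorted(survivors, key=quality_score, reverse=True)
```

Because the spam rules live in their own function, they can be updated daily (or faster) without touching the quality learner, and `max_spam_votes` can be set more aggressively for search results than for e-mail.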

Google Research Blog - Lessons learned while protecting Gmail

Featureengineering

In the first phase of the lifecycle of a machine learning system, the important issue is to get the training data into the learning system, get any metrics of interest instrumented, and create a serving infrastructure. After you have a working end-to-end system with unit and system tests instrumented, Phase II begins.

Rule 16 - Plan to launch and iterate.

Don’t expect that the model you are working on now will be the last one that you will launch, or even that you will ever stop launching models. Thus, consider whether the complexity you are adding with this launch will slow down future launches. Many teams have launched a model per quarter or more for years. There are three basic reasons to launch new models:

- you are coming up with new features,
- you are tuning regularization and combining old features in new ways, and/or
- you are tuning the objective.

Regardless, giving a model a bit of love can be good: looking over the data feeding into the example can help find new signals as well as old, broken ones. So, as you build your model, think about how easy it is to add or remove or recombine features. Think about how easy it is to create a fresh copy of the pipeline and verify its correctness. Think about whether it is possible to have two or three copies running in parallel. Finally, don’t worry about whether feature 16 of 35 makes it into this version of the pipeline. You’ll get it next quarter.

Rule 17 - Start with directly observed and reported features as opposed to learned features.

This might be a controversial point, but it avoids a lot of pitfalls. First of all, let’s describe what a learned feature is. A learned feature is a feature generated either by an external system (such as an unsupervised clustering system) or by the learner itself (e.g. via a factored model or deep learning). Both of these can be useful, but they can have a lot of issues, so they should not be in the first model. If you use an external system to create a feature, remember that the system has its own objective. The external system's objective may be only weakly correlated with your current objective. If you grab a snapshot of the external system, then it can become out of date. If you update the features from the external system, then the meanings may change. If you use an external system to provide a feature, be aware that such features require a great deal of care. The primary issue with factored models and deep models is that they are non-convex. Thus, there is no guarantee that an optimal solution can be approximated or found, and the local minima found on each iteration can be different. This variation makes it hard to judge whether the impact of a change to your system is meaningful or random. By creating a model without deep features, you can get an excellent baseline performance. After this baseline is achieved, you can try more esoteric approaches.

Rule 18 - Explore with features of content that generalize across contexts.

Often a machine learning system is a small part of a much bigger picture. For example, if you imagine a post that might be used in What’s Hot, many people will plus-one, re-share, or comment on a post before it is ever shown in What’s Hot. If you provide those statistics to the learner, it can promote new posts that it has no data for in the context it is optimizing. YouTube Watch Next could use number of watches, or co-watches (counts of how many times one video was watched after another was watched) from YouTube search. You can also use explicit user ratings. Finally, if you have a user action that you are using as a label, seeing that action on the document in a different context can be a great feature. All of these features allow you to bring new content into the context. Note that this is not about personalization: figure out if someone likes the content in this context first, then figure out who likes it more or less.
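Co-watch counts are a good example of a feature computed in one context (for example, search sessions) and reused in another (Watch Next). This sketch counts only consecutive watches, which is one assumed reading of "watched after another"; the session format and function names are hypothetical.

```python
from collections import Counter


def co_watch_counts(sessions):
    """Count (previous_video, next_video) pairs across watch sessions.

    `sessions` is a list of per-user watch histories in time order. Only
    consecutive watches are counted here.
    """
    counts = Counter()
    for session in sessions:
        for previous, following in zip(session, session[1:]):
            counts[(previous, following)] += 1
    return counts


def co_watch_feature(counts, current_video, candidate):
    """Feature for ranking `candidate` as a Watch Next suggestion after `current_video`."""
    return counts[(current_video, candidate)]
```

Since the counts come from a different surface, a brand-new video can get a nonzero feature value before Watch Next has ever shown it.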

Rule 19 - Use very specific features when you can.

With tons of data, it is simpler to learn millions of simple features than a few complex features. Identifiers of documents being retrieved and canonicalized queries do not provide much generalization, but align your ranking with your labels on head queries. Thus, don’t be afraid of groups of features where each feature applies to a very small fraction of your data, but overall coverage is above 90%. You can use regularization to eliminate the features that apply to too few examples.

Rule 20 - Combine and modify existing features to create new features in human-understandable ways.

There are a variety of ways to combine and modify features. Machine learning systems such as TensorFlow allow you to preprocess your data through transformations. The two most standard approaches are “discretizations” and “crosses”.

Discretization consists of taking a continuous feature and creating many discrete features from it. Consider a continuous feature such as age. You can create a feature which is 1 when age is less than 18, another feature which is 1 when age is between 18 and 35, et cetera. Don’t overthink the boundaries of these histograms: basic quantiles will give you most of the impact.

Crosses combine two or more feature columns. A feature column, in TensorFlow's terminology, is a set of homogeneous features (e.g. {male, female}, {US, Canada, Mexico}, et cetera). A cross is a new feature column with features in, for example, {male, female} × {US, Canada, Mexico}. This new feature column will contain the feature (male, Canada). If you are using TensorFlow and you tell TensorFlow to create this cross for you, this (male, Canada) feature will be present in examples representing male Canadians. Note that it takes massive amounts of data to learn models with crosses of three, four, or more base feature columns.
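A plain-Python sketch of both transformations, independent of TensorFlow. Features are represented as strings such as `"age_bucket_1"` and `"male_x_Canada"`; that naming scheme and the helper names are assumptions for illustration, not TensorFlow's API. Boundaries are either hand-picked (like 18 and 35) or taken from quantiles of observed values.

```python
import bisect
from itertools import product


def quantile_boundaries(values, num_buckets):
    """Approximate quantile cut points, so each bucket gets a similar share of data."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // num_buckets] for i in range(1, num_buckets)]


def discretize(name, value, boundaries):
    """Return the single bucket feature that is 'on' for this value.

    Bucket k covers boundaries[k-1] <= value < boundaries[k].
    """
    return f"{name}_bucket_{bisect.bisect_right(boundaries, value)}"


def cross(*values):
    """Cross categorical values into one feature, e.g. ('male', 'Canada')."""
    return "_x_".join(values)


def cross_column(*columns):
    """Every feature in the crossed column, e.g. {male, female} x {US, Canada, Mexico}."""
    return [cross(*combo) for combo in product(*columns)]
```

With boundaries `[18, 35]`, ages 17, 18, and 40 fall into buckets 0, 1, and 2, and crossing {male, female} with {US, Canada, Mexico} yields six features, one of which is `male_x_Canada`.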

Crosses that produce very large feature columns may overfit. For instance, imagine that you are doing some sort of search, and you have a feature column with words in the query, and you have a feature column with words in the document. You can combine these with a cross, but you will end up with a lot of features (see Rule #21). When working with text there are two alternatives. The most draconian is a dot product. A dot product in its simplest form simply counts the number of common words between the query and the document. This feature can then be discretized. Another approach is an intersection: thus, we will have a feature which is present if and only if the word “pony” is in the document and the query, and another feature which is present if and only if the word “the” is in the document and the query.
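Both text alternatives fit in a few lines. This sketch assumes whitespace tokenization after lowercasing, and the `both:` prefix for intersection features is an illustrative naming choice. The dot product yields one number; the intersection yields one feature per shared word, far fewer than a full query-by-document cross.

```python
def tokens(text):
    """Lowercased whitespace tokens (a deliberately simple tokenizer)."""
    return set(text.lower().split())


def dot_product_feature(query, document):
    """Dot product of binary bag-of-words vectors: the number of shared words."""
    return len(tokens(query) & tokens(document))


def intersection_features(query, document):
    """One feature per word present in both the query and the document."""
    return sorted(f"both:{word}" for word in tokens(query) & tokens(document))
```

For the query "the pony show" and the document "The pony ran", the dot product is 2 and the intersection features are `both:pony` and `both:the`.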

Rule 21 - The number of feature weights you can learn in a linear model is roughly proportional to the amount of data you have.

There are fascinating statistical learning theory results concerning the appropriate level of complexity for a model, but this rule is basically all you need to know. I have had conversations in which people were doubtful that anything can be learned from one thousand examples, or that you would ever need more than 1 million examples, because they get stuck in a certain method of learning. The key is to scale your learning to the size of your data:

- If you are working on a search ranking system, and there are millions of different words in the documents and the query and you have 1000 labeled examples, then you should use a dot product between document and query features, TF-IDF, and a half-dozen other highly human-engineered features. 1000 examples, a dozen features.
- If you have a million examples, then intersect the document and query feature columns, using regularization and possibly feature selection. This will give you millions of features, but with regularization you will have fewer. Ten million examples, maybe a hundred thousand features.
- If you have billions or hundreds of billions of examples, you can cross the feature columns with document and query tokens, using feature selection and regularization. You will have a billion examples, and 10 million features.

Statistical learning theory rarely gives tight bounds, but gives great guidance for a starting point. In the end, use Rule #28 to decide what features to use.

Rule 22 - Clean up features you are no longer using.

Unused features create technical debt. If you find that you are not using a feature, and that combining it with other features is not working, then drop it out of your infrastructure. You want to keep your infrastructure clean so that the most promising features can be tried as fast as possible. If necessary, someone can always add back your feature. Keep coverage in mind when considering what features to add or keep. How many examples are covered by the feature? For example, if you have some personalization features, but only 8% of your users have any personalization features, it is not going to be very effective. At the same time, some features may punch above their weight. For example, if you have a feature which covers only 1% of the data, but 90% of the examples that have the feature are positive, then it will be a great feature to add.
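The two numbers this paragraph weighs, coverage and how predictive a feature is where it fires, are easy to compute together. This sketch assumes each example is a `(set_of_feature_names, label)` pair with binary labels; the feature names in the usage note are hypothetical.

```python
def feature_stats(examples, feature):
    """Coverage of `feature` and the positive rate among examples that have it.

    Each example is a (set_of_feature_names, label) pair with label 0 or 1.
    """
    labels_with_feature = [label for feats, label in examples if feature in feats]
    coverage = len(labels_with_feature) / len(examples) if examples else 0.0
    positive_rate = (sum(labels_with_feature) / len(labels_with_feature)
                     if labels_with_feature else 0.0)
    return coverage, positive_rate
```

A feature with coverage 0.01 but a positive rate far above your base rate is a candidate to keep; a feature with low coverage and an unremarkable positive rate is a candidate for cleanup.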

Human Analysis of the System

Before going on to the third phase of machine learning, it is important to focus on something that is not taught in any machine learning class: how to look at an existing model, and improve it. This is more of an art than a science, and yet there are several anti-patterns that it helps to avoid.

Rule 23 - You are not a typical end user.*

This is perhaps the easiest way for a team to get bogged down. While there are a lot of benefits to fish-fooding (using a prototype within your team) and dog-fooding (using a prototype within your company), employees should look at whether the performance is correct. While a change which is obviously bad should not be used, anything that looks reasonably near production should be tested further, either by paying lay people to answer questions on a crowdsourcing platform, or through a live experiment on real users. There are two reasons for this. The first is that you are too close to the code. You may be looking for a particular aspect of the posts, or you are simply too emotionally involved (e.g. confirmation bias). The second is that your time is too valuable. Consider the cost of 9 engineers sitting in a one-hour meeting, and think of how many contracted human labels that buys on a crowdsourcing platform.

If you really want to have user feedback, use user experience methodologies. Create user personas (one description is in Bill Buxton’s Designing Sketching User Experiences) early in a process and do usability testing (one description is in Steve Krug’s Don’t Make Me Think) later. User personas involve creating a hypothetical user. For instance, if your team is all male, it might help to design a 35-year-old female user persona (complete with user features), and look at the results it generates rather than 10 results for 25-to-40-year-old males. Bringing in actual people to watch their reaction to your site (locally or remotely) in usability testing can also get you a fresh perspective.

Google Research Blog - How to measure translation quality in your user interfaces

Rule 24 - Measure the delta between models

One of the easiest, and sometimes most useful measurements you can make before any users have looked at your new model is to calculate just how different the new results are from production. For instance, if you have a ranking problem, run both models on a sample of queries through the entire system, and look at the size of the symmetric difference of the results (weighted by ranking position). If the difference is very small, then you can tell without running an experiment that there will be little change. If the difference is very large, then you want to make sure that the change is good. Looking over queries where the symmetric difference is high can help you to understand qualitatively what the change was like. Make sure, however, that the system is stable. Make sure that a model when compared with itself has a low (ideally zero) symmetric difference.
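A minimal sketch of a position-weighted symmetric difference between two result lists, averaged over a sample of queries. The text does not specify the weighting, so the 1 / (position + 1) weights and the normalization to [0, 1] are assumptions. Note that reordering within the shared set of results is not penalized by this measure.

```python
def positional_weights(ranking):
    """Weight each result by 1 / (position + 1), so top results count most."""
    return {item: 1.0 / (i + 1) for i, item in enumerate(ranking)}


def weighted_symmetric_difference(ranking_a, ranking_b):
    """Position-weighted symmetric difference of two result lists, normalized to [0, 1].

    0.0 means both lists contain the same items; 1.0 means they share none.
    """
    wa, wb = positional_weights(ranking_a), positional_weights(ranking_b)
    only_a = sum(w for item, w in wa.items() if item not in wb)
    only_b = sum(w for item, w in wb.items() if item not in wa)
    total = sum(wa.values()) + sum(wb.values())
    return (only_a + only_b) / total if total else 0.0


def model_delta(queries, rank_with_a, rank_with_b):
    """Average delta over a sample of queries; `rank_with_*` map a query to a result list."""
    deltas = [weighted_symmetric_difference(rank_with_a(q), rank_with_b(q)) for q in queries]
    return sum(deltas) / len(deltas) if deltas else 0.0
```

Running `model_delta(queries, model, model)` is the stability check from the text: anything other than (close to) zero means the system itself is nondeterministic.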

Rule 25 - When choosing models, utilitarian performance trumps predictive power.

Your model may try to predict click-through rate. However, in the end, the key question is what you do with that prediction. If you are using it to rank documents, then the quality of the final ranking matters more than the prediction itself. If you predict the probability that a document is spam and then have a cutoff on what is blocked, then the precision of what is allowed through matters more. Most of the time, these two things should be in agreement: when they do not agree, it will likely be on a small gain. Thus, if there is some change that improves log loss but degrades the performance of the system, look for another feature. When this starts happening more often, it is time to revisit the objective of your model.

Rule 26 - Look for patterns in the measured errors, and create new features.

Suppose that you see a training example that the model got “wrong”. In a classification task, this could be a false positive or a false negative. In a ranking task, it could be a pair where a positive was ranked lower than a negative. The most important point is that this is an example that the machine learning system knows it got wrong and would like to fix if given the opportunity. If you give the model a feature that allows it to fix the error, the model will try to use it. On the other hand, if you try to create a feature based upon examples the system doesn’t see as mistakes, the feature will be ignored. For instance, suppose that in Play Apps Search, someone searches for “free games”. Suppose one of the top results is a less relevant gag app. So you create a feature for “gag apps”. However, if you are maximizing number of installs, and people install a gag app when they search for free games, the “gag apps” feature won’t have the effect you want.

Once you have examples that the model got wrong, look for trends that are outside your current feature set. For instance, if the system seems to be demoting longer posts, then add post length. Don’t be too specific about the features you add. If you are going to add post length, don’t try to guess what long means, just add a dozen features and let the model figure out what to do with them (see Rule #21). That is the easiest way to get what you want.

Rule 27 - Try to quantify observed undesirable behavior.

Some members of your team will start to be frustrated with properties of the system they don’t like which aren’t captured by the existing loss function. At this point, they should do whatever it takes to turn their gripes into solid numbers. For example, if they think that too many “gag apps” are being shown in Play Search, they could have human raters identify gag apps. (You can feasibly use human-labelled data in this case because a relatively small fraction of the queries account for a large fraction of the traffic.) If your issues are measurable, then you can start using them as features, objectives, or metrics. The general rule is “measure first, optimize second”.

Rule 28 - Be aware that identical short-term behavior does not imply identical long-term behavior.

Imagine that you have a new system that looks at every doc_id and exact_query, and then calculates the probability of click for every doc for every query. You find that its behavior is nearly identical to your current system in both side-by-sides and A/B testing, so given its simplicity, you launch it. However, you notice that no new apps are being shown. Why? Well, since your system only shows a doc based on its own history with that query, there is no way to learn that a new doc should be shown.

The only way to understand how such a system would work long-term is to have it train only on data acquired when the model was live. This is very difficult.

Training-Serving Skew

Training-serving skew is a difference between performance during training and performance during serving. This skew can be caused by:

- a discrepancy between how you handle data in the training and serving pipelines, or
- a change in the data between when you train and when you serve, or
- a feedback loop between your model and your algorithm.

We have observed production machine learning systems at Google with training-serving skew that negatively impacts performance. The best solution is to explicitly monitor it so that system and data changes don’t introduce skew unnoticed.
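One simple form of explicit monitoring is to compare per-feature statistics between training examples and examples logged at serving time. This sketch compares feature means and flags features that appear on only one side; the 10% relative tolerance and the dict-of-numbers example format are assumptions. Mean comparison catches pipeline discrepancies and data drift but not every kind of skew, so treat it as one signal among several.

```python
def feature_means(examples):
    """Mean value of every numeric feature seen in a list of example dicts."""
    sums, counts = {}, {}
    for ex in examples:
        for name, value in ex.items():
            sums[name] = sums.get(name, 0.0) + value
            counts[name] = counts.get(name, 0) + 1
    return {name: sums[name] / counts[name] for name in sums}


def skew_report(training_examples, serving_examples, rel_tol=0.1):
    """Features whose mean differs by more than `rel_tol` (relative) between
    training and serving logs, or that appear on only one side."""
    train, serve = feature_means(training_examples), feature_means(serving_examples)
    flagged = {}
    for name in set(train) | set(serve):
        if name not in train or name not in serve:
            flagged[name] = (train.get(name), serve.get(name))
        elif abs(train[name] - serve[name]) > rel_tol * max(abs(train[name]), 1e-12):
            flagged[name] = (train[name], serve[name])
    return flagged
```

Comparing a model's training data against itself should produce an empty report, which makes a convenient sanity test for the monitor.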
