Now I have a personal website!
Data science portfolio by Andrey Lukyanenko
This portfolio is a compilation of notebooks which I created for data analysis or for exploration of machine learning algorithms. Stand-alone projects have their own category.
Stand-alone projects.
Handwritten digit recognition
This is my own project applying image recognition methods in practice. It is a site (which also works on mobile) where a user can draw a digit, and machine learning models (FNN and CNN) will try to recognize it. After that, the models can use the drawn digit for training to improve their accuracy. Live version is here. The code can be found here.
Chatbot in Telegram
A conversational chatbot in Telegram which was created for an honors assignment of the NLP course by Higher School of Economics. The main functionality of the bot is to distinguish two types of questions (questions related to programming and all others) and then either give an answer or talk using a conversational model.
Kaggle competitions.
Avito demand prediction
Avito demand prediction was a competition on Kaggle where we tried to predict demand for ads based on their content. This competition was very interesting because it had tabular data, texts and images. On the other hand, this was also the reason the competition was quite difficult. My team reached 131st place and got a bronze medal! Here is a link to my solution.
Categorization of purchases
This was a Russian in-class Kaggle competition in the third session of the ODS mlcourse. It sounded interesting, so I took part in it and reached 3rd place. Here is my Kaggle kernel with a solution.
Kaggle kernels.
2018 Kaggle ML & DS Survey Challenge
Some time ago Kaggle launched a big online survey for kagglers, and now this data is public. There were multiple choice questions and some forms for open answers. The survey received 23k+ respondents from 147 countries. As a result we have a big dataset with rich information on data scientists using Kaggle. In this kernel I compare DS in the USA, Russia, India and other countries.
DonorsChoose.org Application Screening
DonorsChoose.org empowers public school teachers from across the country to request much-needed materials and experiences for their students. DonorsChoose.org receives hundreds of thousands of project proposals each year for classroom projects in need of funding. This is a competition on Kaggle where people can create a machine learning model to help the organization auto-approve applications. Prizes are given to the authors of the most upvoted kernels. Here is my kernel with extensive EDA, feature engineering and model building. This kernel got 2nd place by the number of votes and I won a Google Pixelbook for it!
Avito Demand Prediction Challenge
The Avito challenge is about predicting demand for an online advertisement based on its full description (title, description, images, etc.), its context (geographically where it was posted, similar ads already posted) and historical demand for similar ads in similar contexts. The competition is interesting because it contains many types of data, which allows building various models. Here is my kernel with EDA, feature creation and model building.
Home Credit Default Risk
Home Credit Bank offers a challenge of credit scoring. There is a lot of data about applicants and their previous behavior. Here is my kernel.
Movie Review Sentiment Analysis
Some time ago Kaggle launched several "remakes" of old competitions. This means that the datasets are the same, but now we are offered an opportunity to simply explore the data and create kernels with new methods. One of these competitions is sentiment analysis of the Rotten Tomatoes dataset with 5 classes (negative, somewhat negative, neutral, somewhat positive, positive). I have created a kernel with EDA and a modern NN architecture: LSTM-CNN. Currently this kernel has the 5th best result on the leaderboard.
Two Sigma: Using News to Predict Stock Movements
In this competition Reuters provides unique data, which can't be obtained outside of this competition. We can see 10 years' worth of news and market data on many companies. This competition is kernel-only, which means that everyone has the same amount of computational power for it. In my kernel I have analysed the data and shown trends in the market data.
Santander Value Prediction Challenge
In this competition we got an anonymized dataset; later it was found that it had a certain structure. In my kernel I tried to analyze the data and created new features using an NN model.
Google Analytics Customer Revenue Prediction
RStudio hosted this competition to prove that machine learning algorithms can impact business and help marketing. In my kernel I did an extensive EDA and built an interesting LGB model.
Data Science for Good: Center for Policing Equity
This dataset was provided by The Center for Policing Equity. They hope that kagglers will help to create better models, find some unique insights and improve geo-analytics. In my kernel I try to do these things.
Classification problems.
Titanic: Machine Learning from Disaster Github nbviewer
Titanic: Machine Learning from Disaster is a knowledge competition on Kaggle. Many people started practicing machine learning with this competition, and so did I. This is a binary classification problem: based on information about Titanic passengers we predict whether they survived or not. General description and data are available on Kaggle. The Titanic dataset provides interesting opportunities for feature engineering.
Ghouls, Goblins, and Ghosts... Boo! Github nbviewer
Ghouls, Goblins, and Ghosts... Boo! is a knowledge competition on Kaggle. This is a multiclass classification problem: based on information about monsters we predict their types. A fun competition for Halloween. General description and data are available on Kaggle. This dataset has a small number of samples, so careful feature selection and model ensembling are necessary for high accuracy.
Otto Group Product Classification Challenge Github nbviewer
Otto Group Product Classification Challenge is a knowledge competition on Kaggle. This is a multiclass classification problem: based on information about products we predict their category. General description and data are available on Kaggle. The data is obfuscated, so the main question lies in the selection of the model for prediction.
Imbalanced classes Github nbviewer
In the real world it is common to meet data in which some classes are frequent and others are rare. In case of a serious imbalance, predicting the rare classes can be difficult with standard classification methods. In this notebook I analyse such a situation. I can't share the data used in this analysis.
Bank card activations Github nbviewer
Banks strive to increase the efficiency of their contacts with customers. One of the areas which requires this is offering new products to existing clients (cross-selling). Instead of offering new products to all clients, it is a good idea to predict the probability of a positive response. Then the offers can be sent only to those clients for whom the probability of response is higher than some threshold value. In this notebook I try to solve this problem.
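The thresholding step described above can be sketched in a few lines. This is only an illustration, not the notebook's code: the client IDs, the probabilities and the 0.5 threshold are made-up values, and `select_clients` is a hypothetical helper.

```python
def select_clients(probabilities, threshold):
    """Return IDs of clients whose predicted response probability exceeds the threshold."""
    return [client_id for client_id, p in probabilities.items() if p > threshold]


# Hypothetical model output: client ID -> predicted probability of a positive response.
predicted = {"c1": 0.82, "c2": 0.15, "c3": 0.55, "c4": 0.49}

# Only clients above the (assumed) threshold would receive the offer.
targets = select_clients(predicted, threshold=0.5)
print(targets)
```

In practice the threshold would be chosen from the cost of a contact and the expected value of a positive response rather than fixed at 0.5.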
Regression problems.
House Prices: Advanced Regression Techniques Github nbviewer
House Prices: Advanced Regression Techniques is a knowledge competition on Kaggle. This is a regression problem: based on information about houses we predict their prices. General description and data are available on Kaggle. The dataset has a lot of features and many missing values. This gives interesting possibilities for feature transformation and data visualization.
Loan Prediction Github nbviewer
Loan Prediction is a knowledge and learning hackathon on Analytics Vidhya. The Dream Housing Finance company deals in home loans. The company wants to automate the loan eligibility process (in real time) based on customer details provided while filling in the online application form. Based on a customer's information we predict whether they should receive a loan or not. General description and data are available on Analytics Vidhya.
Caterpillar Tube Pricing Github nbviewer
Caterpillar Tube Pricing is a competition on Kaggle. This is a regression problem: based on information about tube assemblies we predict their prices. General description and data are available on Kaggle. The dataset consists of many files, so there is an additional challenge in combining the data and selecting the features.
Natural language processing.
Bag of Words Meets Bags of Popcorn Github nbviewer
Bag of Words Meets Bags of Popcorn is a sentiment analysis problem. Based on the texts of reviews we predict whether they are positive or negative. General description and data are available on Kaggle. The data provided consists of raw reviews and a class (1 or 2), so the main part is cleaning the texts.
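A rough sketch of what such text cleaning can look like is given below. It assumes that the raw reviews may contain HTML markup and that only letters are kept; both are assumptions made for illustration, and `clean_review` is a hypothetical helper rather than the function used in the notebook.

```python
import re


def clean_review(text):
    """Strip HTML tags, keep letters only, lowercase, and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)     # drop HTML tags such as <br />
    text = re.sub(r"[^a-zA-Z]", " ", text)   # replace every non-letter with a space
    return " ".join(text.lower().split())    # lowercase and normalize spaces


# Made-up review text used only to demonstrate the function.
raw = "Great movie!<br /><br />I'd watch it 10 times."
cleaned = clean_review(raw)
print(cleaned)
```

Dropping digits and punctuation is a deliberate simplification; for sentiment tasks it can be worth keeping negations and emoticons instead.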
NLP with Python: exploring Fate/Zero Github nbviewer
Natural language processing in machine learning helps to accomplish a variety of tasks, one of which is extracting information from texts. This notebook is an overview of several text exploration methods using the English translation of the Japanese light novel "Fate/Zero" as an example.
NLP. Text generation with Markov chains Github nbviewer
This notebook shows how a new text can be generated based on a given corpus using the idea of Markov chains. I start with simple first-order chains and improve the model with each step to generate better text.
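As a minimal sketch of the first-order case (not the notebook's code), each word can be mapped to the list of words that follow it in the corpus, and the next word is then sampled from that list. The toy corpus, the function names and the fixed seed are assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_chain(words):
    """Map each word to the list of words that directly follow it."""
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length, seed=0):
    """Walk the chain from `start`, producing at most `length` words."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: the word never had a successor
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


# Toy corpus used only for demonstration.
corpus = "the cat sat on the mat and the cat ran".split()
chain = build_chain(corpus)
text = generate(chain, start="the", length=6)
print(text)
```

Repeated followers are kept in the lists on purpose, so that sampling uniformly from a list reproduces the transition frequencies of the corpus. Higher-order chains use tuples of several previous words as keys instead of a single word.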
NLP. Text summarization Github nbviewer
This notebook shows how a text can be summarized by choosing several of the most important sentences from it. I explore various methods of doing this using a news article as an example.
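One simple way to pick "important" sentences is to score each sentence by the average corpus frequency of its words and keep the top ones. The sketch below illustrates that idea only; it is not necessarily one of the methods used in the notebook, the sample text is made up, and it skips stop-word removal for brevity.

```python
import re
from collections import Counter


def summarize(text, n_sentences=2):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # Average word frequency, so long sentences are not favoured automatically.
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    return [s for s in sentences if s in top]


# Made-up text used only to demonstrate the function.
article = (
    "Kaggle hosts data competitions. "
    "Data competitions attract data scientists. "
    "The weather was nice."
)
summary = summarize(article, n_sentences=1)
print(summary)
```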
Clustering
Clustering with KMeans Github nbviewer
Clustering is an approach to unsupervised machine learning, and KMeans is one of the clustering algorithms. In this notebook I'll demonstrate how it works. The data used is about various types of seeds and their parameters. It is available here.
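The core loop of KMeans alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. The sketch below shows that loop in plain Python under simplifying assumptions: the 2-D points are made up (they are not the seeds data), and the first k points are used as initial centroids, whereas real implementations usually use random or k-means++ initialization.

```python
def kmeans(points, k, n_iter=20):
    """Plain KMeans on tuples of floats; returns (centroids, clusters)."""
    centroids = list(points[:k])  # simplification: first k points as initial centroids
    clusters = []
    for _ in range(n_iter):
        # Assignment step: put each point into the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep the old centroid for an empty cluster
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters


# Made-up 2-D measurements forming two obvious groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.9, 8.1)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```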
Neural networks
Feedforward neural network with regularization Github nbviewer
This is a simple example of a feedforward neural network with regularization. It is based on Andrew Ng's lectures on Coursera. I used data from Kaggle's challenge "Ghouls, Goblins, and Ghosts... Boo!"; it is available here.
Data exploration and analysis
Telematic data Github nbviewer
I have a dataset with telematic information about 10 cars driving during one day. I visualise the data, search for insights and analyse the behavior of each driver. I can't share the data, but here is the notebook. Note that the folium map can't be rendered natively by GitHub, but nbviewer.jupyter can do it.
Recommendation systems.
Collaborative filtering Github nbviewer
Recommenders are systems which predict the ratings users would give to items. There are several approaches to building such systems, and one of them is collaborative filtering. This notebook shows several examples of collaborative filtering algorithms.
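As one illustration of the idea, user-based collaborative filtering predicts a missing rating as a similarity-weighted average of the ratings given by other users. The sketch below is not taken from the notebook: the ratings, the user names and both function names are hypothetical, and cosine similarity over co-rated items is only one of several possible similarity measures.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two users over the items both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    numerator = denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        numerator += sim * other_ratings[item]
        denominator += abs(sim)
    return numerator / denominator if denominator else None


# Made-up ratings: user -> {item: rating}.
ratings = {
    "ann": {"A": 5, "B": 3},
    "bob": {"A": 5, "B": 3, "C": 4},
    "cat": {"A": 1, "B": 5, "C": 2},
}
prediction = predict_rating(ratings, "ann", "C")
print(prediction)
```

Because "ann" rates items A and B exactly like "bob", the prediction for item C is pulled toward bob's rating rather than cat's.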