Erlemar.github.io Data science portfolio

Anonymous user, November 11, 2021
Technology: Python
Category: Artificial intelligence, machine learning / deep learning
License: Readme

Project details

Now I have a personal website!

Data science portfolio by Andrey Lukyanenko

This portfolio is a compilation of notebooks which I created for data analysis or for exploration of machine learning algorithms. Stand-alone projects have their own category.

Stand-alone projects

Handwritten digit recognition

This is my own project using image recognition methods in practice. It is a site (it also works on mobile) where a user can draw a digit, and machine learning models (FNN and CNN) will try to recognize it. After that, the models can use the drawn digit for training to improve their accuracy. A live version is here. The code can be found here.

Chatbot in Telegram

A conversational chatbot in Telegram which was created for an honors assignment of the NLP course by the Higher School of Economics. The main functionality of the bot is to distinguish two types of questions (questions related to programming and all others) and then either give an answer or talk using a conversational model.

Kaggle competitions

Avito demand prediction

Avito demand prediction was a competition on Kaggle where we tried to predict something like demand based on ad content. This competition was very interesting because it had tabular data, texts and images. On the other hand, this was the reason the competition was quite difficult. My team reached 131st place and got a bronze medal! Here is a link to my solution.

Categorization of purchases

This was a Russian in-class Kaggle competition in the third session of the ODS ML course. It sounded interesting and I took part in it, reaching 3rd place. Here is my Kaggle kernel with a solution.

Kaggle kernels

2018 Kaggle ML & DS Survey Challenge

Some time ago Kaggle launched a big online survey for kagglers, and now this data is public. There were multiple-choice questions and some forms for open answers. The survey received 23k+ respondents from 147 countries. As a result we have a big dataset with rich information on data scientists using Kaggle. In this kernel I compare DS in the USA, Russia, India and other countries.

DonorsChoose.org Application Screening

DonorsChoose.org empowers public school teachers from across the country to request much-needed materials and experiences for their students. DonorsChoose.org receives hundreds of thousands of project proposals each year for classroom projects in need of funding. This is a competition on Kaggle where people can create a machine learning model to help this fund with auto-approving of applications. Prizes are given to the authors with the most upvoted kernels. Here is my kernel with extensive EDA, feature engineering and model building. This kernel got 2nd place by the number of votes and I won a Google Pixelbook for it!

Avito Demand Prediction Challenge

The Avito challenge is about predicting demand for an online advertisement based on its full description (title, description, images, etc.), its context (geographically where it was posted, similar ads already posted) and historical demand for similar ads in similar contexts. The competition is interesting due to the many types of data in it, which allow building various models. Here is my kernel with EDA, feature creation and model building.

Home Credit Default Risk

Home Credit Bank offers a credit scoring challenge. There is a lot of data about applicants and their previous behavior. Here is my kernel.

Movie Review Sentiment Analysis

Some time ago Kaggle launched several "remakes" of old competitions. This means that the datasets are the same, but now we are offered an opportunity to simply explore the data and create kernels with new methods. One of these competitions is sentiment analysis of the Rotten Tomatoes dataset with 5 classes (negative, somewhat negative, neutral, somewhat positive, positive). I have created a kernel with EDA and a modern NN architecture: LSTM-CNN. Currently this kernel shows the 5th result on the leaderboard.

Two Sigma: Using News to Predict Stock Movements

In this competition Reuters provides unique data which can't be obtained outside of this competition. We can see 10 years' worth of news and market data on many companies. This competition is kernel-only, which means that everyone has the same amount of computational power. In my kernel I have analysed the data and showed trends in the market data.

Santander Value Prediction Challenge

In this competition we got an anonymized dataset; later it was found that it had a certain structure. In my kernel I tried to analyze the data and created new features using an NN model.

Google Analytics Customer Revenue Prediction

RStudio hosted this competition to prove that machine learning algorithms can impact business and help marketing. In my kernel I did an extensive EDA and built an interesting LGB model.

Data Science for Good: Center for Policing Equity

This dataset was provided by The Center for Policing Equity. They hope that kagglers will help to create better models, find some unique insights and improve geo-analytics. In my kernel I try to do such things.

Classification problems

Titanic: Machine Learning from Disaster

Github nbviewer

Titanic: Machine Learning from Disaster is a knowledge competition on Kaggle. Many people started practicing machine learning with this competition, and so did I. This is a binary classification problem: based on information about Titanic passengers we predict whether they survived or not. A general description and data are available on Kaggle. The Titanic dataset provides interesting opportunities for feature engineering.

Ghouls, Goblins, and Ghosts... Boo!

Github nbviewer

Ghouls, Goblins, and Ghosts... Boo! is a knowledge competition on Kaggle. This is a multiclass classification problem: based on information about monsters we predict their types. A fun competition for Halloween. A general description and data are available on Kaggle. This dataset has a small number of samples, so careful feature selection and model ensembling are necessary for high accuracy.

Otto Group Product Classification Challenge

Github nbviewer

Otto Group Product Classification Challenge is a knowledge competition on Kaggle. This is a multiclass classification problem: based on information about products we predict their category. A general description and data are available on Kaggle. The data is obfuscated, so the main question lies in the selection of the model for prediction.

Imbalanced classes

Github nbviewer

In the real world it is common to meet data in which some classes are more common and others are rarer. In case of a serious imbalance, predicting rare classes can be difficult using standard classification methods. In this notebook I analyse such a situation. I can't share the data used in this analysis.
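As a rough illustration of why imbalance is a problem (this is not the notebook's code, and the labels below are made up), the sketch shows a model that always predicts the common class: it scores high accuracy while finding none of the rare samples. It also computes inverse-frequency class weights, one common way to make a classifier pay more attention to the rare class.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class as n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

def recall(y_true, y_pred, positive):
    """Share of actual positives that the predictions found."""
    found = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual = sum(1 for t in y_true if t == positive)
    return found / actual if actual else 0.0

# Toy labels (hypothetical): 95 common-class samples and 5 rare-class samples.
y_true = [0] * 95 + [1] * 5
# A naive model that always predicts the common class.
y_naive = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_naive)) / len(y_true)
rare_recall = recall(y_true, y_naive, positive=1)
weights = balanced_class_weights(y_true)
```

Accuracy alone hides the failure here, which is why per-class metrics such as recall are usually checked on imbalanced data.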

Bank card activations

Github nbviewer

Banks strive to increase the efficiency of their contacts with customers. One of the areas which require this is offering new products to existing clients (cross-selling). Instead of offering new products to all clients, it is a good idea to predict the probability of a positive response. Then the offers can be sent to those clients for whom the probability of response is higher than some threshold value. In this notebook I try to solve this problem.
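The thresholding step can be sketched as follows. The client ids, probabilities and the threshold value are all hypothetical; in practice the probabilities would come from a fitted model and the threshold would be chosen from business constraints.

```python
def select_clients(probabilities, threshold=0.5):
    """Return ids of clients whose predicted response probability exceeds the threshold."""
    return [client_id for client_id, p in probabilities.items() if p > threshold]

# Hypothetical model output: client id -> predicted probability of a positive response.
predicted = {"c1": 0.82, "c2": 0.10, "c3": 0.47, "c4": 0.65}

# Only clients above the threshold receive the offer.
targets = select_clients(predicted, threshold=0.6)
```

Raising the threshold sends fewer offers to more likely responders; lowering it reaches more clients at the cost of more wasted contacts.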

Regression problems

House Prices: Advanced Regression Techniques

Github nbviewer

House Prices: Advanced Regression Techniques is a knowledge competition on Kaggle. This is a regression problem: based on information about houses we predict their prices. A general description and data are available on Kaggle. The dataset has a lot of features and many missing values. This gives interesting possibilities for feature transformation and data visualization.

Loan Prediction

Github nbviewer

Loan Prediction is a knowledge and learning hackathon on Analytics Vidhya. Dream Housing Finance company deals in home loans. The company wants to automate the loan eligibility process (in real time) based on customer details provided while filling in the online application form. Based on a customer's information we predict whether they should receive a loan or not. A general description and data are available on Analytics Vidhya.

Caterpillar Tube Pricing

Github nbviewer

Caterpillar Tube Pricing is a competition on Kaggle. This is a regression problem: based on information about tube assemblies we predict their prices. A general description and data are available on Kaggle. The dataset consists of many files, so there is an additional challenge in combining the data and selecting the features.

Natural language processing

Bag of Words Meets Bags of Popcorn

Github nbviewer

Bag of Words Meets Bags of Popcorn is a sentiment analysis problem. Based on the texts of reviews we predict whether they are positive or negative. A general description and data are available on Kaggle. The data provided consists of raw reviews and a class (1 or 2), so the main part is cleaning the texts.
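A minimal sketch of what cleaning raw reviews and building a bag-of-words vector can look like. The sample review and the exact cleaning rules (strip HTML tags, keep letters only, lowercase) are assumptions for illustration, not the notebook's pipeline.

```python
import re
from collections import Counter

def clean_review(raw):
    """Strip HTML tags, keep letters only, lowercase, and split into tokens."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    letters_only = re.sub(r"[^a-zA-Z]", " ", no_tags)
    return letters_only.lower().split()

def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs in the tokens."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

# Hypothetical raw review with leftover HTML markup.
review = "A <b>great</b> movie.<br />Great cast, great story!"
tokens = clean_review(review)
vocabulary = sorted(set(tokens))
vector = bag_of_words(tokens, vocabulary)
```

Each review becomes a fixed-length count vector over the vocabulary, which a standard classifier can then take as input.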

NLP with Python: exploring Fate/Zero

Github nbviewer

Natural language processing in machine learning helps to accomplish a variety of tasks, one of which is extracting information from texts. This notebook is an overview of several text exploration methods using the English translation of the Japanese light novel "Fate/Zero" as an example.

NLP. Text generation with Markov chains

Github nbviewer

This notebook shows how a new text can be generated based on a given corpus using the idea of Markov chains. I start with simple first-order chains and with each step improve the model to generate better text.
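A first-order chain can be sketched in a few lines. The tiny corpus, the function names and the fixed seed are assumptions made for this illustration and are not taken from the notebook.

```python
import random
from collections import defaultdict

def build_chain(words):
    """Map each word to the list of words that follow it in the corpus."""
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length, seed=0):
    """Walk the chain from `start`, sampling each next word from its followers."""
    rng = random.Random(seed)
    output = [start]
    for _ in range(length - 1):
        followers = chain.get(output[-1])
        if not followers:  # dead end: this word never had a successor
            break
        output.append(rng.choice(followers))
    return " ".join(output)

# Toy corpus (hypothetical); a real corpus would be far larger.
corpus = "the cat sat on the mat and the cat ran".split()
chain = build_chain(corpus)
text = generate(chain, start="the", length=6)
```

Because followers are stored with repeats, frequent transitions are sampled more often. Higher-order chains use tuples of several preceding words as keys instead of a single word.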

NLP. Text summarization

Github nbviewer

This notebook shows how a text can be summarized by choosing several of its most important sentences. I explore various methods of doing this based on a news article.
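One simple way to pick important sentences is to score each sentence by the average corpus frequency of its words. The sketch below assumes this frequency-based scoring, a small hand-picked stopword list and a made-up article; the notebook may use different methods.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "on"}

def tokenize(text):
    """Lowercase words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_sentences])]

# Hypothetical three-sentence article.
article = (
    "The storm hit the coast on Monday. "
    "The storm damaged homes and the storm closed roads. "
    "Officials spoke briefly."
)
summary = summarize(article, n_sentences=1)
```

Dividing by the sentence length keeps long sentences from winning just because they contain more words.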

Clustering

Clustering with KMeans

Github nbviewer

Clustering is an approach to unsupervised machine learning. KMeans is one of the clustering algorithms. In this notebook I'll demonstrate how it works. The data used is about various types of seeds and their parameters. It is available here.
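The core loop of KMeans (often called Lloyd's algorithm) alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. The sketch below uses made-up 2D points rather than the seeds data, and takes hand-picked initial centroids so the result is reproducible; real implementations usually initialize randomly or with k-means++.

```python
def nearest(point, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
    )

def kmeans(points, init_centroids, n_iter=10):
    """Alternate assignment and centroid update steps for n_iter rounds."""
    centroids = list(init_centroids)
    labels = []
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

# Toy data (hypothetical): two well-separated groups of points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
centroids, labels = kmeans(points, init_centroids=[points[0], points[3]])
```

The result depends on the starting centroids, so running the algorithm from several initializations and keeping the best clustering is common practice.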

Neural networks

Feedforward neural network with regularization

Github nbviewer

This is a simple example of a feedforward neural network with regularization. It is based on Andrew Ng's lectures on Coursera. I used data from Kaggle's challenge "Ghouls, Goblins, and Ghosts... Boo!"; it is available here.
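Regularization of this kind is commonly done by adding an L2 penalty on the weights to the loss. The sketch below assumes a binary cross-entropy loss and a flat list of non-bias weights; the function name, inputs and values are hypothetical and the notebook's network may be set up differently.

```python
import math

def regularized_cost(y_true, y_prob, weights, lam):
    """Mean cross-entropy plus an L2 penalty of lam / (2m) * sum(w^2)."""
    m = len(y_true)
    cross_entropy = -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p) for y, p in zip(y_true, y_prob)
    ) / m
    # Bias terms are assumed to be excluded from `weights`.
    penalty = lam / (2 * m) * sum(w ** 2 for w in weights)
    return cross_entropy + penalty

# Toy values (hypothetical): two samples, two non-bias weights.
y_true = [1, 0]
y_prob = [0.5, 0.5]
weights = [1.0, 2.0]

cost_plain = regularized_cost(y_true, y_prob, weights, lam=0.0)
cost_reg = regularized_cost(y_true, y_prob, weights, lam=2.0)
```

A larger `lam` penalizes large weights more strongly, which pushes the network toward simpler functions and reduces overfitting.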

Data exploration and analysis

Telematic data

Github nbviewer

I have a dataset with telematic information about 10 cars driving during one day. I visualise the data, search for insights and analyse the behavior of each driver. I can't share the data, but here is the notebook. Note that the folium map can't be rendered by native GitHub, but nbviewer.jupyter can do it.

Recommendation systems

Collaborative filtering

Github nbviewer

Recommenders are systems which predict users' ratings for items. There are several approaches to building such systems and one of them is collaborative filtering. This notebook shows several examples of collaborative filtering algorithms.
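One of the simplest variants is user-based collaborative filtering: a missing rating is estimated as the similarity-weighted average of other users' ratings for that item. The sketch below assumes cosine similarity over co-rated items and a made-up ratings table; it is an illustration, not the notebook's code.

```python
import math

def cosine(u, v):
    """Cosine similarity computed over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    numerator = denominator = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine(ratings[user], their_ratings)
        numerator += sim * their_ratings[item]
        denominator += abs(sim)
    return numerator / denominator if denominator else None

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "ann": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 5, "B": 3, "C": 4, "D": 5},
    "cat": {"A": 1, "B": 5, "D": 2},
}
estimate = predict(ratings, "ann", "D")
```

Here "bob" rates items exactly like "ann", so his rating of 5 for item D pulls the estimate above the midpoint between the two known ratings for D.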
