Case Report: Predicting House Prices with Regression in R

First, we load the data and the required packages.

    load("ames_train.Rdata")
    library(MASS)
    library(dplyr)
    ## Warning: package 'dplyr' was built under R version 3.3.3
    library(ggplot2)
    ## Warning: package 'ggplot2' was built under R version 3.3.3
    library(devtools)
    ## Warning: package 'devtools' was built under R version 3.3.3
    library(statsr)
    library(lubridate)
    ## Warning: package 'lubridate' was built under R version 3.3.3
    library(tidyr)
    ## Warning: package 'tidyr' was built under R version 3.3.3
    library(gridExtra)
    ## Warning: package 'gridExtra' was built under R version 3.3.3

1. Make a labeled histogram (with 30 bins) of the ages of the houses in the data set, and describe the distribution.

    # type your code for Question 1 here, and Knit
    # House age = current year minus construction year
    ggplot(data = ames_train, aes(x = year(today()) - Year.Built)) +
      geom_histogram(bins = 30, fill = "blue", colour = "white") +
      labs(title = "House counts by age", x = "House age", y = "House count") +
      # Median age: vertical line plus label
      geom_vline(xintercept = median(year(today()) - ames_train$Year.Built),
                 colour = "red") +
      annotate("text",
               x = median(year(today()) - ames_train$Year.Built) - 4,
               y = c(100, 110),   # y positions are illegible in the source; assumed
               label = c("Median", median(year(today()) - ames_train$Year.Built)),
               colour = "red",
               angle = c(90, 0)) +  # angles are partly illegible in the source; assumed
      # Mean age: vertical line plus label
      geom_vline(xintercept = mean(year(today()) - ames_train$Year.Built),
                 colour = "#41c42f") +
      annotate("text",
               x = mean(year(today()) - ames_train$Year.Built) + 4,
               y = c(100, 110),   # second y position is truncated in the source; assumed
               label = c("Mean", round(mean(year(today()) - ames_train$Year.Built), 1)),
               colour = "#41c42f",
               angle = c(-90, 0)) +
      theme(plot.title = element_text(hjust = 0.5),
            title = element_text(hjust = 0.5, colour = "red", size = rel(2)),
            axis.title = element_text(size = rel(1), colour = "blue", hjust = .5),
            legend.position = "none",
            panel.background = element_rect(fill = "grey", colour = "black"),
            panel.grid = element_line(color = "blue"))

[Figure: histogram "House counts by age"; x-axis: House age, y-axis: House count]

The distribution of house ages plotted above is strongly right-skewed. We see three peaks, which indicates that the distribution is multimodal. The largest group of houses in this data set (about 140) is 10-15 years old. A second group (about 80 houses) is 55-60 years old, and a third group (about 37 houses), on the right of the distribution, is 90-95 years old. These peaks may reflect booms in the real estate business during the corresponding periods. The distribution also shows that more than 45% of the houses were built less than 45 years ago.

2. The mantra in real estate is "Location, Location, Location!" Make a graphical display that relates a home's price to its neighborhood in Ames, Iowa. Which summary statistics are most appropriate for determining the most expensive, least expensive, and most heterogeneous (having the most variation in housing price) neighborhoods? Report which neighborhoods these are based on the summary statistics of your choice, and report the values of your chosen summary statistics for these neighborhoods.

    ## Box plots of the price distribution by neighborhood
    ames_train %>%
      ggplot(aes(x = Neighborhood, y = price / 10^5, fill = Neighborhood)) +
      geom_boxplot() +
      theme(axis.text.x = element_text(angle = 90))

[Figure: box plots of price (in units of 10^5) by neighborhood; x-axis: Neighborhood]

    ## Compute all measures of center and spread, grouped by neighborhood,
    ## and store them in a data frame
    ames_stats <- ames_train %>%
      group_by(Neighborhood) %>%
      summarise(Min = min(price, na.rm = TRUE),
                Mean = mean(price, na.rm = TRUE),
                Median = median(price, na.rm = TRUE),
                IQR = IQR(price, na.rm = TRUE),
                Max = max(price, na.rm = TRUE),
                Range = Max - Min) %>%
      arrange(desc(Mean))
    ## Warning: package 'bindrcpp' was built under R version 3.3.3

    # Collect the neighborhoods of interest from the summary statistics
    ames_summary <- data.frame(
      filter(ames_stats, Mean == max(Mean))$Neighborhood,
      filter(ames_stats, Mean == min(Mean))$Neighborhood,
      filter(ames_stats, IQR == max(IQR))$Neighborhood)
    # Format the data frame
    colnames(ames_summary) <- c("Most Expensive", "Least Expensive", "Most heterogenous")
    rownames(ames_summary) <- c("Neighborhood")
    # Print the data frame
    ames_summary
    ##              Most Expensive Least Expensive Most heterogenous
    ## Neighborhood        StoneBr         MeadowV           StoneBr

The summary statistics computed by the script above show that StoneBr has the highest mean and median price of all neighborhoods, so StoneBr is the most expensive neighborhood. Based on the IQR, StoneBr also has the most dispersed house prices, which makes it the most heterogeneous neighborhood as well. MeadowV, by contrast, has the lowest mean and median house price and is therefore the least expensive neighborhood.

3. Which variable has the largest number of missing values? Explain why it makes sense that there are so many missing values for this variable.

    # type your code for Question 3 here, and Knit
    # Count the missing values in every variable
    missing_val <- ames_train %>% summarise_all(funs(sum(is.na(.))))
    # Select the column that has the maximum number of missing values
    missing_val[, colSums(missing_val) == max(missing_val)]
    ## # A tibble: 1 x 1
    ##   Pool.QC
    ##     <int>
    ## 1     997

The variable with the highest number of missing values is Pool.QC, which means that most of the houses do not have a swimming pool: only 3 houses in this data set have one. This makes sense because owning a swimming pool is generally expensive. It requires not only the space but also the construction and, more importantly, the running costs, which last for the lifetime of the pool.

4. We want to predict the natural log of the home prices. Candidate explanatory variables are lot size in square feet (Lot.Area), slope of property (Land.Slope), original construction date (Year.Built), remodel date (Year.Remod.Add), and the number of bedrooms above grade (Bedroom.AbvGr). Pick a model selection or model averaging method covered in the Specialization, and describe how this method works. Then use this method to find the best multiple regression model for predicting the natural log of the home prices.

    # Select the variables with at least one missing value
    names(missing_val[, colSums(missing_val) >= 1])
    ##  [1] "Lot.Frontage"   "Alley"          "Mas.Vnr.Area"   "Bsmt.Qual"
    ##  [5] "Bsmt.Cond"      "Bsmt.Exposure"  "BsmtFin.Type.1" "BsmtFin.SF.1"
    ##  [9] "BsmtFin.Type.2" "BsmtFin.SF.2"   "Bsmt.Unf.SF"    "Total.Bsmt.SF"
    ## [13] "Bsmt.Full.Bath" "Bsmt.Half.Bath" "Fireplace.Qu"   "Garage.Type"
    ## [17] "Garage.Yr.Blt"  "Garage.Finish"  "Garage.Cars"    "Garage.Area"
    ## [21] "Garage.Qual"    "Garage.Cond"    "Pool.QC"        "Fence"
    ## [25] "Misc.Feature"

Listed above are all variables in this data set that contain NA values. Based on the data dictionary and on the previous investigations, we can explain the missing values and decide how to deal with them.

• Lot.Frontage and Mas.Vnr.Area are integers, and their NAs mean that there is no lot frontage or masonry veneer area. We can set these NA values to zero (0).
• For Alley, Pool.QC, Fence and Misc.Feature, NA means that the house has none of that feature. We can set all of these NAs to the value "None".
• 21 houses have no basement, so the NAs in the categorical "Bsmt" variables are set to the value "None" (and those in the numeric "Bsmt" variables to 0).
• 47 houses have no garage, so the NAs in the "Garage" variables can be treated in the same way.

Based on the above analysis, we have created a function, prepare_data, that treats all NA values as described above and makes the updated data set, data, available for our model selection. In the appropriate window below we describe the model selection method used.

    # type your code for Question 4 here, and Knit
    # Frequentist approach to model selection
    ## Backward selection starting with the full model

    # Data preparation function
    # This function prepares the Ames data for analysis
    prepare_data <- function(data = ames_train) {
      # Identify variables that hold a single value; they carry no
      # information for the study and are left out of the modeling
      var_to_use <- data %>% summarise_all(funs(length(unique(.))))
      var_to_use <- names(data.frame(var_to_use)[, var_to_use > 1])
      # Select only the useful variables for the model
      data <- data %>% select(var_to_use)

      # Deal with NA values
      # Separate numerical from categorical variables
      ames_var_types <- split(names(data),
                              sapply(data, function(x) paste(class(x), collapse = " ")))
      # Numerical variables: NA becomes 0
      ames_train_int <- data %>% select(ames_var_types$integer)
      ames_train_int[is.na(ames_train_int)] <- 0
      # Categorical variables: NA and empty strings become "None"
      ames_train_fac <- data %>% select(ames_var_types$factor)
      ames_train_fac <- sapply(ames_train_fac, as.character)
      ames_train_fac[is.na(ames_train_fac)] <- c("None")
      ames_train_fac[ames_train_fac == ""] <- c("None")
      ames_train_fac <- data.frame(ames_train_fac)
      # Merge the numerical and categorical variables
      data <- cbind(ames_train_int, ames_train_fac)
      return(na.omit(data))
    }

    # Model selection function
    # This function performs automatic model selection and yields the
    # best model according to the selection criterion
    model_selection <- function(data = data, response_variable = "price",
                                criteria = c("pvalue", "Adj-Rsquare", "BIC", "AIC"),
                                significance = 0.05) {
      # Use a single, lower-case criterion so the if() conditions have length one
      criteria <- tolower(criteria[1])
      count <- 0
      # Start from every variable except the response
      full_var_list <- names(data)[names(data) != response_variable]
      while (count == 0) {
        # Write the linear model formula for the current variable list
        next_var_list <- paste(full_var_list, collapse = " + ")
        formula_lm <- as.formula(paste("log(", response_variable, ") ~ ",
                                       next_var_list, sep = ""))
        data_lm <- lm(formula = formula_lm, data = data)

        # Start the model selection based on the method selected
        if (criteria %in% "pvalue") {
          # Keep the coefficients whose p-value does not exceed the preset
          # significance level (default 0.05)
          coefs <- summary(data_lm)$coef
          model_pvalue <- data.frame(coefs[coefs[, 4] <= significance, ])
          # Find the original variable names that partially match the
          # remaining coefficient names
          next_lm_var <- sapply(names(data), function(x)
            grep(x, row.names(model_pvalue), value = TRUE))
          # Remove all variables with no match. These are character(0)
          # entries, which differ from NULL, so we filter on length == 0L
          next_lm_var <- next_lm_var[!sapply(next_lm_var, function(x) length(x) == 0L)]
          # Update the variable list to be used for the next linear model
          next_var_list <- names(next_lm_var[names(next_lm_var) != response_variable])
          # Stop once the previous and current variable lists have the same length
          if (length(next_var_list) != length(full_var_list)) {
            full_var_list <- next_var_list
          } else {
            count <- 1
          }
        } else if (criteria %in% c("adj-rsquare", "adj-rsquared")) {
          # Adjusted R-squared selection is not implemented
          print(paste(criteria, "is not yet supported"))
          # Set count to 1 to exit the while loop
          count <- 1
        } else if (criteria %in% "bic") {
          # BIC: backward elimination with penalty k = log(n)
          n <- nrow(na.omit(data))
          bic_lm <- stepAIC(data_lm, direction = "backward", k = log(n), trace = 0)
          data_lm <- bic_lm
          # Set count to 1 to exit the while loop
          count <- 1
        } else if (criteria %in% "aic") {
          # AIC: backward elimination with penalty k = 2
          # (the source text shows k = n, which is not the AIC penalty)
          aic_lm <- stepAIC(data_lm, direction = "backward", k = 2, trace = 0)
          data_lm <- aic_lm
          # Set count to 1 to exit the while loop
          count <- 1
        } else {
          print("Wrong Option. Supported options are AIC, BIC or Pvalue")
          # Set count to 1 to exit the while loop
          count <- 1
        }
      }
      return(data_lm)
    }

    data <- prepare_data(data = ames_train)
    # final_model_Pvalue <- model_selection(data = data, response_variable = "price",
    #                                       criteria = "pvalue", significance = 0.05)
    final_model_BIC <- model_selection(data = data, response_variable = "price",
                                       criteria = "BIC")
    # final_model_AIC <- model_selection(data = data, response_variable = "price",
    #                                    criteria = "AIC")
    # print("Final model based on P-Value")
    # summary(final_model_Pvalue)
    print("Final model based on BIC")
    ## [1] "Final model based on BIC"
    summary(final_model_BIC)
    ##
    ## Call:
    ## lm(formula = log(price) ~ area + Lot.Area + Overall.Qual + Overall.Cond +
    ##     Year.Built + Year.Remod.Add + BsmtFin.SF.1 + BsmtFin.SF.2 +
    ##     Bsmt.Unf.SF + Bedroom.AbvGr + Fireplaces + Garage.Cars +
    ##     MS.Zoning + Condition.2 + Bldg.Type + Exter.Qual + Exter.Cond +
    ##     Central.Air + Sale.Condition, data = data)
    ##
    ## Residuals:
    ##      Min       1Q      Median       3Q      Max
    ##  -1.2941 -0.06298 [illegible]  0.06037  0.88123
    ##
    ## Coefficients:
    ##                          Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)             3.789e+00  6.579e-01   5.759 1.14e-08 ***
    ## area                    2.846e-04  1.603e-05  17.752  < 2e-16 ***
    ## Lot.Area                1.612e-06  4.660e-07   3.460 0.000564 ***
    ## Overall.Qual            7.868e-02  5.594e-03  14.066  < 2e-16 ***
    ## Overall.Cond            5.196e-02  5.120e-03  10.149  < 2e-16 ***
    ## Year.Built              2.203e-03  2.788e-04   7.902 7.48e-15 ***
    ## Year.Remod.Add          1.061e-03  3.111e-04   3.411 0.000675 ***
    ## BsmtFin.SF.1            2.146e-04  1.463e-05  14.669  < 2e-16 ***
    ## BsmtFin.SF.2            1.732e-04  2.743e-05   6.315 4.13e-10 ***
    ## Bsmt.Unf.SF             1.081e-04  1.517e-05   7.122 2.09e-12 ***
    ## Bedroom.AbvGr          -1.993e-02  7.477e-03  -2.666 0.007813 **
    ## Fireplaces              2.649e-02  7.959e-03   3.328 0.000907 ***
    ## Garage.Cars             3.673e-02  8.013e-03   4.584 5.16e-06 ***
    ## MS.ZoningFV             3.426e-01  5.061e-02   6.770 2.24e-11 ***
    ## MS.ZoningI (all)        3.180e-01  1.433e-01   2.219 0.026688 *
    ## MS.ZoningRH             2.814e-01  6.743e-02   4.172 3.29e-05 ***
    ## MS.ZoningRL             3.138e-01  4.696e-02   6.683 3.95e-11 ***
    ## MS.ZoningRM             2.364e-01  4.664e-02   5.068 4.83e-07 ***
    ## Condition.2Feedr       -9.684e-02  1.091e-01  -0.887 0.375158
    ## Condition.2Norm        -1.817e-03  9.465e-02  -0.019 0.984689
    ## Condition.2PosA         7.855e-02  1.629e-01   0.482 0.629736
    ## Condition.2PosN        -1.006e+00  1.356e-01  -7.419 2.59e-13 ***
    ## Condition.2RRNn        -6.852e-02  1.628e-01  -0.421 0.673885
    ## Bldg.Type2fmCon         4.260e-02  3.201e-02   1.331 0.183533
    ## Bldg.TypeDuplex        -7.264e-02  2.531e-02  -2.870 0.004199 **
    ## Bldg.TypeTwnhs         -1.367e-01  2.387e-02  -5.726 1.38e-08 ***
    ## Bldg.TypeTwnhsE        -3.960e-02  1.766e-02  -2.243 0.025136 *
    ## Exter.QualFa           -5.329e-03  5.419e-02  -0.098 0.921685
    ## Exter.QualGd           -8.948e-02  2.558e-02  -3.498 0.000491 ***
    ## Exter.QualTA           -1.266e-01  2.992e-02  -4.233 2.52e-05 ***
    ## Exter.CondFa           -1.209e-01  7.521e-02  -1.608 0.108264
    ## Exter.CondGd            1.976e-02  6.673e-02   0.296 0.767240
    ## Exter.CondTA            4.762e-02  6.620e-02   0.719 0.472122
    ## Central.AirY            8.643e-02  2.297e-02   3.762 0.000179 ***
    ## Sale.ConditionAdjLand   1.264e-01  9.730e-02   1.299 0.194106
    ## Sale.ConditionAlloca    1.841e-01  6.837e-02   2.693 0.007208 **
    ## Sale.ConditionFamily   -3.146e-02  3.620e-02  -0.869 0.384956
    ## Sale.ConditionNormal    8.407e-02  1.777e-02   4.731 2.56e-06 ***
    ## Sale.ConditionPartial   1.250e-01  2.434e-02   5.137 3.38e-07 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 0.1296 on 961 degrees of freedom
    ## Multiple R-squared:  0.9086, Adjusted R-squared:  0.905
    ## F-statistic: 251.5 on 38 and 961 DF,  p-value: < 2.2e-16
    # print("Final model based on AIC")
    # summary(final_model_AIC)

Verifying the conditions for multiple linear regression

    par(mfrow = c(2, 3))
    ## Checking nearly normal residuals
    hist(final_model_BIC$residuals, breaks = 100, xlab = "Residuals",
         main = "Residual plot - Nearly normality")
    plot(density(final_model_BIC$residuals),
         main = "Density plot - Nearly Normality")
    qqnorm(final_model_BIC$residuals)
    qqline(final_model_BIC$residuals)
    ## Checking linearity and independent residuals
    plot(final_model_BIC$residuals, ylab = "Residuals",
         main = "Residual plot - Linearity")
    ## Constant variability condition
    plot(final_model_BIC$residuals ~ final_model_BIC$fitted.values,
         xlab = "Fitted values", ylab = "Residuals",
         main = "Constant variability")
    plot(abs(final_model_BIC$residuals) ~ final_model_BIC$fitted.values,
         xlab = "Fitted values", ylab = "Residuals",
         main = "Independence")