(macOS)[R] Basic Statistical Analysis: Regression Analysis - 4

jinozpersona 2022. 4. 13. 23:47

INTRO

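1. What is regression analysis?

2. Simple regression analysis

3. Multiple regression analysis

4. Polynomial regression analysis

How to read the regression output: summary

- Residuals

- Coefficients (regression coefficients)

- Model fit: Multiple R-squared, Adjusted R-squared, F-statistic, p-value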

Polynomial Regression Analysis

dataset

x	y
1	5
2	3
3	2
4	3
5	4
6	6
7	10
8	12
9	18

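Step 1

Estimate regression model

- The plot of df1 shows that the independent variable x and the dependent variable y follow a roughly quadratic relationship.

- Linear regression model diagnostics: the outcome is already known, but the linear fit is run for reference.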

test_regression_4.R

rm(list=ls())
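setwd("~/Rcoding")   # call setwd(); `setwd = "..."` only creates a variable named setwd. Adjust the path to your own working directory.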

## dataset
x = seq(1,9)
y = c(5,3,2,3,4,6,10,12,18)

## Estimate regression model
df1 = data.frame(x,y)
df1
plot(df1)

model_1 = lm(y ~ x,data=df1)
model_1
summary(model_1)
par(mfrow=c(2,2))
plot(model_1)

Output

> source("~/R_coding/test_regression_4_1.R", echo=TRUE)

> rm(list=ls())

> setwd = "~/Rcoding"

> ## dataset
> x = seq(1,9)

> y = c(5,3,2,3,4,6,10,12,18)

> ## Estimate regression model
> df1 = data.frame(x,y)

> df1
  x  y
1 1  5
2 2  3
3 3  2
4 4  3
5 5  4
6 6  6
7 7 10
8 8 12
9 9 18

> plot(df1)

> model_1 = lm(y ~ x,data=df1)

> model_1

Call:
lm(formula = y ~ x, data = df1)

Coefficients:
(Intercept)            x  
     -1.167        1.633  


> summary(model_1)

Call:
lm(formula = y ~ x, data = df1)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.0000 -2.3667 -0.2667  0.9000  4.5333 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  -1.1667     2.2296  -0.523  0.61694   
x             1.6333     0.3962   4.122  0.00445 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.069 on 7 degrees of freedom
Multiple R-squared:  0.7083,	Adjusted R-squared:  0.6666 
F-statistic: 16.99 on 1 and 7 DF,  p-value: 0.004446


> par(mfrow=c(2,2))

> plot(model_1)

Regression Eq.

$ y = -1.167 + 1.633x $

Multiple R-squared: 0.7083, Adjusted R-squared: 0.6666

The fitted regression equation explains 70.83% of the variation in y.
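As a quick check, the 70.83% figure can be reproduced from the definition R^2 = 1 - SSE/SST. This is a minimal sketch that refits the same linear model; the names `sse`, `sst`, and `r2_manual` are illustrative and do not appear in the original script.

```r
# Same data as in the post
x <- 1:9
y <- c(5, 3, 2, 3, 4, 6, 10, 12, 18)

model_1 <- lm(y ~ x)

# SSE: sum of squared residuals; SST: total sum of squares around the mean of y
sse <- sum(residuals(model_1)^2)
sst <- sum((y - mean(y))^2)

# Coefficient of determination computed by hand
r2_manual <- 1 - sse / sst

print(r2_manual)
print(summary(model_1)$r.squared)  # value reported by summary() as Multiple R-squared
```

Both printed values should agree with the Multiple R-squared line of the summary output.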

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  -1.1667     2.2296  -0.523  0.61694   
x             1.6333     0.3962   4.122  0.00445 **

- (Intercept): not statistically significant (p-value = 0.61694)

- x coefficient: statistically significant (p-value = 0.00445 < 0.05)

Regression Analysis

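In the Residuals vs Fitted plot, the residuals follow a curved pattern instead of scattering evenly around zero, so the assumption that the error term has mean zero and constant variance is not satisfied.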
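The same pattern can be seen without a plot by listing the residuals in order of x. This is a small sketch using the post's data; `res` and `sign_pattern` are names introduced here for illustration.

```r
# Same data as in the post
x <- 1:9
y <- c(5, 3, 2, 3, 4, 6, 10, 12, 18)

model_1 <- lm(y ~ x)
res <- residuals(model_1)

# Fitted values and residuals side by side
print(data.frame(x = x,
                 fitted = round(fitted(model_1), 3),
                 residual = round(res, 3)))

# Sign of each residual along x; a run of "-" between "+" signs suggests curvature
sign_pattern <- paste(ifelse(res > 0, "+", "-"), collapse = " ")
print(sign_pattern)
```

The residuals are positive at both ends of the x range and negative in the middle, which is the U shape that a straight line leaves behind when the underlying relationship is quadratic.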

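The coefficient of determination R^2 shows that about 70% of the variation is explained, but a new variable should be introduced to find a better-fitting regression equation.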

Step 2

Estimate regression model adjusted by 2nd order

- Create x2 (x^2) from the df1 data and estimate a quadratic regression equation on the resulting df2 dataset.

test_regression_4.R

....


## Estimate regression model adjusted by 2nd order
x2 = x^2
df2 = cbind(x2,df1)
df2
plot(df2)

model_2 = lm(y ~ x+x2,data=df2)
model_2
summary(model_2)
par(mfrow=c(2,2))
plot(model_2)
par(mfrow=c(1,1))

Output

....

> ## Estimate regression model adjusted by 2nd order
> x2 = x^2

> df2 = cbind(x2,df1)

> df2
  x2 x  y
1  1 1  5
2  4 2  3
3  9 3  2
4 16 4  3
5 25 5  4
6 36 6  6
7 49 7 10
8 64 8 12
9 81 9 18

> plot(df2)

> model_2 = lm(y ~ x+x2,data=df2)

> model_2

Call:
lm(formula = y ~ x + x2, data = df2)

Coefficients:
(Intercept)            x           x2  
     7.1667      -2.9121       0.4545  


> summary(model_2)

Call:
lm(formula = y ~ x + x2, data = df2)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.9606 -0.1606  0.0303  0.2242  0.9455 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  7.16667    0.78728   9.103 9.87e-05 ***
x           -2.91212    0.36149  -8.056 0.000196 ***
x2           0.45455    0.03526  12.893 1.34e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6187 on 6 degrees of freedom
Multiple R-squared:  0.9898,	Adjusted R-squared:  0.9864 
F-statistic: 292.2 on 2 and 6 DF,  p-value: 1.05e-06


> par(mfrow=c(2,2))

> plot(model_2)

> par(mfrow=c(1,1))

Linear regression diagnostic plots: the residuals are closer to normality than in the first model fitted on df1.

Regression Eq.

$ y = 7.1667 - 2.9121x + 0.4545x^2 $
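To see how the fitted equation is used, the sketch below evaluates it at one point, once by plugging the coefficients into the equation and once with `predict()`. The evaluation point `x_new = 5` is an arbitrary choice made here for illustration, and the data frame is rebuilt with `data.frame()` rather than `cbind()` only to keep the example self-contained.

```r
# Same data as in the post
x <- 1:9
y <- c(5, 3, 2, 3, 4, 6, 10, 12, 18)
df2 <- data.frame(x = x, x2 = x^2, y = y)

model_2 <- lm(y ~ x + x2, data = df2)
b <- coef(model_2)

# Evaluate the fitted quadratic at an illustrative point
x_new <- 5
y_by_hand <- b["(Intercept)"] + b["x"] * x_new + b["x2"] * x_new^2

# predict() needs every column used in the formula, so x2 must be supplied too
y_predict <- predict(model_2, newdata = data.frame(x = x_new, x2 = x_new^2))

print(unname(y_by_hand))
print(unname(y_predict))
```

Because x2 is a separate column in this model, forgetting to pass `x2` in `newdata` makes `predict()` fail; that is one practical cost of building the squared term by hand.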

F-statistic: 292.2 on 2 and 6 DF, p-value: 1.05e-06

At the 5% significance level, the fitted regression model is highly statistically significant (p-value = 1.05e-06).
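The F-statistic above tests the quadratic model as a whole. As a complementary check that is not part of the original script, the two nested models can be compared directly with `anova()`, which tests whether adding x2 significantly reduces the residual sum of squares. The names `cmp`, `f_partial`, and `p_partial` are illustrative.

```r
# Same data as in the post
x <- 1:9
y <- c(5, 3, 2, 3, 4, 6, 10, 12, 18)
df2 <- data.frame(x = x, x2 = x^2, y = y)

model_1 <- lm(y ~ x, data = df2)        # linear model from Step 1
model_2 <- lm(y ~ x + x2, data = df2)   # quadratic model from Step 2

# Partial F-test for the nested pair: does x2 improve the fit?
cmp <- anova(model_1, model_2)
print(cmp)

f_partial <- cmp$F[2]
p_partial <- cmp$`Pr(>F)`[2]
```

With a single added term, this F value equals the square of the t value reported for x2 in the summary output, so both tests lead to the same conclusion.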

Multiple R-squared: 0.9898, Adjusted R-squared: 0.9864

The fitted regression equation explains 98.98% of the variation in y.

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  7.16667    0.78728   9.103 9.87e-05 ***
x           -2.91212    0.36149  -8.056 0.000196 ***
x2           0.45455    0.03526  12.893 1.34e-05 ***

- (Intercept): p-value < 0.001

- x coefficient: p-value < 0.001

- x2 coefficient: p-value < 0.001

Regression Analysis

The adjusted R-squared is 0.9864, so the fitted regression equation explains 98.64% of the variation in y after adjusting for the number of predictors.
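The adjusted value follows from the standard formula Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. A minimal sketch with the post's data (n = 9, p = 2); the variable names are illustrative.

```r
# Same data as in the post
x <- 1:9
y <- c(5, 3, 2, 3, 4, 6, 10, 12, 18)
df2 <- data.frame(x = x, x2 = x^2, y = y)

model_2 <- lm(y ~ x + x2, data = df2)

n  <- length(y)                    # number of observations
p  <- 2                            # number of predictors (x and x2)
r2 <- summary(model_2)$r.squared   # Multiple R-squared

# Penalize R^2 for the number of predictors
adj_r2_manual <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(adj_r2_manual)
print(summary(model_2)$adj.r.squared)  # value reported by summary()
```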

Adding the x2 (x^2) term gives a much better fit than the model without it.
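The post builds the squared term as a separate column. An alternative, shown here only as a sketch, is to write the square inside the formula with `I(x^2)` or `poly(x, 2, raw = TRUE)`; both are standard R and should give the same coefficients as the x2 column. The model names `m_x2`, `m_I`, and `m_poly` are introduced here for illustration.

```r
# Same data as in the post
x <- 1:9
y <- c(5, 3, 2, 3, 4, 6, 10, 12, 18)
df1 <- data.frame(x = x, y = y)
df2 <- data.frame(x = x, x2 = x^2, y = y)

# Approach used in the post: explicit x2 column
m_x2 <- lm(y ~ x + x2, data = df2)

# Square written inside the formula; I() protects ^ from formula syntax
m_I <- lm(y ~ x + I(x^2), data = df1)

# Raw (non-orthogonal) polynomial of degree 2
m_poly <- lm(y ~ poly(x, 2, raw = TRUE), data = df1)

# Coefficients side by side: intercept, linear term, quadratic term
print(rbind(x2_column = unname(coef(m_x2)),
            I_form    = unname(coef(m_I)),
            poly_raw  = unname(coef(m_poly))))
```

With the formula-based forms, `predict()` only needs `x` in `newdata`, since the squared term is computed from it automatically.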

