티스토리 뷰
INTRO
1999년 미국 회사의 CEO와 성과 자료
1. 표의 빈칸에 해당하는 기초 통계량을 구하시오.
2. profits에 따른 CEO의 salary에 대한 산점도를 그리고 상관계수를 구하시오.
3. profits에 따른 CEO의 salary에 대한 단순선형회귀식을 구하고 분석하시오.
4. profits과 age, sales에 따른 CEO의 salary의 회귀식을 구하고 분석하시오.
5. profit, age, sales, tenure, assets에 따른 CEO의 salary의 회귀식을 구하고 분석하시오.
6. 후진제거법으로 최적의 회귀식을 찾으시오. 단, salary를 종속변수로 한다.
7. 전진선택법으로 최적의 회귀식을 찾으시오. 단, salary를 종속변수로 한다.
8. 단계적방법으로 최적의 회귀식을 찾으시오. 단, salary를 종속변수로 한다.
변수의 이름과 의미는 다음과 같다.
salary = 1999 salary + bonuses
tenure = number of years as CEO (=0 if less than 6 months)
age= age of CEO
sales = total 1998 sales revenue of firm i
profits = 1998 profits for firm i
assets = total assets of firm i in 1998
firm | salary | tenure | age | sales | profits | assets |
1 | 3030 | 7 | 61 | 161315 | 2956 | 257389 |
2 | 6050 | 0 | 51 | 144416 | 22071 | 237545 |
3 | 3571 | 11 | 63 | 139208 | 4430 | 49271 |
4 | 3300 | 6 | 60 | 100697 | 6370 | 92630 |
5 | 0 | 18 | 63 | 100469 | 9296 | 355935 |
6 | 9375 | 6 | 57 | 81667 | 6328 | 86100 |
7 | 9525 | 15 | 60 | 76431 | 5807 | 668641 |
8 | 5000 | 5 | 61 | 57813 | 5372 | 59920 |
9 | 999 | 3 | 57 | 56154 | 1120 | 36672 |
10 | 3300 | 2 | 60 | 53588 | 6398 | 59550 |
11 | 3500 | 16 | 63 | 50777 | 5165 | 617679 |
12 | 2493 | 5 | 61 | 47678 | 1704 | 42754 |
13 | 1911 | 7 | 58 | 47061 | 2945 | 33673 |
14 | 2130 | 4 | 59 | 41322 | 1048 | 37675 |
15 | 1185 | 0 | 56 | 37154 | 3780 | 30966 |
16 | 5236 | 2 | 60 | 35853 | 1259 | 299804 |
17 | 1990 | 4 | 60 | 33674 | 568 | 14166 |
18 | 6000 | 32 | 74 | 33296 | 3765 | 19166 |
19 | 6229 | 5 | 63 | 32379 | 3782 | 194398 |
20 | 1523 | 3 | 56 | 31707 | 578 | 3665875 |
statistics | ||||||
mean | ||||||
variance | ||||||
standard deviation |
||||||
median | ||||||
max | ||||||
min |
1. 표의 빈칸에 해당하는 기초 통계량을 구하시오
Ex_firm_ceo.R
rm(list=ls())
library(data.table)
data_csv <- read.csv("~/Rcoding/firm_ceo.csv")
datas <- data_csv[1:20,2:7]
## 1.
datas_sum = summary(datas)
str(datas_sum)
# test for each col.
salary_mean = mean(datas$salary)
salary_mean
salary_var = var(datas$salary)
salary_var
salary_sd = sd(datas$salary)
salary_sd
ex_sol = data.frame(matrix(nrow=6,ncol=6))
for (i in 1:ncol(datas)) {
data_mean = mean(datas[,i])
data_var = var(datas[,i])
data_sd = sd(datas[,i])
data_median = median(datas[,i])
data_max = max(datas[,i])
data_min = min(datas[,i])
ex_sol[,i] <- c(data_mean,data_var,data_sd,data_median,data_max,data_min)
}
colnames(ex_sol) <- c("salary","tenure","age","sales","profits","assets")
rownames(ex_sol) <- c("mean","var","sd","median","max","min")
print(ex_sol)
출력결과
> rm(list=ls())
> library(data.table)
> data_csv <- read.csv("~/Rcoding/firm_ceo.csv")
> datas <- data_csv[1:20,2:7]
> ## 1.
> datas_sum = summary(datas)
> str(datas_sum)
'table' chr [1:6, 1:6] "Min. : 0 " "1st Qu.:1970 " ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:6] "" "" "" "" ...
..$ : chr [1:6] " salary" " tenure" " age" " sales" ...
> # test for each col.
> salary_mean = mean(datas$salary)
> salary_mean
[1] 3817.35
> salary_var = var(datas$salary)
> salary_var
[1] 6844228
> salary_sd = sd(datas$salary)
> salary_sd
[1] 2616.148
> ex_sol = data.frame(matrix(nrow=6,ncol=6))
> for (i in 1:ncol(datas)) {
+ data_mean = mean(datas[,i])
+ data_var = var(datas[,i])
+ data_sd = sd(datas[,i])
+ data_median = medi .... [TRUNCATED]
> colnames(ex_sol) <- c("salary","tenure","age","sales","profits","assets")
> rownames(ex_sol) <- c("mean","var","sd","median","max","min")
> print(ex_sol)
salary tenure age sales profits assets
mean 3817.350 7.550000 60.150000 6.813295e+04 4737.100 3.429905e+05
var 6844228.029 58.576316 19.502632 1.645457e+09 22255536.516 6.486447e+11
sd 2616.148 7.653517 4.416178 4.056423e+04 4717.577 8.053848e+05
median 3300.000 5.000000 60.000000 5.218250e+04 3781.000 7.301000e+04
max 9525.000 32.000000 74.000000 1.613150e+05 22071.000 3.665875e+06
min 0.000 0.000000 51.000000 3.170700e+04 568.000 1.416600e+04
2. profits에 따른 CEO의 salary에 대한 산점도를 그리고 상관계수를 구하시오.
Ex_firm_ceo.R
...
## 2.
par(mfrow=c(1,1))
plot(data$profits,data$salary)
cor(data$profits,data$salary)
cor(datas)
출력결과
> ##2.
> par(mfrow=c(1,1))
> plot(data$profits,data$salary)
> cor(data$profits,data$salary)
[1] 0.3019593
> cor(datas)
salary tenure age sales profits
salary 1.00000000 0.19478167 0.09391173 0.11350054 0.30195934
tenure 0.19478167 1.00000000 0.83363621 0.01027995 -0.01510477
age 0.09391173 0.83363621 1.00000000 -0.18422778 -0.32539266
sales 0.11350054 0.01027995 -0.18422778 1.00000000 0.57055319
profits 0.30195934 -0.01510477 -0.32539266 0.57055319 1.00000000
assets -0.12045382 -0.06564463 -0.20934915 -0.16217834 -0.14368491
assets
salary -0.12045382
tenure -0.06564463
age -0.20934915
sales -0.16217834
profits -0.14368491
assets 1.00000000
3. profits에 따른 CEO의 salary에 대한 단순선형회귀식을 구하고 분석하시오.
Ex_firm_ceo.R
...
## 3.
model_sr = lm(salary ~ profits, data=data)
model_sr
summary(model_sr)
par(mfrow=c(2,2))
plot(model_sr)
출력결과
...
> ## 3.
> model = lm(salary ~ profits, .... [TRUNCATED]
> model
Call:
lm(formula = salary ~ profits, data = data)
Coefficients:
(Intercept) profits
3024.1107 0.1675
> summary(model)
Call:
lm(formula = salary ~ profits, data = data)
Residuals:
Min 1Q Median 3Q Max
-4580.7 -1246.4 -730.4 1307.5 5528.5
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3024.1107 822.6289 3.676 0.00173 **
profits 0.1675 0.1246 1.344 0.19570
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2562 on 18 degrees of freedom
Multiple R-squared: 0.09118, Adjusted R-squared: 0.04069
F-statistic: 1.806 on 1 and 18 DF, p-value: 0.1957
> par(mfrow=c(2,2))
> plot(model)
4.profits과 age, sales에 따른 CEO의 salary의 회귀식을 구하고 분석하시오.
R console> install.packages("broom")
Ex_firm_ceo.R
...
## 4.
model_ex = lm(salary ~ age+sales+profits,data=data)
model_ex
summary(model_ex)
para_model_ex = c('salary','age','sales','profits')
coef_model_ex = paste(model_ex$coef)
para_model_ex
coef_model_ex
reg_eq = function(para,coef) {
text1 = paste(para_model_ex[1], " = ", model_ex$coef[1])
text2 = paste0(" + ", model_ex$coef[2], "*", para_model_ex[2])
text3 = paste0(" + ", model_ex$coef[3], "*", para_model_ex[3])
text4 = paste0(" + ", model_ex$coef[4], "*", para_model_ex[4])
return(paste(text1,text2,text3,text4))
}
reg_eq(para_model_ex,coef_model_ex)
statics_model_ex = broom::glance(model_ex)[,4:5]
statics_model_ex$statistic
print(paste("F-statistic:",statics_model_ex$statistic,"p-value:",statics_model_ex$p.value))
출력결과
...
> ## 4.
> model_ex = lm(salary ~ age+sales+profits,data=data)
> model_ex
Call:
lm(formula = salary ~ age + sales + profits, data = data)
Coefficients:
(Intercept) age sales profits
-4.569e+03 1.274e+02 -5.650e-03 2.340e-01
> summary(model_ex)
Call:
lm(formula = salary ~ age + sales + profits, data = data)
Residuals:
Min 1Q Median 3Q Max
-5064.4 -1005.6 -762.7 553.4 5663.2
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.569e+03 9.034e+03 -0.506 0.620
age 1.274e+02 1.454e+02 0.876 0.394
sales -5.650e-03 1.823e-02 -0.310 0.761
profits 2.340e-01 1.629e-01 1.436 0.170
Residual standard error: 2647 on 16 degrees of freedom
Multiple R-squared: 0.1377, Adjusted R-squared: -0.02403
F-statistic: 0.8514 on 3 and 16 DF, p-value: 0.4861
> para_model_ex = c('salary','age','sales','profits')
> coef_model_ex = paste(model_ex$coef)
> para_model_ex
[1] "salary" "age" "sales" "profits"
> coef_model_ex
[1] "-4569.38405752631" "127.403469383454" "-0.00565029360758603"
[4] "0.233979976348927"
> # paste("y=",model_ex$coef[1],"x + ",model_ex$coef[2])
> reg_eq = function(para,coef) {
+ text1 = paste(para_model_ex[1], " = ", model_ex$coef[1]) .... [TRUNCATED]
> reg_eq(para_model_ex,coef_model_ex)
[1] "salary = -4569.38405752631 + 127.403469383454*age + -0.00565029360758603*sales + 0.233979976348927*profits"
> statics_model_ex = broom::glance(model_ex)[,4:5]
> statics_model_ex$statistic
value
0.8513706
> print(paste("F-statistic:",statics_model_ex$statistic,"p-value:",statics_model_ex$p.value))
[1] "F-statistic: 0.851370582805658 p-value: 0.486064045102265"
5. profit, age, sales, tenure, assets에 따른 CEO의 salary의 회귀식을 구하고 분석하시오.
Ex_firm_ceo.R
...
## 5.
model_ex2 = lm(salary ~ profits+age+sales+tenure+assets,data=data)
summary(model_ex2)
para_model_ex2 = c('salary','profits','age','sales','tenure')
coef_model_ex2 = paste(model_ex2$coef)
para_model_ex2
coef_model_ex2
reg_eq = function(para,coef) {
text1 = paste(para[1], " = ", coef[1])
text2 = paste0(" + ", coef[2], "*", para[2])
text3 = paste0(" + ", coef[3], "*", para[3])
text4 = paste0(" + ", coef[4], "*", para[4])
text5 = paste0(" + ", coef[5], "*", para[5])
text6 = paste0(" + ", coef[6], "*", para[6])
return(paste(text1,text2,text3,text4,text5,text6))
}
reg_eq(para_model_ex2,coef_model_ex2)
statics_model_ex2 = broom::glance(model_ex2)[,4:5]
print(paste("F-statistic:",statics_model_ex2$statistic,"p-value:",statics_model_ex2$p.value))
출력결과
...
> ## 5.
> model_ex2 = lm(salary ~ profits+age+sales+tenure+assets,data=data)
> summary(model_ex2)
Call:
lm(formula = salary ~ profits + age + sales + tenure + assets,
data = data)
Residuals:
Min 1Q Median 3Q Max
-5173.6 -1006.6 -626.4 531.6 5462.7
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.295e+02 2.126e+04 0.030 0.977
profits 2.069e-01 1.989e-01 1.040 0.316
age 3.887e+01 3.611e+02 0.108 0.916
sales -6.346e-03 1.961e-02 -0.324 0.751
tenure 4.881e+01 1.899e+02 0.257 0.801
assets -1.939e-04 9.166e-04 -0.212 0.835
Residual standard error: 2822 on 14 degrees of freedom
Multiple R-squared: 0.1426, Adjusted R-squared: -0.1636
F-statistic: 0.4658 on 5 and 14 DF, p-value: 0.7954
> para_model_ex2 = c('salary','profits','age','sales','tenure')
> coef_model_ex2 = paste(model_ex2$coef)
> para_model_ex2
[1] "salary" "profits" "age" "sales" "tenure"
> coef_model_ex2
[1] "629.535762665344" "0.206864514513282" "38.8728334629791"
[4] "-0.00634580332997101" "48.8144166758324" "-0.000193927067287707"
> reg_eq = function(para,coef) {
+ text1 = paste(para[1], " = ", coef[1])
+ text2 = paste0(" + ", coef[2], "*", para[2])
+ text3 = paste0(" + ", .... [TRUNCATED]
> reg_eq(para_model_ex2,coef_model_ex2)
[1] "salary = 629.535762665344 + 0.206864514513282*profits + 38.8728334629791*age + -0.00634580332997101*sales + 48.8144166758324*tenure + -0.000193927067287707*NA"
> statics_model_ex2 = broom::glance(model_ex2)[,4:5]
> print(paste("F-statistic:",statics_model_ex2$statistic,"p-value:",statics_model_ex2$p.value))
[1] "F-statistic: 0.465845043891575 p-value: 0.795374324895597"
6. 후진제거법으로 최적의 회귀식을 찾으시오. 단, salary를 종속변수로 한다.
Ex_firm_ceo.R
...
## 6.
step_backward_model = step(lm(salary ~ tenure+age+sales+profits+assets,datas), scope=list(lower=~1,upper=~tenure+age+sales+profits+assets), direction="backward")
step_backward_model
summary(step_backward_model)
출력결과
...
> ## 6.
> step_backward_model = step(lm(salary ~ tenure+age+sales+profits+assets,datas), scope=list(lower=~1,upper=~tenure+age+sales+profits+assets), .... [TRUNCATED]
Start: AIC=322.67
salary ~ tenure + age + sales + profits + assets
Df Sum of Sq RSS AIC
- age 1 92281 111583466 320.69
- assets 1 356490 111847674 320.74
- tenure 1 526022 112017206 320.77
- sales 1 833628 112324812 320.82
- profits 1 8617106 120108290 322.16
<none> 111491184 322.67
Step: AIC=320.69
salary ~ tenure + sales + profits + assets
Df Sum of Sq RSS AIC
- assets 1 676425 112259890 318.81
- sales 1 892500 112475966 318.85
- tenure 1 4986238 116569704 319.57
- profits 1 10828227 122411692 320.54
<none> 111583466 320.69
Step: AIC=318.81
salary ~ tenure + sales + profits
Df Sum of Sq RSS AIC
- sales 1 754779 113014669 316.95
- tenure 1 5257207 117517097 317.73
- profits 1 11229913 123489803 318.72
<none> 112259890 318.81
Step: AIC=316.95
salary ~ tenure + profits
Df Sum of Sq RSS AIC
- tenure 1 5168658 118183328 315.84
<none> 113014669 316.95
- profits 1 12091946 125106615 316.98
Step: AIC=315.84
salary ~ profits
Df Sum of Sq RSS AIC
- profits 1 11857005 130040333 315.75
<none> 118183328 315.84
Step: AIC=315.75
salary ~ 1
> step_backward_model
Call:
lm(formula = salary ~ 1, data = datas)
Coefficients:
(Intercept)
3817
> summary(step_backward_model)
Call:
lm(formula = salary ~ 1, data = datas)
Residuals:
Min 1Q Median 3Q Max
-3817.3 -1847.1 -517.3 1609.7 5707.7
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3817 585 6.526 3e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2616 on 19 degrees of freedom
7. 전진선택법으로 최적의 회귀식을 찾으시오. 단, salary를 종속변수로 한다.
Ex_firm_ceo.R
...
## 7.
step_forward_model = step(lm(salary ~ tenure+age+sales+profits+assets,datas), scope=list(lower=~1,upper=~tenure+age+sales+profits+assets), direction="forward")
step_forward_model
summary(step_forward_model)
출력결과
...
> ## 7.
> step_forward_model = step(lm(salary ~ tenure+age+sales+profits+assets,datas), scope=list(lower=~1,upper=~tenure+age+sales+profits+assets), d .... [TRUNCATED]
Start: AIC=322.67
salary ~ tenure + age + sales + profits + assets
> step_forward_model
Call:
lm(formula = salary ~ tenure + age + sales + profits + assets,
data = datas)
Coefficients:
(Intercept) tenure age sales profits assets
6.295e+02 4.881e+01 3.887e+01 -6.346e-03 2.069e-01 -1.939e-04
> summary(step_forward_model)
Call:
lm(formula = salary ~ tenure + age + sales + profits + assets,
data = datas)
Residuals:
Min 1Q Median 3Q Max
-5173.6 -1006.6 -626.4 531.6 5462.7
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.295e+02 2.126e+04 0.030 0.977
tenure 4.881e+01 1.899e+02 0.257 0.801
age 3.887e+01 3.611e+02 0.108 0.916
sales -6.346e-03 1.961e-02 -0.324 0.751
profits 2.069e-01 1.989e-01 1.040 0.316
assets -1.939e-04 9.166e-04 -0.212 0.835
Residual standard error: 2822 on 14 degrees of freedom
Multiple R-squared: 0.1426, Adjusted R-squared: -0.1636
F-statistic: 0.4658 on 5 and 14 DF, p-value: 0.7954
8. 단계적방법으로 최적의 회귀식을 찾으시오. 단, salary를 종속변수로 한다
Ex_firm_ceo.R
...
## 8.
step_stepwise_model = step(lm(salary ~ tenure+age+sales+profits+assets,data), scope=list(lower=~1,upper=~tenure+age+sales+profits+assets), direction="both")
step_stepwise_model
summary(step_stepwise_model)
출력결과
...
> ## 8.
> step_stepwise_model = step(lm(salary ~ tenure+age+sales+profits+assets,data), scope=list(lower=~1,upper=~tenure+age+sales+profits+assets), d .... [TRUNCATED]
Start: AIC=322.67
salary ~ tenure + age + sales + profits + assets
Df Sum of Sq RSS AIC
- age 1 92281 111583466 320.69
- assets 1 356490 111847674 320.74
- tenure 1 526022 112017206 320.77
- sales 1 833628 112324812 320.82
- profits 1 8617106 120108290 322.16
<none> 111491184 322.67
Step: AIC=320.69
salary ~ tenure + sales + profits + assets
Df Sum of Sq RSS AIC
- assets 1 676425 112259890 318.81
- sales 1 892500 112475966 318.85
- tenure 1 4986238 116569704 319.57
- profits 1 10828227 122411692 320.54
<none> 111583466 320.69
+ age 1 92281 111491184 322.67
Step: AIC=318.81
salary ~ tenure + sales + profits
Df Sum of Sq RSS AIC
- sales 1 754779 113014669 316.95
- tenure 1 5257207 117517097 317.73
- profits 1 11229913 123489803 318.72
<none> 112259890 318.81
+ assets 1 676425 111583466 320.69
+ age 1 412217 111847674 320.74
Step: AIC=316.95
salary ~ tenure + profits
Df Sum of Sq RSS AIC
- tenure 1 5168658 118183328 315.84
<none> 113014669 316.95
- profits 1 12091946 125106615 316.98
+ sales 1 754779 112259890 318.81
+ assets 1 538704 112475966 318.85
+ age 1 455187 112559482 318.87
Step: AIC=315.84
salary ~ profits
Df Sum of Sq RSS AIC
- profits 1 11857005 130040333 315.75
<none> 118183328 315.84
+ age 1 5370818 112812510 316.91
+ tenure 1 5168658 113014669 316.95
+ assets 1 788629 117394698 317.71
+ sales 1 666230 117517097 317.73
Step: AIC=315.75
salary ~ 1
Df Sum of Sq RSS AIC
<none> 130040333 315.75
+ profits 1 11857005 118183328 315.84
+ tenure 1 4933717 125106615 316.98
+ assets 1 1886771 128153561 317.46
+ sales 1 1675228 128365105 317.49
+ age 1 1146879 128893453 317.58
> step_stepwise_model
Call:
lm(formula = salary ~ 1, data = data)
Coefficients:
(Intercept)
3817
> summary(step_stepwise_model)
Call:
lm(formula = salary ~ 1, data = data)
Residuals:
Min 1Q Median 3Q Max
-3817.3 -1847.1 -517.3 1609.7 5707.7
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3817 585 6.526 3e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2616 on 19 degrees of freedom
'R' 카테고리의 다른 글
(macOS)[R] 시계열 예측 : 실습 - 2 (0) | 2022.04.19 |
---|---|
(macOS)[R] 시계열 예측 : 실습 - 1 (0) | 2022.04.18 |
(macOS)[R] 시계열 예측 : 이론 (0) | 2022.04.18 |
(macOS)[R] 다변량 분석 : 상관 분석(Correlation Analysis) - 4 (0) | 2022.04.17 |
(macOS)[R] 다변량 분석 : 상관 분석(Correlation Analysis) - 3 (0) | 2022.04.16 |
- Total
- Today
- Yesterday
- Templates
- SSH
- Pandas
- raspberrypi
- ERP
- COVID-19
- 라즈베리파이
- DS18B20
- Django
- vscode
- r
- Regression
- arduino
- DAQ
- analysis
- sublime text
- server
- 자가격리
- Raspberry Pi
- github
- CSV
- 코로나
- MacOS
- Model
- pyserial
- Python
- git
- 코로나19
- template
- 확진
일 | 월 | 화 | 수 | 목 | 금 | 토 |
---|---|---|---|---|---|---|
1 | 2 | |||||
3 | 4 | 5 | 6 | 7 | 8 | 9 |
10 | 11 | 12 | 13 | 14 | 15 | 16 |
17 | 18 | 19 | 20 | 21 | 22 | 23 |
24 | 25 | 26 | 27 | 28 | 29 | 30 |