
Intro

 - Editor : Sublime Text 3

 - python 3.9.1

 - pandas 1.2.1 

 - xlrd 2.0.1

 - openpyxl 3.0.5

 

Documentation site for statsmodels, the pandas-based statistical analysis module

www.statsmodels.org/stable/index.html

 

Data

Time Depth3
2 0.050
5 0.150
8 0.270
12 0.36
15 0.47
17 0.52

 

Directory Tree

.
├── RegressionAnalysis_Engine.py
└── data
    ├── test1.csv
    └── test1.xlsx
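
The script reads data/test1.csv and the sheets Sheet1, Sheet2 and Sheet3 of data/test1.xlsx. A minimal sketch for generating those inputs from the table in the Data section might look like this. The helper name write_sample_files is hypothetical, and since the post shows only one data set, copying it into all three sheets is an assumption.

```python
import os
import tempfile

import pandas as pd

# Time / Depth3 values from the Data section of this post.
SAMPLE = pd.DataFrame({
    "Time": [2, 5, 8, 12, 15, 17],
    "Depth3": [0.05, 0.15, 0.27, 0.36, 0.47, 0.52],
})


def write_sample_files(out_dir):
    """Write test1.csv and test1.xlsx (hypothetical helper, not from the post)."""
    os.makedirs(out_dir, exist_ok=True)
    csv_path = os.path.join(out_dir, "test1.csv")
    xlsx_path = os.path.join(out_dir, "test1.xlsx")
    SAMPLE.to_csv(csv_path, index=False)
    # Assumption: the same table is copied to every sheet the script reads.
    with pd.ExcelWriter(xlsx_path) as writer:
        for sheet in ("Sheet1", "Sheet2", "Sheet3"):
            SAMPLE.to_excel(writer, sheet_name=sheet, index=False)
    return csv_path, xlsx_path


# Demo in a temporary directory so no existing file is overwritten.
demo_dir = tempfile.mkdtemp()
csv_path, xlsx_path = write_sample_files(demo_dir)
print(pd.read_csv(csv_path))
```

Calling write_sample_files('./data') instead would produce the layout shown in the directory tree. Writing the .xlsx file relies on openpyxl, which is listed in the Intro.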

 

RegressionAnalysis_Engine.py

# -*- coding: utf-8 -*-

import os
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
import warnings


#### dir base
fpath_base = './data'

#### file_name, sheet_name
fnames = ('test1.csv','test1.xlsx')
shnames = ('Sheet1','Sheet2','Sheet3')

#### file_dir_base
fpath_bases = [os.path.join(fpath_base,fname) for fname in fnames]


#### dataframe read
df_csv = pd.read_csv(fpath_bases[0])
df1 = pd.read_excel(fpath_bases[1],sheet_name=shnames[0])
df2 = pd.read_excel(fpath_bases[1],sheet_name=shnames[1])
df3 = pd.read_excel(fpath_bases[1],sheet_name=shnames[2])


#### DataFrame Info
print("DataFrame Info for .csv   \n {} \n".format(df_csv))
print("File type \n {} \n".format(type(df_csv)))
print("DataFrame info for .xlsx  \n {} \n".format(df1))
print("File type \n {} \n".format(type(df1)))

#### Pandas Data Handle
print("1. Index    Info is \n {} \n".format(df1.index))
print("2. Columns  Info is \n {} \n".format(df1.columns))
print("3. Column-1 Data is \n {} \n".format(df1.loc[0:,['Time']]))
print("4. Column-2 Data is \n {} \n".format(df1.loc[0:,['Depth3']]))


#### statsmodels Handle
# formula='dependent variable ~ independent variable'
model = smf.ols(formula='Depth3 ~ Time',  data=df1)
results = model.fit()
warnings.filterwarnings('ignore')

# Simple Regression Analysis Chart
print(results.summary(),'\n\n\n')
# # intercept & coefficient(time)
# print(results.params)
# # R-squared
# print(results.rsquared)

# predict X(Time)
y_target = 0.65
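# look up the parameters by label; integer indexing of a labeled Series is deprecated in newer pandas
x_predict = (y_target - results.params['Intercept'])/results.params['Time']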
print("->> Predict Value 'Time[min]' is '{:3.2f}' for H/E Depth[mm] = {}\n".format(x_predict,y_target))
# predict Y(Depth3)
# x_new = {'Time':[10,16,18,20]}
# y_new = results.predict(x_new)


#### statsmodels Plot
fig1 = plt.figure(figsize=(8, 6))
sm.graphics.plot_regress_exog(results, 'Time', fig=fig1)
plt.show()

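 - predict X : To predict the independent (experimental) variable for a target Y, invert the fitted first-order linear regression equation arithmetically: x = (y - Intercept) / slope

 - predict Y : To predict the dependent variable (output) for a target X, pass X as a dictionary to the .predict method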
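
Both directions can be sketched in a few lines. This is a minimal example, assuming the data from the Data section is built inline rather than read from file. The commented-out lines in the script pass a dict to .predict; here the same Time values are wrapped in a DataFrame, and the result is converted with np.asarray so it can be indexed the same way whatever type .predict returns.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Time / Depth3 values from the Data section of this post.
df = pd.DataFrame({
    "Time": [2, 5, 8, 12, 15, 17],
    "Depth3": [0.05, 0.15, 0.27, 0.36, 0.47, 0.52],
})
results = smf.ols(formula="Depth3 ~ Time", data=df).fit()

# Look the parameters up by label rather than by position.
intercept = results.params["Intercept"]
slope = results.params["Time"]

# predict X: invert y = intercept + slope * x for a target y.
y_target = 0.65
x_predict = (y_target - intercept) / slope
print("Time for Depth3 = {}: {:.2f}".format(y_target, x_predict))

# predict Y: evaluate the fitted line at new Time values.
x_new = pd.DataFrame({"Time": [10, 16, 18, 20]})
y_new = np.asarray(results.predict(x_new))
for t, d in zip(x_new["Time"], y_new):
    print("Depth3 at Time = {}: {:.3f}".format(t, d))
```

The inverse prediction should match the value of 20.96 that the script prints for a target depth of 0.65. Note that Time = 18 and above lie outside the measured range (2 to 17), so those predictions are extrapolations.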

 

Execution result : ----*REPL*[python]----

DataFrame Info for .csv   
    Time  Depth3
0     2    0.05
1     5    0.15
2     8    0.27
3    12    0.36
4    15    0.47
5    17    0.52 

File type 
 <class 'pandas.core.frame.DataFrame'> 

DataFrame info for .xlsx  
    Time  Depth3
0     2    0.05
1     5    0.15
2     8    0.27
3    12    0.36
4    15    0.47
5    17    0.52 

File type 
 <class 'pandas.core.frame.DataFrame'> 

1. Index    Info is 
 RangeIndex(start=0, stop=6, step=1) 

2. Columns  Info is 
 Index(['Time', 'Depth3'], dtype='object') 

3. Column-1 Data is 
    Time
0     2
1     5
2     8
3    12
4    15
5    17 

4. Column-2 Data is 
    Depth3
0    0.05
1    0.15
2    0.27
3    0.36
4    0.47
5    0.52 

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 Depth3   R-squared:                       0.995
Model:                            OLS   Adj. R-squared:                  0.994
Method:                 Least Squares   F-statistic:                     777.8
Date:                Fri, 15 Jan 2021   Prob (F-statistic):           9.83e-06
Time:                        13:08:07   Log-Likelihood:                 18.062
No. Observations:                   6   AIC:                            -32.12
Df Residuals:                       4   BIC:                            -32.54
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.0031      0.013     -0.247      0.817      -0.038       0.032
Time           0.0312      0.001     27.889      0.000       0.028       0.034
==============================================================================
Omnibus:                          nan   Durbin-Watson:                   2.779
Prob(Omnibus):                    nan   Jarque-Bera (JB):                1.235
Skew:                           1.108   Prob(JB):                        0.539
Kurtosis:                       2.843   Cond. No.                         23.6
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified. 



->> Predict Value 'Time[min]' is '20.96' for H/E Depth[mm] = 0.65

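 - OLS Regression Results : R-squared = 0.995, so the fitted line explains about 99.5% of the variance in Depth3

                                            Prob (F-statistic) = 9.83e-06, well below the 0.05 threshold, so the regression is statistically significant

                                            The Intercept row of the coef column is the constant term of the fitted first-order linear equation

                                            The Time row of the coef column is the coefficient of the independent variable (x) in that equation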
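
The figures discussed above can also be read directly from the fitted results object instead of from the printed summary. The following is a minimal sketch, assuming the same six data points built inline; the 0.05 threshold is the conventional significance level used in the note above.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Time / Depth3 values from the Data section of this post.
df = pd.DataFrame({
    "Time": [2, 5, 8, 12, 15, 17],
    "Depth3": [0.05, 0.15, 0.27, 0.36, 0.47, 0.52],
})
results = smf.ols(formula="Depth3 ~ Time", data=df).fit()

# Goodness of fit and overall significance of the regression.
print("R-squared          : {:.3f}".format(results.rsquared))
print("Prob (F-statistic) : {:.2e}".format(results.f_pvalue))

# Fitted line: Depth3 = Intercept + Time coefficient * Time
print("Intercept          : {:.4f}".format(results.params["Intercept"]))
print("Time coefficient   : {:.4f}".format(results.params["Time"]))

# 95% confidence intervals, the [0.025, 0.975] columns of the summary.
print(results.conf_int(alpha=0.05))

alpha = 0.05
is_significant = results.f_pvalue < alpha
print("Significant at {}: {}".format(alpha, is_significant))
```

With only six observations, the confidence interval of the Intercept spans zero (P>|t| = 0.817 in the summary), so the constant term is not distinguishable from zero even though the slope is.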

 

Execution result : Plot
