(macOS)[python] Regression Analysis with Pandas
jinozpersona 2021. 1. 12. 13:25
Intro
- Editor: Sublime Text 3
- python 3.9.1
- pandas 1.2.1
- xlrd 2.0.1
- openpyxl 3.0.5
Documentation for statsmodels, the statistical analysis module used here on top of pandas:
www.statsmodels.org/stable/index.html
Data
Time | Depth3 |
---|---|
2 | 0.050 |
5 | 0.150 |
8 | 0.270 |
12 | 0.360 |
15 | 0.470 |
17 | 0.520 |
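To try the regression without preparing the CSV/XLSX files, the same table can be typed in directly as a DataFrame. This is a minimal sketch: the name `df` is illustrative and simply stands in for `df1` in the script.

```python
import pandas as pd

# Same values as the Data table; the column names must match the formula 'Depth3 ~ Time'
df = pd.DataFrame({
    "Time": [2, 5, 8, 12, 15, 17],
    "Depth3": [0.050, 0.150, 0.270, 0.360, 0.470, 0.520],
})
print(df)
```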
Directory Tree
.
├── RegressionAnalysis_Engine.py
└── data
    ├── test1.csv
    └── test1.xlsx
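If the data folder does not exist yet, a sketch like the following can create it from the table above. The helper `make_sample_files` is hypothetical. Writing `.xlsx` needs openpyxl, and because the script also reads Sheet2 and Sheet3 (whose contents are not shown in this post), the sketch just copies the same data into all three sheets as placeholders.

```python
import os
import pandas as pd

def make_sample_files(base_dir="./data", write_xlsx=False):
    """Hypothetical helper: write test1.csv (and optionally test1.xlsx) into base_dir."""
    os.makedirs(base_dir, exist_ok=True)
    df = pd.DataFrame({
        "Time": [2, 5, 8, 12, 15, 17],
        "Depth3": [0.050, 0.150, 0.270, 0.360, 0.470, 0.520],
    })
    csv_path = os.path.join(base_dir, "test1.csv")
    df.to_csv(csv_path, index=False)  # index=False: keep only the Time and Depth3 columns
    paths = [csv_path]
    if write_xlsx:
        # Requires openpyxl. Sheet2/Sheet3 are placeholder copies so the script's reads succeed.
        xlsx_path = os.path.join(base_dir, "test1.xlsx")
        with pd.ExcelWriter(xlsx_path) as writer:
            for sheet in ("Sheet1", "Sheet2", "Sheet3"):
                df.to_excel(writer, sheet_name=sheet, index=False)
        paths.append(xlsx_path)
    return paths
```

Calling `make_sample_files(write_xlsx=True)` from the project root would reproduce the directory tree shown above.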
RegressionAnalysis_Engine.py
# -*- coding: utf-8 -*-
import os
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
import warnings
#### dir base
fpath_base = './data'
#### file_name, sheet_name
fnames = ('test1.csv','test1.xlsx')
shnames = ('Sheet1','Sheet2','Sheet3')
#### file_dir_base
fpath_bases = [os.path.join(fpath_base,fname) for fname in fnames]
#### dataframe read
df_csv = pd.read_csv(fpath_bases[0])
df1 = pd.read_excel(fpath_bases[1],sheet_name=shnames[0])
df2 = pd.read_excel(fpath_bases[1],sheet_name=shnames[1])
df3 = pd.read_excel(fpath_bases[1],sheet_name=shnames[2])
#### DataFrame Info
print("DataFrame Info for .csv \n {} \n".format(df_csv))
print("File type \n {} \n".format(type(df_csv)))
print("DataFrame info for .xlsx \n {} \n".format(df1))
print("File type \n {} \n".format(type(df1)))
#### Pandas Data Handle
print("1. Index Info is \n {} \n".format(df1.index))
print("2. Columns Info is \n {} \n".format(df1.columns))
print("3. Column-1 Data is \n {} \n".format(df1.loc[0:,['Time']]))
print("4. Column-2 Data is \n {} \n".format(df1.loc[0:,['Depth3']]))
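#### statsmodels Handle
# formula='dependent variable ~ independent variable'
model = smf.ols(formula='Depth3 ~ Time', data=df1)
results = model.fit()
# summary() warns that some normality tests (e.g. Omnibus) are unreliable with only 6 observations; silence them
warnings.filterwarnings('ignore')
# Simple Regression Analysis Chart
print(results.summary(),'\n\n\n')
# # intercept & coefficient(Time)
# print(results.params)
# # R-squared
# print(results.rsquared)
# predict X(Time): invert Depth3 = Intercept + coef(Time) * Time
y_target = 0.65
# look up coefficients by label rather than position (positional [] on a Series is deprecated in newer pandas)
x_predict = (y_target - results.params['Intercept'])/results.params['Time']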
print("->> Predict Value 'Time[min]' is '{:3.2f}' for H/E Depth[mm] = {}\n".format(x_predict,y_target))
# predict Y(Depth3)
# x_new = {'Time':[10,16,18,20]}
# y_new = results.predict(x_new)
#### statsmodels Plot
fig1 = plt.figure(figsize=(8, 6))
sm.graphics.plot_regress_exog(results, 'Time', fig=fig1)
plt.show()
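- predict X: to predict the independent (experimental) variable for a target Y, the first-order linear regression equation can simply be inverted arithmetically, x = (y - Intercept) / coef(Time).
- predict Y: to predict the dependent variable (output) for a target X, pass the X values as a dictionary to the .predict method.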
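Both prediction directions can be sketched as follows, with the table values typed in directly instead of read from the files. The inversion assumes the fitted line Depth3 = Intercept + coef(Time) × Time; `results.predict` accepts a dict here because, for formula models, statsmodels passes new data through the same formula transformation as the training data.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "Time": [2, 5, 8, 12, 15, 17],
    "Depth3": [0.050, 0.150, 0.270, 0.360, 0.470, 0.520],
})
results = smf.ols("Depth3 ~ Time", data=df).fit()

b0 = results.params["Intercept"]  # constant term
b1 = results.params["Time"]       # slope for Time

# predict X: invert Depth3 = b0 + b1 * Time for a target depth
y_target = 0.65
x_predict = (y_target - b0) / b1
print(f"Time for Depth3 = {y_target}: {x_predict:.2f}")  # Time for Depth3 = 0.65: 20.96

# predict Y: new Time values passed as a dict keyed by the formula's variable name
y_new = results.predict({"Time": [10, 16, 18, 20]})
print(y_new)
```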
Output (REPL [python]):
DataFrame Info for .csv
Time Depth3
0 2 0.05
1 5 0.15
2 8 0.27
3 12 0.36
4 15 0.47
5 17 0.52
File type
<class 'pandas.core.frame.DataFrame'>
DataFrame info for .xlsx
Time Depth3
0 2 0.05
1 5 0.15
2 8 0.27
3 12 0.36
4 15 0.47
5 17 0.52
File type
<class 'pandas.core.frame.DataFrame'>
1. Index Info is
RangeIndex(start=0, stop=6, step=1)
2. Columns Info is
Index(['Time', 'Depth3'], dtype='object')
3. Column-1 Data is
Time
0 2
1 5
2 8
3 12
4 15
5 17
4. Column-2 Data is
Depth3
0 0.05
1 0.15
2 0.27
3 0.36
4 0.47
5 0.52
OLS Regression Results
==============================================================================
Dep. Variable: Depth3 R-squared: 0.995
Model: OLS Adj. R-squared: 0.994
Method: Least Squares F-statistic: 777.8
Date: Fri, 15 Jan 2021 Prob (F-statistic): 9.83e-06
Time: 13:08:07 Log-Likelihood: 18.062
No. Observations: 6 AIC: -32.12
Df Residuals: 4 BIC: -32.54
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept -0.0031 0.013 -0.247 0.817 -0.038 0.032
Time 0.0312 0.001 27.889 0.000 0.028 0.034
==============================================================================
Omnibus: nan Durbin-Watson: 2.779
Prob(Omnibus): nan Jarque-Bera (JB): 1.235
Skew: 1.108 Prob(JB): 0.539
Kurtosis: 2.843 Cond. No. 23.6
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
->> Predict Value 'Time[min]' is '20.96' for H/E Depth[mm] = 0.65
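- OLS Regression Results: R-squared = 0.995, i.e. the fitted line explains about 99.5% of the variance in Depth3 (R-squared measures goodness of fit, not a significance level)
- Prob (F-statistic) = 9.83e-06: the p-value is below 0.05, so the regression is statistically significant
- The Intercept row of the coef column is the constant term of the fitted first-order linear equation
- The Time row of the coef column is the coefficient of the independent variable (x), giving Depth3 ≈ -0.0031 + 0.0312 × Time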
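The values read off the summary table are also available as attributes of the fitted results (`params`, `rsquared`, `f_pvalue`, `pvalues`, `conf_int()`). The sketch below prints them and cross-checks the slope and intercept against the closed-form simple-regression formulas slope = Sxy / Sxx and intercept = mean(y) - slope × mean(x), computed with NumPy.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "Time": [2, 5, 8, 12, 15, 17],
    "Depth3": [0.050, 0.150, 0.270, 0.360, 0.470, 0.520],
})
results = smf.ols("Depth3 ~ Time", data=df).fit()

# Numbers shown in the summary table
print(results.params)                # coef column: Intercept, Time
print(results.rsquared)              # R-squared
print(results.f_pvalue)              # Prob (F-statistic)
print(results.pvalues)               # P>|t| column
print(results.conf_int(alpha=0.05))  # [0.025, 0.975] columns

# Closed-form cross-check for simple linear regression
x = df["Time"].to_numpy(dtype=float)
y = df["Depth3"].to_numpy()
sxx = np.sum((x - x.mean()) ** 2)
sxy = np.sum((x - x.mean()) * (y - y.mean()))
slope = sxy / sxx
intercept = y.mean() - slope * x.mean()
print(f"Depth3 = {intercept:.4f} + {slope:.4f} * Time")  # Depth3 = -0.0031 + 0.0312 * Time
```

For this data the Intercept's P>|t| is 0.817, so the intercept is not significantly different from zero, while Time (P>|t| = 0.000) is.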
Output: Plot (plot_regress_exog figure for Time)