(macOS)[python] curve fitting : (2) interpolation with scipy

jinozpersona 2022. 11. 4. 14:00

Intro

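curve fitting (also called curve approximation) covers two approaches:

- polynomial regression : fits a polynomial to the data statistically

- polynomial interpolation : builds a polynomial that passes through the data

Curve fitting is used to approximate data. Put simply, regression is a statistical approximation of data that carry error, while interpolation numerically constructs a curve that passes through every data point. For regression, the goodness of fit is judged by R-square (the coefficient of determination), the proportion of the total variation (total sum of squares) that the fit explains.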
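The difference between the two approaches can be sketched in a few lines. This is only an illustration under stated assumptions: the samples `x`, `y` are made up (they are not data from this post), `np.polyfit` stands in for a generic least-squares regression, and R-square is computed as 1 - SS_res / SS_tot.

```python
import numpy as np
from scipy import interpolate as spi

# Hypothetical noisy samples, invented for this sketch (not from the post)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.1, 0.9, 4.2, 8.8, 16.1, 24.9])

# Regression: least-squares straight line; it does not have to hit the points
coef = np.polyfit(x, y, deg=1)
y_fit = np.polyval(coef, x)

# R-square = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y - y_fit) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# Interpolation: the curve passes through every sample
f_cubic = spi.interp1d(x, y, kind='cubic')
max_dev = np.max(np.abs(f_cubic(x) - y))

print("R-square of the linear regression:", r_squared)
print("max deviation of the interpolant at the samples:", max_dev)
```

The regression line leaves residuals, so its R-square stays below 1, while the interpolant reproduces each sample up to floating-point rounding.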

This series consists of two posts.

(1) polynomial regression : scipy, numpy

(2) polynomial interpolation : scipy

This post covers polynomial interpolation with the python library scipy.

Requirements

- Editor : vscode

- python 3.10

- scipy

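1. Carbon dioxide (CO2) dissociation constants pKa1, pKa2

An acid that contains two or more dissociable protons is called a polyprotic acid. CO2 is a typical example: dissolved in water it forms a weak acid that dissociates in two steps. The table below lists the two dissociation constants.

Data source : chegg.com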

Equilibrium Constants for the Carbonate System

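For the curve fit, the independent variable is T (°C) and the dependent variables are pKa1 and pKa2.

pKa1 = -log10(Ka1), and likewise for pKa2. In the python code the temperatures are stored in T and the pKa1, pKa2 values in K1, K2.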
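The relation between Ka and pKa can be checked with a small helper. This is a minimal sketch: the function names `pka_from_ka` and `ka_from_pka` are hypothetical, and the value 6.35 is only an illustrative input, not a number taken from the table.

```python
import math

def pka_from_ka(ka):
    """pKa = -log10(Ka)."""
    return -math.log10(ka)

def ka_from_pka(pka):
    """Inverse relation: Ka = 10 ** (-pKa)."""
    return 10.0 ** (-pka)

# Illustrative input only (assumption, not read from the data file)
pka_example = 6.35
ka_example = ka_from_pka(pka_example)

print("Ka  =", ka_example)
print("pKa =", pka_from_ka(ka_example))  # round trip back to the input
```

Because pKa is a logarithm, a change of 1 in pKa corresponds to a factor of 10 in Ka.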

## saved in a subfolder named data

./data/solubility_data.xlsx

2. polynomial interpolation : scipy

curvefit_interpolation_with_scipy.py

import os
from openpyxl import load_workbook as loadwb
import matplotlib.pyplot as plt
from scipy import interpolate as spi


# data path, name, sheet name
fpath = './data/'
fname = 'solubility_data.xlsx'
sh_names = ('pK_CO2','pKH_CO2')

# read excel file
wb = loadwb(os.path.join(fpath,fname), data_only=True)
ws = wb[sh_names[0]]
wb.close()

# cell location
T_rng = ws['A3':'A11']
pK1_rng = ws['B3':'B11']
pK2_rng = ws['C3':'C11']

## convert cell(tuple) to value
def cell2value(rng):
    v = []
    for rng_tuple in rng:
        for cell in rng_tuple:
            v.append(cell.value)
    return v
T = cell2value(T_rng)
K1 = cell2value(pK1_rng)
K2 = cell2value(pK2_rng)

# scipy interpolate
# spline interpolation : piecewise polynomial interpolation
# linear, cubic

# f1: linear, cubic
f1_linear = spi.interp1d(T,K1,kind='linear')
f1_cubic  = spi.interp1d(T,K1,kind='cubic')

# f2: linear, cubic
f2_linear = spi.interp1d(T,K2,kind='linear')
f2_cubic  = spi.interp1d(T,K2,kind='cubic')

# original data vs. interpolated values at the same T (these should match)
print(K1)
print(f1_linear(T))
print(f1_cubic(T))
print(K2)
print(f2_linear(T))
print(f2_cubic(T))

# define figure
fig = plt.figure(figsize=(10,8))

# subplot (1,1)
ax1 = fig.add_subplot(2,2,1)
plt.plot(T,K1,'ro-',label='data')
plt.title("original data: -logK1 by T")
plt.xlabel("Temp.[oC]")
plt.ylabel("-logK1")
plt.grid()
plt.legend()

# subplot (1,2)
ax2 = fig.add_subplot(2,2,2)
plt.plot(T,f1_linear(T),'g*-',label='linear')
plt.plot(T,f1_cubic(T),'b^-',label='cubic')
plt.title("interpolation: -logK1 by T")
plt.xlabel("Temp.[oC]")
plt.ylabel("-logK1")
plt.grid()
plt.legend()

# subplot (2,1)
ax3 = fig.add_subplot(2,2,3)
plt.plot(T,K2,'ro-',label='data')
plt.title("original data: -logK2 by T")
plt.xlabel("Temp.[oC]")
plt.ylabel("-logK2")
plt.grid()
plt.legend()

# subplot (2,2)
ax4 = fig.add_subplot(2,2,4)
plt.plot(T,f2_linear(T),'g*-',label='linear')
plt.plot(T,f2_cubic(T),'b^-',label='cubic')
plt.title("interpolation: -logK2 by T")
plt.xlabel("Temp.[oC]")
plt.ylabel("-logK2")
plt.grid()
plt.legend()

plt.show()

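Interpolation produces a curve that passes through the original data points. Here the linear and cubic results coincide because both interpolants are evaluated only at the original T values; between the data points they generally differ.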
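To see where linear and cubic interpolation part ways, both can be evaluated on a finer temperature grid. The sketch below is self-contained and rests on assumptions: `T_demo` and `pK_demo` are illustrative placeholder values, not the numbers from the table in this post, and the 0.25 degree grid spacing is an arbitrary choice.

```python
import numpy as np
from scipy import interpolate as spi

# Placeholder data for illustration only (assumption, not the post's table)
T_demo = np.array([0, 5, 10, 15, 20, 25, 30, 50, 60], dtype=float)
pK_demo = np.array([6.58, 6.52, 6.46, 6.42, 6.38, 6.35, 6.33, 6.29, 6.30])

f_lin = spi.interp1d(T_demo, pK_demo, kind='linear')
f_cub = spi.interp1d(T_demo, pK_demo, kind='cubic')

# Fine grid inside the data range: 241 points, 0.25 degree apart
T_fine = np.linspace(T_demo.min(), T_demo.max(), 241)

# At the data points both interpolants return the data themselves
diff_at_points = np.max(np.abs(f_lin(T_demo) - f_cub(T_demo)))
# Between the data points the two curves separate
diff_on_grid = np.max(np.abs(f_lin(T_fine) - f_cub(T_fine)))

print("max |linear - cubic| at the data points:", diff_at_points)
print("max |linear - cubic| on the fine grid  :", diff_on_grid)
```

Plotting `f_lin(T_fine)` and `f_cub(T_fine)` instead of the values at `T` would make the same difference visible in the figure.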

Values that are missing from the table within the 0 to 60 range of the independent variable, such as 35, 40, and 45, can be computed by interpolation.
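Filling in the missing temperatures might look like the following sketch. The arrays `T_demo` and `pK_demo` are again illustrative placeholders rather than the values in the data file, so the printed numbers only demonstrate the mechanics.

```python
import numpy as np
from scipy import interpolate as spi

# Placeholder data for illustration only (assumption, not the post's table)
T_demo = np.array([0, 5, 10, 15, 20, 25, 30, 50, 60], dtype=float)
pK_demo = np.array([6.58, 6.52, 6.46, 6.42, 6.38, 6.35, 6.33, 6.29, 6.30])

f_lin = spi.interp1d(T_demo, pK_demo, kind='linear')
f_cub = spi.interp1d(T_demo, pK_demo, kind='cubic')

# Temperatures absent from the table but inside its range
T_missing = [35.0, 40.0, 45.0]
lin = f_lin(T_missing)
cub = f_cub(T_missing)
for t, a, b in zip(T_missing, lin, cub):
    print(f"T={t:4.1f}  linear={a:.4f}  cubic={b:.4f}")

# interp1d interpolates only; by default a query outside the data range fails
out_of_range_rejected = False
try:
    f_lin(70.0)
except ValueError:
    out_of_range_rejected = True
print("query at T=70 rejected:", out_of_range_rejected)
```

By default `interp1d` refuses to extrapolate, which is the safe behaviour for tabulated constants. Recent SciPy documentation marks `interp1d` as legacy; `numpy.interp` (linear) and `scipy.interpolate.CubicSpline` are the suggested alternatives for new code.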

License: Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND)
