티스토리 뷰

INTRO

Office계에 Excel이 python에는 pandas가 있다.

python pandas는 통합 구조화된 데이터를 다루기에 최적화되어있으며 선택이 아닌 필수라 할 수 있다.

pandas가 제공하는 data 구조와 indexing, 연산을 다루어 본다.

 

Contents

- Data Object : Series, DataFrame

- Indexing, Operation

 

 

pandas libray가 없으면 terminal에서 pip를 이용하여 먼저 설치한다.

$ pip install pandas

 

python code : pyTest/pyPandas1.py

# -*- coding: utf-8 -*-
import datetime
import pandas as pd


today = datetime.datetime.today()
yesterday = today - datetime.timedelta(1)

data1 = [1, 3, 5, 7, 9]
data2 = [2, 4, 6, 8, 10]
data3 = [2300,6240,1500,4900,27000]
data4 = ['apple','strawberry','banana','tomato','melon']
data5 = ['DateTime','Goods','Cost(KRW)']
date  = pd.date_range(today.strftime('%Y-%m-%d-%a, %H:%M:%S'), periods=5)

s1  = pd.Series(data1)
s2  = pd.Series([data2])
s3  = pd.Series(data3,index=date,name='Cost(KRW)')
df1 = pd.DataFrame([data1,data2])
df2 = pd.DataFrame([date,data4,data3]).T
df2.columns = data5

print("<--- DateTime Print --->")
print("{}\n".format(today.strftime('%Y-%m-%d-%a, %H:%M:%S')))

print("<--- Series 's1 / s2' handle --->")
print("'s1' data-type {}".format(type(s1)))
print("'s1' data \n{}".format(s1))
print("'s1' 1st row value : {}".format(s1[0]))
print("'s1' index info : {}\n".format(s1.index))
print("'s2' dtype become the object : {}".format(type(s2)))
print("'s2' data \n{}".format(s2))
print("'s2' 1st row value : {}".format(s2[0]))
print("'s2' index info : {}".format(s2.index))
print("\n\n")

print("<--- Series s3 handle --->")
print("'s3' data-type {}".format(type(s3)))
print("'s3' data \n{}".format(s3))
print("'s3' 1st row value : {}".format(s3[0]))
print("'s3' index info : {}".format(s3.index))
print("\n\n")


print("<--- DataFrame df handle : 1 --->")
print("'df1' data-type {}".format(type(df1)))
print("'df1' data \n{}".format(df1))
print("'df1' (row,col) = (1,1) value : {}".format(df1[0][0]))
print("'df1' index info : {}".format(df1.index))
print("'df1' columns info : {}".format(df1.columns))
print("\n\n")

print("'df2' data-type {}".format(type(df2)))
print("'df2' data \n{}".format(df2))
print("'df2' (row,col) = (1,2) value : {}".format(df2.values[0][1]))
print("'df2' index info : \n{}".format(df2.index))
print("'df2' columns info : \n{}".format(df2.columns))
print("'df2' 1st row data : \n{}".format(df2.head(1)))
print("'df2' end row data : \n{}\n".format(df2.tail(1)))
# print("'df2' descriptions : {}".format(df2.describe()))
print("'df2' DateTime  ascending order(오름차순) : \n{}".format(df2.sort_index(axis=0, ascending=True)))
print("'df2' DateTime descending order(내림차순) : \n{}".format(df2.sort_index(axis=0, ascending=False)))
print("\n\n")


print("<--- DataFrame df handle : 2 --->")
df2.insert(2,'Quantity(EA)',s1,False)
df2.insert(3,'Unit Price(KRW)',df2['Cost(KRW)']/df2['Quantity(EA)'],False)
df2 = df2.astype({'Unit Price(KRW)':'int','Cost(KRW)':'int'})
print("'df2' insert new_col operation data : \n{}\n".format(df2))
new_row = [yesterday.strftime('%Y-%m-%d %H:%M:%S'),'orange',10,1000,10000]
df3 = df2.append(pd.Series(new_row,index=df2.columns), ignore_index=True)
print("'df3' append new_row data : \n{}\n".format(df3))


print("<--- DataFrame df3 indexing --->")
print("***** column indexing *****")
print("'df3' 1st col : \n{}\n".format(df3.DateTime))
print("'df3' 2nd col : \n{}\n".format(df3['Goods']))
print("'df3' 3rd~5th cols : \n{}\n".format(df3.iloc[:,2:]))

print("\n***** row indexing *****")
print("'df3' 1st row : \n{}\n".format(df3[0:1]))
print("'df3' 2st row : \n{}\n".format(df3.iloc[1]))
print("'df3' 3rd~5th rows : \n{}\n".format(df3.iloc[3:,:]))

print("\n***** values indexing *****")
print("'df3' partial select value : \n{}\n".format(df3.iloc[[0,2,3],[1,3]]))
print("'df3' select value over 2000KRW of 'Unit Price(KRW)' : \n{}\n".format(df3[df3['Unit Price(KRW)'] > 2000]))
print("\n\n")

 

---- SublimeText python Run pyPandas1.py ----

<--- DateTime Print --->
2021-03-19-Fri, 14:19:15

<--- Series 's1 / s2' handle --->
's1' data-type <class 'pandas.core.series.Series'>
's1' data 
0    1
1    3
2    5
3    7
4    9
dtype: int64
's1' 1st row value : 1
's1' index info : RangeIndex(start=0, stop=5, step=1)

's2' dtype become the object : <class 'pandas.core.series.Series'>
's2' data 
0    [2, 4, 6, 8, 10]
dtype: object
's2' 1st row value : [2, 4, 6, 8, 10]
's2' index info : RangeIndex(start=0, stop=1, step=1)



<--- Series s3 handle --->
's3' data-type <class 'pandas.core.series.Series'>
's3' data 
2021-03-19 14:19:15     2300
2021-03-20 14:19:15     6240
2021-03-21 14:19:15     1500
2021-03-22 14:19:15     4900
2021-03-23 14:19:15    27000
Freq: D, Name: Cost(KRW), dtype: int64
's3' 1st row value : 2300
's3' index info : DatetimeIndex(['2021-03-19 14:19:15', '2021-03-20 14:19:15',
               '2021-03-21 14:19:15', '2021-03-22 14:19:15',
               '2021-03-23 14:19:15'],
              dtype='datetime64[ns]', freq='D')



<--- DataFrame df handle : 1 --->
'df1' data-type <class 'pandas.core.frame.DataFrame'>
'df1' data 
   0  1  2  3   4
0  1  3  5  7   9
1  2  4  6  8  10
'df1' (row,col) = (1,1) value : 1
'df1' index info : RangeIndex(start=0, stop=2, step=1)
'df1' columns info : RangeIndex(start=0, stop=5, step=1)



'df2' data-type <class 'pandas.core.frame.DataFrame'>
'df2' data 
             DateTime       Goods Cost(KRW)
0 2021-03-19 14:19:15       apple      2300
1 2021-03-20 14:19:15  strawberry      6240
2 2021-03-21 14:19:15      banana      1500
3 2021-03-22 14:19:15      tomato      4900
4 2021-03-23 14:19:15       melon     27000
'df2' (row,col) = (1,2) value : apple
'df2' index info : 
RangeIndex(start=0, stop=5, step=1)
'df2' columns info : 
Index(['DateTime', 'Goods', 'Cost(KRW)'], dtype='object')
'df2' 1st row data : 
             DateTime  Goods Cost(KRW)
0 2021-03-19 14:19:15  apple      2300
'df2' end row data : 
             DateTime  Goods Cost(KRW)
4 2021-03-23 14:19:15  melon     27000

'df2' DateTime  ascending order(오름차순) : 
             DateTime       Goods Cost(KRW)
0 2021-03-19 14:19:15       apple      2300
1 2021-03-20 14:19:15  strawberry      6240
2 2021-03-21 14:19:15      banana      1500
3 2021-03-22 14:19:15      tomato      4900
4 2021-03-23 14:19:15       melon     27000
'df2' DateTime descending order(내림차순) : 
             DateTime       Goods Cost(KRW)
4 2021-03-23 14:19:15       melon     27000
3 2021-03-22 14:19:15      tomato      4900
2 2021-03-21 14:19:15      banana      1500
1 2021-03-20 14:19:15  strawberry      6240
0 2021-03-19 14:19:15       apple      2300



<--- DataFrame df handle : 2 --->
'df2' insert new_col operation data : 
             DateTime       Goods  Quantity(EA)  Unit Price(KRW)  Cost(KRW)
0 2021-03-19 14:19:15       apple             1             2300       2300
1 2021-03-20 14:19:15  strawberry             3             2080       6240
2 2021-03-21 14:19:15      banana             5              300       1500
3 2021-03-22 14:19:15      tomato             7              700       4900
4 2021-03-23 14:19:15       melon             9             3000      27000

'df3' append new_row data : 
              DateTime       Goods  Quantity(EA)  Unit Price(KRW)  Cost(KRW)
0  2021-03-19 14:19:15       apple             1             2300       2300
1  2021-03-20 14:19:15  strawberry             3             2080       6240
2  2021-03-21 14:19:15      banana             5              300       1500
3  2021-03-22 14:19:15      tomato             7              700       4900
4  2021-03-23 14:19:15       melon             9             3000      27000
5  2021-03-18 14:19:15      orange            10             1000      10000

<--- DataFrame df3 indexing --->
***** column indexing *****
'df3' 1st col : 
0    2021-03-19 14:19:15
1    2021-03-20 14:19:15
2    2021-03-21 14:19:15
3    2021-03-22 14:19:15
4    2021-03-23 14:19:15
5    2021-03-18 14:19:15
Name: DateTime, dtype: object

'df3' 2nd col : 
0         apple
1    strawberry
2        banana
3        tomato
4         melon
5        orange
Name: Goods, dtype: object

'df3' 3rd~5th cols : 
   Quantity(EA)  Unit Price(KRW)  Cost(KRW)
0             1             2300       2300
1             3             2080       6240
2             5              300       1500
3             7              700       4900
4             9             3000      27000
5            10             1000      10000


***** row indexing *****
'df3' 1st row : 
              DateTime  Goods  Quantity(EA)  Unit Price(KRW)  Cost(KRW)
0  2021-03-19 14:19:15  apple             1             2300       2300

'df3' 2st row : 
DateTime           2021-03-20 14:19:15
Goods                       strawberry
Quantity(EA)                         3
Unit Price(KRW)                   2080
Cost(KRW)                         6240
Name: 1, dtype: object

'df3' 3rd~5th rows : 
              DateTime   Goods  Quantity(EA)  Unit Price(KRW)  Cost(KRW)
3  2021-03-22 14:19:15  tomato             7              700       4900
4  2021-03-23 14:19:15   melon             9             3000      27000
5  2021-03-18 14:19:15  orange            10             1000      10000


***** values indexing *****
'df3' partial select value : 
    Goods  Unit Price(KRW)
0   apple             2300
2  banana              300
3  tomato              700

'df3' select value over 2000KRW of 'Unit Price(KRW)' : 
              DateTime       Goods  Quantity(EA)  Unit Price(KRW)  Cost(KRW)
0  2021-03-19 14:19:15       apple             1             2300       2300
1  2021-03-20 14:19:15  strawberry             3             2080       6240
4  2021-03-23 14:19:15       melon             9             3000      27000





***Repl Closed***

 

반응형
댓글
공지사항
최근에 올라온 글
최근에 달린 댓글
Total
Today
Yesterday
링크
«   2024/07   »
1 2 3 4 5 6
7 8 9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31
글 보관함