티스토리 뷰
INTRO
Office계에 Excel이 python에는 pandas가 있다.
python pandas는 통합 구조화된 데이터를 다루기에 최적화되어있으며 선택이 아닌 필수라 할 수 있다.
pandas가 제공하는 data 구조와 indexing, 연산을 다루어 본다.
Contents
- Data Object : Series, DataFrame
- Indexing, Operation
pandas libray가 없으면 terminal에서 pip를 이용하여 먼저 설치한다.
$ pip install pandas
python code : pyTest/pyPandas1.py
# -*- coding: utf-8 -*-
import datetime
import pandas as pd
today = datetime.datetime.today()
yesterday = today - datetime.timedelta(1)
data1 = [1, 3, 5, 7, 9]
data2 = [2, 4, 6, 8, 10]
data3 = [2300,6240,1500,4900,27000]
data4 = ['apple','strawberry','banana','tomato','melon']
data5 = ['DateTime','Goods','Cost(KRW)']
date = pd.date_range(today.strftime('%Y-%m-%d-%a, %H:%M:%S'), periods=5)
s1 = pd.Series(data1)
s2 = pd.Series([data2])
s3 = pd.Series(data3,index=date,name='Cost(KRW)')
df1 = pd.DataFrame([data1,data2])
df2 = pd.DataFrame([date,data4,data3]).T
df2.columns = data5
print("<--- DateTime Print --->")
print("{}\n".format(today.strftime('%Y-%m-%d-%a, %H:%M:%S')))
print("<--- Series 's1 / s2' handle --->")
print("'s1' data-type {}".format(type(s1)))
print("'s1' data \n{}".format(s1))
print("'s1' 1st row value : {}".format(s1[0]))
print("'s1' index info : {}\n".format(s1.index))
print("'s2' dtype become the object : {}".format(type(s2)))
print("'s2' data \n{}".format(s2))
print("'s2' 1st row value : {}".format(s2[0]))
print("'s2' index info : {}".format(s2.index))
print("\n\n")
print("<--- Series s3 handle --->")
print("'s3' data-type {}".format(type(s3)))
print("'s3' data \n{}".format(s3))
print("'s3' 1st row value : {}".format(s3[0]))
print("'s3' index info : {}".format(s3.index))
print("\n\n")
print("<--- DataFrame df handle : 1 --->")
print("'df1' data-type {}".format(type(df1)))
print("'df1' data \n{}".format(df1))
print("'df1' (row,col) = (1,1) value : {}".format(df1[0][0]))
print("'df1' index info : {}".format(df1.index))
print("'df1' columns info : {}".format(df1.columns))
print("\n\n")
print("'df2' data-type {}".format(type(df2)))
print("'df2' data \n{}".format(df2))
print("'df2' (row,col) = (1,2) value : {}".format(df2.values[0][1]))
print("'df2' index info : \n{}".format(df2.index))
print("'df2' columns info : \n{}".format(df2.columns))
print("'df2' 1st row data : \n{}".format(df2.head(1)))
print("'df2' end row data : \n{}\n".format(df2.tail(1)))
# print("'df2' descriptions : {}".format(df2.describe()))
print("'df2' DateTime ascending order(오름차순) : \n{}".format(df2.sort_index(axis=0, ascending=True)))
print("'df2' DateTime descending order(내림차순) : \n{}".format(df2.sort_index(axis=0, ascending=False)))
print("\n\n")
print("<--- DataFrame df handle : 2 --->")
df2.insert(2,'Quantity(EA)',s1,False)
df2.insert(3,'Unit Price(KRW)',df2['Cost(KRW)']/df2['Quantity(EA)'],False)
df2 = df2.astype({'Unit Price(KRW)':'int','Cost(KRW)':'int'})
print("'df2' insert new_col operation data : \n{}\n".format(df2))
new_row = [yesterday.strftime('%Y-%m-%d %H:%M:%S'),'orange',10,1000,10000]
df3 = df2.append(pd.Series(new_row,index=df2.columns), ignore_index=True)
print("'df3' append new_row data : \n{}\n".format(df3))
print("<--- DataFrame df3 indexing --->")
print("***** column indexing *****")
print("'df3' 1st col : \n{}\n".format(df3.DateTime))
print("'df3' 2nd col : \n{}\n".format(df3['Goods']))
print("'df3' 3rd~5th cols : \n{}\n".format(df3.iloc[:,2:]))
print("\n***** row indexing *****")
print("'df3' 1st row : \n{}\n".format(df3[0:1]))
print("'df3' 2st row : \n{}\n".format(df3.iloc[1]))
print("'df3' 3rd~5th rows : \n{}\n".format(df3.iloc[3:,:]))
print("\n***** values indexing *****")
print("'df3' partial select value : \n{}\n".format(df3.iloc[[0,2,3],[1,3]]))
print("'df3' select value over 2000KRW of 'Unit Price(KRW)' : \n{}\n".format(df3[df3['Unit Price(KRW)'] > 2000]))
print("\n\n")
---- SublimeText python Run pyPandas1.py ----
<--- DateTime Print --->
2021-03-19-Fri, 14:19:15
<--- Series 's1 / s2' handle --->
's1' data-type <class 'pandas.core.series.Series'>
's1' data
0 1
1 3
2 5
3 7
4 9
dtype: int64
's1' 1st row value : 1
's1' index info : RangeIndex(start=0, stop=5, step=1)
's2' dtype become the object : <class 'pandas.core.series.Series'>
's2' data
0 [2, 4, 6, 8, 10]
dtype: object
's2' 1st row value : [2, 4, 6, 8, 10]
's2' index info : RangeIndex(start=0, stop=1, step=1)
<--- Series s3 handle --->
's3' data-type <class 'pandas.core.series.Series'>
's3' data
2021-03-19 14:19:15 2300
2021-03-20 14:19:15 6240
2021-03-21 14:19:15 1500
2021-03-22 14:19:15 4900
2021-03-23 14:19:15 27000
Freq: D, Name: Cost(KRW), dtype: int64
's3' 1st row value : 2300
's3' index info : DatetimeIndex(['2021-03-19 14:19:15', '2021-03-20 14:19:15',
'2021-03-21 14:19:15', '2021-03-22 14:19:15',
'2021-03-23 14:19:15'],
dtype='datetime64[ns]', freq='D')
<--- DataFrame df handle : 1 --->
'df1' data-type <class 'pandas.core.frame.DataFrame'>
'df1' data
0 1 2 3 4
0 1 3 5 7 9
1 2 4 6 8 10
'df1' (row,col) = (1,1) value : 1
'df1' index info : RangeIndex(start=0, stop=2, step=1)
'df1' columns info : RangeIndex(start=0, stop=5, step=1)
'df2' data-type <class 'pandas.core.frame.DataFrame'>
'df2' data
DateTime Goods Cost(KRW)
0 2021-03-19 14:19:15 apple 2300
1 2021-03-20 14:19:15 strawberry 6240
2 2021-03-21 14:19:15 banana 1500
3 2021-03-22 14:19:15 tomato 4900
4 2021-03-23 14:19:15 melon 27000
'df2' (row,col) = (1,2) value : apple
'df2' index info :
RangeIndex(start=0, stop=5, step=1)
'df2' columns info :
Index(['DateTime', 'Goods', 'Cost(KRW)'], dtype='object')
'df2' 1st row data :
DateTime Goods Cost(KRW)
0 2021-03-19 14:19:15 apple 2300
'df2' end row data :
DateTime Goods Cost(KRW)
4 2021-03-23 14:19:15 melon 27000
'df2' DateTime ascending order(오름차순) :
DateTime Goods Cost(KRW)
0 2021-03-19 14:19:15 apple 2300
1 2021-03-20 14:19:15 strawberry 6240
2 2021-03-21 14:19:15 banana 1500
3 2021-03-22 14:19:15 tomato 4900
4 2021-03-23 14:19:15 melon 27000
'df2' DateTime descending order(내림차순) :
DateTime Goods Cost(KRW)
4 2021-03-23 14:19:15 melon 27000
3 2021-03-22 14:19:15 tomato 4900
2 2021-03-21 14:19:15 banana 1500
1 2021-03-20 14:19:15 strawberry 6240
0 2021-03-19 14:19:15 apple 2300
<--- DataFrame df handle : 2 --->
'df2' insert new_col operation data :
DateTime Goods Quantity(EA) Unit Price(KRW) Cost(KRW)
0 2021-03-19 14:19:15 apple 1 2300 2300
1 2021-03-20 14:19:15 strawberry 3 2080 6240
2 2021-03-21 14:19:15 banana 5 300 1500
3 2021-03-22 14:19:15 tomato 7 700 4900
4 2021-03-23 14:19:15 melon 9 3000 27000
'df3' append new_row data :
DateTime Goods Quantity(EA) Unit Price(KRW) Cost(KRW)
0 2021-03-19 14:19:15 apple 1 2300 2300
1 2021-03-20 14:19:15 strawberry 3 2080 6240
2 2021-03-21 14:19:15 banana 5 300 1500
3 2021-03-22 14:19:15 tomato 7 700 4900
4 2021-03-23 14:19:15 melon 9 3000 27000
5 2021-03-18 14:19:15 orange 10 1000 10000
<--- DataFrame df3 indexing --->
***** column indexing *****
'df3' 1st col :
0 2021-03-19 14:19:15
1 2021-03-20 14:19:15
2 2021-03-21 14:19:15
3 2021-03-22 14:19:15
4 2021-03-23 14:19:15
5 2021-03-18 14:19:15
Name: DateTime, dtype: object
'df3' 2nd col :
0 apple
1 strawberry
2 banana
3 tomato
4 melon
5 orange
Name: Goods, dtype: object
'df3' 3rd~5th cols :
Quantity(EA) Unit Price(KRW) Cost(KRW)
0 1 2300 2300
1 3 2080 6240
2 5 300 1500
3 7 700 4900
4 9 3000 27000
5 10 1000 10000
***** row indexing *****
'df3' 1st row :
DateTime Goods Quantity(EA) Unit Price(KRW) Cost(KRW)
0 2021-03-19 14:19:15 apple 1 2300 2300
'df3' 2st row :
DateTime 2021-03-20 14:19:15
Goods strawberry
Quantity(EA) 3
Unit Price(KRW) 2080
Cost(KRW) 6240
Name: 1, dtype: object
'df3' 3rd~5th rows :
DateTime Goods Quantity(EA) Unit Price(KRW) Cost(KRW)
3 2021-03-22 14:19:15 tomato 7 700 4900
4 2021-03-23 14:19:15 melon 9 3000 27000
5 2021-03-18 14:19:15 orange 10 1000 10000
***** values indexing *****
'df3' partial select value :
Goods Unit Price(KRW)
0 apple 2300
2 banana 300
3 tomato 700
'df3' select value over 2000KRW of 'Unit Price(KRW)' :
DateTime Goods Quantity(EA) Unit Price(KRW) Cost(KRW)
0 2021-03-19 14:19:15 apple 1 2300 2300
1 2021-03-20 14:19:15 strawberry 3 2080 6240
4 2021-03-23 14:19:15 melon 9 3000 27000
***Repl Closed***
반응형
'python > Lecture' 카테고리의 다른 글
python 활용 #9 : django, csv 파일을 이용한 model db 생성 (0) | 2021.12.25 |
---|---|
python 활용 #8 : django (0) | 2021.03.03 |
python 기초 #10 : JSON Parsing - Open API (0) | 2021.03.03 |
python 기초 #9 : Data Visualization(데이터 시각화) (0) | 2021.03.03 |
python 기초 #8 : Database - SQLite (0) | 2021.03.03 |
댓글
공지사항
최근에 올라온 글
최근에 달린 댓글
- Total
- Today
- Yesterday
링크
TAG
- raspberrypi
- vscode
- arduino
- SSH
- Regression
- Model
- Django
- ERP
- sublime text
- 자가격리
- analysis
- pyserial
- Python
- git
- Raspberry Pi
- 라즈베리파이
- 코로나
- template
- 코로나19
- COVID-19
- r
- CSV
- Templates
- Pandas
- 확진
- server
- MacOS
- DS18B20
- github
- DAQ
일 | 월 | 화 | 수 | 목 | 금 | 토 |
---|---|---|---|---|---|---|
1 | ||||||
2 | 3 | 4 | 5 | 6 | 7 | 8 |
9 | 10 | 11 | 12 | 13 | 14 | 15 |
16 | 17 | 18 | 19 | 20 | 21 | 22 |
23 | 24 | 25 | 26 | 27 | 28 |
글 보관함