Notice

Recent Posts

Recent Comments

Link

« 2025/04 »
일	월	화	수	목	금	토
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30

Tags more

Archives

Today

Total

관리 메뉴

LiJell's 성장기

__06.data_analyzing_tourist_data (한국 관광객 추이 1편) 본문

Bigdata/Web Crawling

__06.data_analyzing_tourist_data (한국 관광객 추이 1편)

All_is_LiJell 2022. 1. 14. 18:13

한국 관광객 추이 2편 시각화: https://lime-jelly.tistory.com/46

__07.data_analyzing_visualizing_tourist_data

이전 과정은 아래를 참고 data 변환: https://lime-jelly.tistory.com/43 7. tourist data visualizing (관광객 데이터 시각화) 7.1. 시계열 그래프 그리기 import pandas as pd import numpy as np from pandas i..

lime-jelly.tistory.com

6. 한국관광 데이터랩을 이용한 관광객 추이

방한 외래관광객 (국적별) -> 목적별/국적별 -> 주기(월) -> 기간 정하기
엑셀 데이터로 다운로드

목적

한달 기준 데이터를 이용하여 함수를 만들것이다
만들어진 함수를 이용하여 데이터를 분류 저장하고
분류된 데이터를 시각화하여 추이를 살펴본다

6.1. 필요한 데이터 불러오기

import pandas as pd
import numpy as np
from pandas import Series, DataFrame

불필요한 자료는 제외하고 필요한 자료만 불러온다

kto_201901 = pd.read_excel('./files/kto_201901.xlsx', # 다운로드한 엑셀 경로
                          header = 1,
                          usecols = 'A:G',
                          skipfooter = 4)
kto_201901

정보 확인

kto_201901.info()
'''
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 67 entries, 0 to 66
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   국적      67 non-null     object
 1   관광      67 non-null     int64 
 2   상용      67 non-null     int64 
 3   공용      67 non-null     int64 
 4   유학/연수   67 non-null     int64 
 5   기타      67 non-null     int64 
 6   계       67 non-null     int64 
dtypes: int64(6), object(1)
memory usage: 3.8+ KB
'''

kto_201901.describe() # DataFrame format임
'''

관광    상용    공용    유학/연수    기타    계
count    67.00000    67.000000    67.000000    67.000000    67.000000    67.000000
mean    26396.80597    408.208955    132.507463    477.462687    5564.208955    32979.194030
std    102954.04969    1416.040302    474.406339    2009.484800    17209.438418    122821.369969
min    0.00000    0.000000    0.000000    0.000000    16.000000    54.000000
25%    505.00000    14.500000    2.500000    17.500000    260.000000    927.000000
50%    1304.00000    45.000000    14.000000    43.000000    912.000000    2695.000000
75%    8365.00000    176.500000    38.000000    182.000000    2824.500000    14905.500000
max    765082.00000    10837.000000    2657.000000    14087.000000    125521.000000    916950.000000
'''

# 0이 있는 값들을 확인
condition = (kto_201901['관광'] == 0)|\
            (kto_201901['상용'] == 0)|\
            (kto_201901['공용'] == 0)|\
            (kto_201901['유학/연수'] == 0)

kto_201901[condition]
'''
    국적    관광    상용    공용    유학/연수    기타    계
4    마카오    2506    2    0    17    45    2570
20    이스라엘    727    12    0    9    57    805
22    우즈베키스탄    1958    561    0    407    2828    5754
38    스위스    613    18    0    19    97    747
45    그리스    481    17    4    0    273    775
46    포르투갈    416    14    0    13    121    564
51    크로아티아    226    12    0    3    250    491
54    폴란드    713    10    0    27    574    1324
59    대양주 기타    555    3    4    0    52    614
63    기타대륙    33    4    0    1    16    54
64    국적미상    33    4    0    1    16    54
65    교포소계    0    0    0    0    15526    15526
66    교포    0    0    0    0    15526    15526
'''

6.2. 데이터 변환

6.2.1. 기준년월 추가하기

kto_201901['기준년월'] = '2019-01' # 새 컬럼 추가
kto_201901
'''
    국적    관광    상용    공용    유학/연수    기타    계    기준년월
0    아시아주    765082    10837    1423    14087    125521    916950    2019-01
1    일본    198805    2233    127    785    4576    206526    2019-01
2    대만    86393    74    22    180    1285    87954    2019-01
3    홍콩    34653    59    2    90    1092    35896    2019-01
4    마카오    2506    2    0    17    45    2570    2019-01
...    ...    ...    ...    ...    ...    ...    ...    ...
62    아프리카 기타    768    718    90    206    908    2690    2019-01
63    기타대륙    33    4    0    1    16    54    2019-01
64    국적미상    33    4    0    1    16    54    2019-01
65    교포소계    0    0    0    0    15526    15526    2019-01
66    교포    0    0    0    0    15526    15526    2019-01
'''

6.2.2. unique(), nunique(), value_count()

kto_201901['국적'].unique() # 중복값을 제거해 보여주는 함수
kto_201901['국적'].nunique() # unique 개수
#67
kto_201901['국적'].value_counts() # 나열해서 각 나라별 개수 보여줌

6.3. 대륙 제거

국적에 대륙별 기록이 있기 때문에 대륙을 제거하자

6.3.1. 대륙제거

continents_list = ['아시아주','미주','구주','대양주','아프리카주','기타대륙','교포소계']

condition = kto_201901['국적'].isin(continents_list)
# 두가지 방법
kto_201901[~condition]
kto_201901_country = kto_201901[condition == False]

# 확인해보기
kto_201901_country.head()

'''
    국적    관광    상용    공용    유학/연수    기타    계    기준년월
1    일본    198805    2233    127    785    4576    206526    2019-01
2    대만    86393    74    22    180    1285    87954    2019-01
3    홍콩    34653    59    2    90    1092    35896    2019-01
4    마카오    2506    2    0    17    45    2570    2019-01
5    태국    34004    37    199    96    6998    41334    2019-01
'''

6.3.2. index reset

# index 값이 이 전 값을 그대로 따라와서 reset
kto_201901_country_newindex = kto_201901_country.reset_index(drop = True)
kto_201901_country_newindex.head()
'''
국적    관광    상용    공용    유학/연수    기타    계    기준년월
0    일본    198805    2233    127    785    4576    206526    2019-01
1    대만    86393    74    22    180    1285    87954    2019-01
2    홍콩    34653    59    2    90    1092    35896    2019-01
3    마카오    2506    2    0    17    45    2570    2019-01
4    태국    34004    37    199    96    6998    41334    2019-01
'''

6.4 국적별 대륙을 추가

리스트 만들기

# list type 이라
continents = ['아시아']*25 + ['아메리카']*5 + ['유럽']*23 + ['오세아니아']*3 \
+ ['아프리카']*2 + ['기타대륙'] + ['교포']

print(continents)
'''
['아시아', '아시아', '아시아', '아시아', '아시아', '아시아', '아시아', '아시아', '아시아', '아시아', '아시아', '아시아', '아시아', '아시아', '아시아', '아시아', '아시아', '아시아', '아시아', '아시아', '아시아', '아시아', '아시아', '아시아', '아시아', '아메리카', '아메리카', '아메리카', '아메리카', '아메리카', '유럽', '유럽', '유럽', '유럽', '유럽', '유럽', '유럽', '유럽', '유럽', '유럽', '유럽', '유럽', '유럽', '유럽', '유럽', '유럽', '유럽', '유럽', '유럽', '유럽', '유럽', '유럽', '유럽', '오세아니아', '오세아니아', '오세아니아', '아프리카', '아프리카', '기타대륙', '교포']
'''

대륙 column 추가

kto_201901_country_newindex['대륙'] = continents
kto_201901_country_newindex.head()

'''
    국적    관광    상용    공용    유학/연수    기타    계    기준년월    대륙
0    일본    198805    2233    127    785    4576    206526    2019-01    아시아
1    대만    86393    74    22    180    1285    87954    2019-01    아시아
2    홍콩    34653    59    2    90    1092    35896    2019-01    아시아
3    마카오    2506    2    0    17    45    2570    2019-01    아시아
4    태국    34004    37    199    96    6998    41334    2019-01    아시아
'''

6.5. 다른 컬럼 추가

관광객 비율(%) 추가

kto_201901_country_newindex['관광객비율(%)'] = \
round(kto_201901_country_newindex['관광']*100 / kto_201901_country_newindex['계'], 2)
kto_201901_country_newindex.head()
'''
국적    관광    상용    공용    유학/연수    기타    계    기준년월    대륙    관광객비율(%)
0    일본    198805    2233    127    785    4576    206526    2019-01    아시아    96.26
1    대만    86393    74    22    180    1285    87954    2019-01    아시아    98.23
2    홍콩    34653    59    2    90    1092    35896    2019-01    아시아    96.54
3    마카오    2506    2    0    17    45    2570    2019-01    아시아    97.51
4    태국    34004    37    199    96    6998    41334    2019-01    아시아    82.27
'''

관광객 비율로 sort

kto_201901_country_newindex = kto_201901_country_newindex.sort_values(by = '관광객비율(%)', ascending = False)
kto_201901_country_newindex = kto_201901_country_newindex.reset_index(drop = True)

6.6. 대륙별 관광객 비율 pivot table

kto_201901_country_newindex.pivot_table(
                    values = "관광객비율(%)",
                    index = '대륙',
                    )
'''
    관광객비율(%)
대륙    
교포    0.000000
기타대륙    61.110000
아메리카    68.196000
아시아    59.618800
아프리카    32.675000
오세아니아    84.806667
유럽    63.823043
'''

6.7 전체 비율 추가하기

condition = kto_201901_country_newindex['국적'] == '중국'
kto_201901_country_newindex[condition]
'''
    국적    관광    상용    공용    유학/연수    기타    계    기준년월    대륙    관광객비율(%)
14    중국    320113    2993    138    8793    60777    392814    2019-01    아시아    81.49
'''

6.7.1 전체 관광객 수 구하기

# 두가지 방법이 있지만 type이 달라진다
tourist_sum = kto_201901_country_newindex['관광'].sum()
print(type(tourist_sum))
tourist_sum = sum(kto_201901_country_newindex['관광'])
print(type(tourist_sum))

'''
<class 'numpy.int64'>
<class 'int'>
'''

6.7.2. 전체비율 추가

kto_201901_country_newindex['전체비율(%)'] = round(\
kto_201901_country_newindex['관광']*100/tourist_sum, 1)
kto_201901_country_newindex
'''
생략
'''

6.8 모든 년도 합치기

6.8.1. 함수화

def create_kto_data(yy, mm):
    file_path = './files/kto_{}{}.xlsx'.format(yy, mm)
    df = pd.read_excel(file_path,
                      header = 1,
                      usecols = 'A:G',
                      skipfooter = 4)
    # 기준년월 추가
    df['기준년월'] = '{}-{}'.format(yy,mm)

    # 국적 칼럼에서 대륙 없애고 국가만 남기기
    continents_list = ['아시아주','미주','구주','대양주','아프리카주','기타대륙','교포소계']
    condition = df['국적'].isin(continents_list)
    df_country = df[~condition].reset_index(drop = True)

    # 대륙 컬럼 따로 추가
    continents_add = ['아시아']*25 + ['아메리카']*5 + ['유럽']*23 + ['오세아니아']*3 \
+ ['아프리카']*2 + ['기타대륙'] + ['교포']
    df_country['대륙'] = continents_add

    # 관광객 비율
    df_country['관광객비율(%)'] = round(df_country['관광']*100 / df_country['계'], 2)

    #전체 비율
    df_country['전체비율(%)'] = round(df_country['관광'])*100 / sum(df_country['관광'], 2)
    df_country = df_country.sort_values(by = '전체비율(%)', ascending = False).reset_index(drop=True)

    return(df_country)

6.8.2 오류

한자리 수의 월은 0이 포함되어 계산하지 않음으로 함수를 적용할 때 보완이 필요하다
예) 01, 02 ,03이 아닌 1, 2, 3 으로 인식
10의 자리를 채워줘야한다

kto_test = creat_kto_data(2010, 01)
kto_test
'''
SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers

'''

6.9. for으로 함수 돌리기

6.9.1 for 문

df = pd.DataFrame()
# 본인은 2020년까지 데이터를 뽑았다.
for yy in range(2010, 2021):
    for mm in range(1,13):
        mm_str = str(mm).zfill(2) # 한자리 수의 10의 자리를 0으로 채워준다 

        try: # 해보고
            temp = create_kto_data(str(yy), mm_str)
            df = df.append(temp, ignore_index = True)

        except: # 안돼면 pass
            pass

6.9.10 저장

df.to_excel('./files/kto_total.xlsx', index = False)

6.10. 국가별로 분류하여 저장하기

6.10.1. 저장 파일 불러오기

df1 = pd.read_excel('./files/kto_total.xlsx')

6.10.2. 국가별 저장

country_list = df1['국적'].unique() # 국가 리스트 추출
# print(country_list)

for cntry in country_list:

    condition = df['국적'] == cntry
    df_filter = df[condition]

    file_path = './files/[국적별 관광객 데이터] {}.xlsx'.format(cntry)
    df_filter.to_excel(file_path, index = False)

- 다음 과정: https://lime-jelly.tistory.com/46

'Bigdata > Web Crawling' 카테고리의 다른 글

__08.data_analyzing_crawling_instagram (인스타그램 1편) (0)	2022.01.17
__07.data_analyzing_visualizing_tourist_data (한국 관광객 추이 2편) (0)	2022.01.17
__05.data.analyzing_youtube_data_visualizing (유튜브 2편) (0)	2022.01.14
__04.data_analyzing_crawling_youtube_ranking (유튜브 1편) (0)	2022.01.13
__03.data_analyzing_crawling_melon_bugs_genie (멜론, 벅스, 지니) (3)	2022.01.13

'Bigdata/Web Crawling' Related Articles

Comments