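The mobile games industry is worth billions of dollars, and companies spend heavily on developing and marketing these games. The App Store games dataset offers a look at one segment of that market: strategy games, which includes titles such as Clash of Clans, Plants vs. Zombies, and Pokémon GO. In this case study, the number of user ratings serves as a proxy for a game's overall success, which lets us look for the factors associated with that success. Alternatively, we can measure the state of the market over time and try to predict where these games are heading.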

Contents

  1. Data loading and preprocessing
  2. Analysis of game genres
  3. Analysis of game attributes
  4. Top game lists

This dataset covers 17,007 game apps on the Apple App Store. It was collected on August 3, 2019, using the iTunes API and the App Store sitemap.

  • File: appstore_games.csv
  • Fields:
Field                         Description
URL                           The URL
ID                            The assigned ID
Name                          The name
Subtitle                      The secondary text under the name
Icon URL                      512px × 512px jpg
Average User Rating           Rounded to nearest .5, requires at least 5 ratings
User Rating Count             Number of ratings internationally, null means it is below 5
Price                         Price in USD
In-app Purchases              Prices of available in-app purchases
Description                   App description
Developer                     App developer
Age Rating                    Either 4+, 9+, 12+ or 17+
Languages                     ISO2A language codes
Size                          Size of the app in bytes
Primary Genre                 Main genre (e.g. Games, Entertainment)
Genres                        Genres of the app (e.g. Games, Puzzle)
Original Release Date         When it was released
Current Version Release Date  When it was last updated

1. Data loading and preprocessing

import numpy as np 
import pandas as pd 

import matplotlib.pyplot as plt
import seaborn as sns
import plotly.offline as py
import plotly.graph_objs as go
import plotly.express as px
py.init_notebook_mode(connected=True)
%matplotlib inline
data = pd.read_csv('./input/appstore_games.csv', parse_dates=['Original Release Date', 'Current Version Release Date'])
data.head()
URL ID Name Subtitle Icon URL Average User Rating User Rating Count Price In-app Purchases Description Developer Age Rating Languages Size Primary Genre Genres Original Release Date Current Version Release Date
0 https://apps.apple.com/us/app/sudoku/id284921427 284921427 Sudoku NaN https://is2-ssl.mzstatic.com/image/thumb/Purpl... 4.0 3553.0 2.99 NaN Join over 21,000,000 of our fans and download ... Mighty Mighty Good Games 4+ DA, NL, EN, FI, FR, DE, IT, JA, KO, NB, PL, PT... 15853568.0 Games Games, Strategy, Puzzle 2008-11-07 2017-05-30
1 https://apps.apple.com/us/app/reversi/id284926400 284926400 Reversi NaN https://is4-ssl.mzstatic.com/image/thumb/Purpl... 3.5 284.0 1.99 NaN The classic game of Reversi, also known as Oth... Kiss The Machine 4+ EN 12328960.0 Games Games, Strategy, Board 2008-11-07 2018-05-17
2 https://apps.apple.com/us/app/morocco/id284946595 284946595 Morocco NaN https://is5-ssl.mzstatic.com/image/thumb/Purpl... 3.0 8376.0 0.00 NaN Play the classic strategy game Othello (also k... Bayou Games 4+ EN 674816.0 Games Games, Board, Strategy 2008-11-07 2017-05-09
3 https://apps.apple.com/us/app/sudoku-free/id28... 285755462 Sudoku (Free) NaN https://is3-ssl.mzstatic.com/image/thumb/Purpl... 3.5 190394.0 0.00 NaN Top 100 free app for over a year.\nRated "Best... Mighty Mighty Good Games 4+ DA, NL, EN, FI, FR, DE, IT, JA, KO, NB, PL, PT... 21552128.0 Games Games, Strategy, Puzzle 2008-07-23 2017-05-30
4 https://apps.apple.com/us/app/senet-deluxe/id2... 285831220 Senet Deluxe NaN https://is1-ssl.mzstatic.com/image/thumb/Purpl... 3.5 28.0 2.99 NaN "Senet Deluxe - The Ancient Game of Life and A... RoGame Software 4+ DA, NL, EN, FR, DE, EL, IT, JA, KO, NO, PT, RU... 34689024.0 Games Games, Strategy, Board, Education 2008-07-18 2018-07-22

First, drop the fields that are not needed for the analysis: the game link (URL column) and the assigned ID (ID column).

data.drop(['URL', 'ID'], axis = 1, inplace = True)

The Icon URL column holds a link to each game's icon, so the images have to be downloaded over HTTP from those links. Here we display the first 20.

import requests
from PIL import Image
from io import BytesIO

fig, ax = plt.subplots(4,5, figsize=(8,8))

for i in range(20):
    r = requests.get(data['Icon URL'][i])
    im = Image.open(BytesIO(r.content))
    ax[i//5][i%5].imshow(im)
    ax[i//5][i%5].axis('off')
    
plt.show()

[Figure: icons of the first 20 games in a 4 × 5 grid]

data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17007 entries, 0 to 17006
Data columns (total 16 columns):
Name                            17007 non-null object
Subtitle                        5261 non-null object
Icon URL                        17007 non-null object
Average User Rating             7561 non-null float64
User Rating Count               7561 non-null float64
Price                           16983 non-null float64
In-app Purchases                7683 non-null object
Description                     17007 non-null object
Developer                       17007 non-null object
Age Rating                      17007 non-null object
Languages                       16947 non-null object
Size                            17006 non-null float64
Primary Genre                   17007 non-null object
Genres                          17007 non-null object
Original Release Date           17007 non-null datetime64[ns]
Current Version Release Date    17007 non-null datetime64[ns]
dtypes: datetime64[ns](2), float64(4), object(10)
memory usage: 2.1+ MB

The info() output shows that several fields have many missing values: Subtitle (about 69% missing), Average User Rating and User Rating Count (about 56% each), and In-app Purchases (about 55%).
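Reading missing-value counts off info() by hand is error-prone, so the share of missing values per column can also be computed directly. The following is a minimal sketch on a made-up toy frame (the `toy` data is hypothetical, not taken from the dataset); the same expression should work on `data`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset; the values are invented for illustration.
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Subtitle': [np.nan, 'fun', np.nan, np.nan],
    'Average User Rating': [4.5, np.nan, 3.0, np.nan],
})

# isnull() gives a boolean frame; its column means are the missing shares.
missing_share = toy.isnull().mean().sort_values(ascending=False)
print(missing_share)
```

Sorting in descending order puts the columns that need the most attention at the top.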

data.describe(include='all')
Name Subtitle Icon URL Average User Rating User Rating Count Price In-app Purchases Description Developer Age Rating Languages Size Primary Genre Genres Original Release Date Current Version Release Date
count 17007 5261 17007 7561.000000 7.561000e+03 16983.000000 7683 17007 17007 17007 16947 1.700600e+04 17007 17007 17007 17007
unique 16847 5010 16847 NaN NaN NaN 3803 16473 8693 4 990 NaN 21 1004 3084 2512
top Checkers (Draughts) Emoji Stickers https://is1-ssl.mzstatic.com/image/thumb/Purpl... NaN NaN NaN 0.99 #NAME? Tapps Tecnologia da Informação Ltda. 4+ EN NaN Games Games, Strategy, Puzzle 2016-02-09 00:00:00 2019-01-08 00:00:00
freq 2 14 2 NaN NaN NaN 943 17 123 11806 12467 NaN 16286 778 75 118
first NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 2008-01-08 00:00:00 2008-01-08 00:00:00
last NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 2019-12-07 00:00:00 2019-12-07 00:00:00
mean NaN NaN NaN 4.060905 3.306531e+03 0.813419 NaN NaN NaN NaN NaN 1.157064e+08 NaN NaN NaN NaN
std NaN NaN NaN 0.751428 4.232256e+04 7.835732 NaN NaN NaN NaN NaN 2.036477e+08 NaN NaN NaN NaN
min NaN NaN NaN 1.000000 5.000000e+00 0.000000 NaN NaN NaN NaN NaN 5.132800e+04 NaN NaN NaN NaN
25% NaN NaN NaN 3.500000 1.200000e+01 0.000000 NaN NaN NaN NaN NaN 2.295014e+07 NaN NaN NaN NaN
50% NaN NaN NaN 4.500000 4.600000e+01 0.000000 NaN NaN NaN NaN NaN 5.676895e+07 NaN NaN NaN NaN
75% NaN NaN NaN 4.500000 3.090000e+02 0.000000 NaN NaN NaN NaN NaN 1.330271e+08 NaN NaN NaN NaN
max NaN NaN NaN 5.000000 3.032734e+06 179.990000 NaN NaN NaN NaN NaN 4.005591e+09 NaN NaN NaN NaN
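One detail in the describe() output is worth checking: the `last` release date is 2019-12-07, which is later than the stated collection date of August 3, 2019. A likely cause, offered here as an assumption that the source does not confirm, is that the CSV stores dates day-first (DD/MM/YYYY) and they were read month-first. The sketch below uses two illustrative date strings to show how the two readings differ.

```python
import pandas as pd

# Illustrative strings only; that the raw CSV looks like this is an assumption.
raw = pd.Series(['11/07/2008', '03/08/2019'])

# Month-first reading: 11/07/2008 becomes 7 November 2008.
month_first = pd.to_datetime(raw, format='%m/%d/%Y')

# Day-first reading: 11/07/2008 becomes 11 July 2008.
day_first = pd.to_datetime(raw, format='%d/%m/%Y')

print(pd.DataFrame({'raw': raw,
                    'month_first': month_first,
                    'day_first': day_first}))
```

If the assumption holds, passing `dayfirst=True` to `pd.read_csv` alongside `parse_dates` would be one way to get the intended dates. The release year used later in the analysis is the same under either reading.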

Besides the maximum, minimum, mean, and other summary statistics for Price, Average User Rating, and User Rating Count, the describe() output shows that Primary Genre takes 21 distinct values, which is worth a closer look.

fig = plt.figure(figsize = (12,6))

# Count apps per primary genre and draw a bar chart
data['Primary Genre'].value_counts().plot(kind = 'bar',color = 'deepskyblue')

plt.show()

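[Figure: bar chart of app counts by Primary Genre]

The chart shows that, because data collection was imprecise, a small number of non-game apps slipped into the dataset, such as education, entertainment, utilities, and sports apps. These need to be removed. To keep the later analysis meaningful, we also drop very niche titles and keep only games with more than 200 user ratings.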

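# .copy() makes the result an independent frame, so the later column assignments do not trigger SettingWithCopyWarning
data = data.loc[(data['User Rating Count'] > 200) & (data['Primary Genre'] == 'Games')].copy()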
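It may help to see exactly which rows this filter keeps. The sketch below applies the same two conditions to a made-up toy frame (hypothetical values): a comparison against NaN evaluates to False, so games with a missing rating count are dropped along with those at or below the threshold.

```python
import numpy as np
import pandas as pd

# Hypothetical rows covering each case the filter has to handle.
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'User Rating Count': [5000.0, 200.0, np.nan, 201.0],
    'Primary Genre': ['Games', 'Games', 'Games', 'Education'],
})

# Same conditions as the filter above: strictly more than 200 ratings, games only.
mask = (toy['User Rating Count'] > 200) & (toy['Primary Genre'] == 'Games')
kept = toy.loc[mask].copy()

print(kept['Name'].tolist())
```

Game B has exactly 200 ratings and is excluded by the strict comparison, game C has no rating count, and app D is not a game.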

2. Analysis of game genres

top_10 = data['Genres'].value_counts().sort_values(ascending=False).head(10)

trace = go.Pie(labels = top_10.index, 
               values = top_10.values, 
               title = 'Genres', 
               hoverinfo = 'percent+value', 
               textinfo = 'percent',
               textposition = 'inside',
               hole = 0.7,
               showlegend = True,
               marker = dict(colors = ['cyan'],line = dict(color = '#000000',width = 2)))

py.iplot([trace])

Among the ten most common genre combinations, App Store games are mainly combinations involving strategy, simulation, and action.
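Each Genres value is a comma-separated combination such as 'Games, Strategy, Puzzle', so the pie chart counts combinations rather than individual genres. As a complementary view, the combinations can be split into single tags and counted. This is a minimal sketch on made-up rows, and it assumes the tags are always separated by a comma followed by a space.

```python
import pandas as pd

# Hypothetical genre strings in the same format as the Genres column.
toy = pd.DataFrame({'Genres': ['Games, Strategy, Puzzle',
                               'Games, Strategy, Board',
                               'Games, Simulation, Strategy']})

# Split each string into a list, then give every tag its own row.
tags = toy['Genres'].str.split(', ').explode()

# 'Games' appears in every row here, so it carries no information.
tags = tags[tags != 'Games']

genre_counts = tags.value_counts()
print(genre_counts)
```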

# Convert size from bytes to MB
data['Size'] = round(data['Size']/1024/1024,1)

plt.figure(figsize = (12,6))

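# fill=True replaces shade=True, which newer seaborn releases deprecate
ax = sns.kdeplot(data['Size'], fill = True, linewidth = 5, color = 'teal')
ax.set_xlabel('Size (MB)')
plt.show()

[Figure: kernel density estimate of game size in MB]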

fig = px.scatter(data,x = "Average User Rating",y = "User Rating Count",size = "Size",color = "Genres",
                 log_y = True,size_max = 80)
fig.show()

This chart shows which genre combinations have the highest user ratings, the most ratings, and the largest download sizes.
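With many genre combinations on one scatter plot, the comparison can be hard to read, so a small summary table per genre may help. The sketch below is one possible approach, not the original analysis: it splits the combinations into single tags and aggregates ratings per tag. The rows and the output column names (`mean_rating`, `median_rating_count`, `games`) are made up for illustration.

```python
import pandas as pd

# Hypothetical rows standing in for the filtered dataset.
toy = pd.DataFrame({
    'Genres': ['Games, Strategy, Puzzle', 'Games, Strategy, Board', 'Games, Puzzle'],
    'Average User Rating': [4.5, 3.5, 4.0],
    'User Rating Count': [300.0, 1000.0, 500.0],
})

# One row per (game, genre tag) pair, skipping the generic 'Games' tag.
per_genre = toy.assign(Genre=toy['Genres'].str.split(', ')).explode('Genre')
per_genre = per_genre[per_genre['Genre'] != 'Games']

# Named aggregation: one summary row per genre tag.
summary = per_genre.groupby('Genre').agg(
    mean_rating=('Average User Rating', 'mean'),
    median_rating_count=('User Rating Count', 'median'),
    games=('Genres', 'size'),
)
print(summary.sort_values('mean_rating', ascending=False))
```

A game with several tags contributes to each of them, so the rows of the summary are not mutually exclusive groups.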

3. Analysis of game attributes

3.1 Trends over time

data['Release Year'] = data['Original Release Date'].dt.year

fig, ax = plt.subplots(1, 2, figsize=(15, 8))

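# palette is dropped because it has no effect without a hue variable
sns.lineplot(x='Release Year', y='Price', data=data, ax=ax[0])
ax[0].set_title('Release Year vs Price')

sns.lineplot(x='Release Year', y='Size', data=data, ax=ax[1])
ax[1].set_title('Release Year vs Size')

plt.tight_layout()
plt.show()

[Figure: two line plots, Release Year vs Price and Release Year vs Size]

Average price has fallen sharply over the years while game size has grown. A plausible explanation, which this analysis does not test, is that widespread fast internet access has made downloading a 1-2 GB game routine.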

3.2 Free versus paid games

paid = data[data['Price']>0]
free = data[data['Price']==0]

fig, ax = plt.subplots(1, 2, figsize=(15,8))

sns.countplot(data=paid, y='Average User Rating', ax=ax[0], palette='plasma')
ax[0].set_title('Paid Games')
ax[0].set_xlim([0, 1000])

sns.countplot(data=free, y='Average User Rating', ax=ax[1], palette='viridis')
ax[1].set_title('Free Games')
ax[1].set_xlim([0,1000])
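plt.tight_layout()
plt.show()

[Figure: count plots of Average User Rating for paid games and for free games]

  • As expected, there are fewer paid games than free games.
  • Even so, the two groups show no visible difference in how their ratings are distributed.
  • Most games score well, at around 4.0-5.0.
  • Price therefore does not appear to affect ratings: free and paid games are rated almost identically.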
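The count plots compare raw counts, and because free games far outnumber paid ones, the shapes are hard to compare by eye. One way to check the observations above is to compare the share of games at each rating within each group. This is a minimal sketch on made-up rows; the `Pricing` column is a hypothetical helper, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical prices and ratings.
toy = pd.DataFrame({
    'Price': [0.0, 0.0, 0.0, 0.0, 2.99, 0.99],
    'Average User Rating': [4.5, 4.5, 4.0, 3.5, 4.5, 4.0],
})

# Label each game as free or paid.
toy['Pricing'] = np.where(toy['Price'] > 0, 'Paid', 'Free')

# normalize='index' makes each row sum to 1, so groups of different
# sizes can be compared directly.
share = pd.crosstab(toy['Pricing'], toy['Average User Rating'], normalize='index')
print(share.round(2))
```

If the rows of the resulting table look alike on the real data, that supports the claim that price has little relationship to rating.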

3.3 Age rating distribution

age = data['Age Rating'].value_counts()

trace = go.Pie(labels = age.index, 
               values = age.values, 
               title = 'Age Rating', 
               hoverinfo = 'percent+value', 
               textinfo = 'percent',
               textposition = 'inside',
               hole = 0.7,
               showlegend = True,
               marker = dict(colors = ['cyan', 'gold', 'red'],line = dict(color = '#000000',width = 2)))

py.iplot([trace])

  • Most games are rated 4+ or 9+.
  • This suggests that developers are aiming for a broad audience.

4. Top game lists

4.1 Most expensive games

plt.figure(figsize=(12,6))

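# fill=True replaces shade=True, which newer seaborn releases deprecate
ax = sns.kdeplot(data['Price'], fill = True, linewidth = 5, color = 'm')
# A KDE plots estimated density on the y-axis, not a count
ax.set_ylabel('Density')
ax.set_xlabel('Price')

plt.show()

[Figure: kernel density estimate of game price]

Most games are free. In the filtered data the highest price is $19.99, as the table below shows; the $179.99 maximum in the earlier describe() output comes from the unfiltered data.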

price = data.sort_values(by='Price', ascending=False)[['Name', 'Price', 'Average User Rating', 'Size', 'Icon URL']].head(10)
price
       Name                             Price  Average User Rating  Size    Icon URL
276    SmartGo Kifu                     19.99  4.5                  61.2    https://is4-ssl.mzstatic.com/image/thumb/Purpl...
2863   Panzer Corps                     19.99  4.5                  1456.2  https://is4-ssl.mzstatic.com/image/thumb/Purpl...
12362  Tropico                          11.99  4.5                  2429.5  https://is2-ssl.mzstatic.com/image/thumb/Purpl...
8111   SteamWorld Heist                 9.99   4.5                  231.4   https://is5-ssl.mzstatic.com/image/thumb/Purpl...
4868   Six Ages: Ride Like the Wind     9.99   5.0                  730.8   https://is4-ssl.mzstatic.com/image/thumb/Purpl...
1129   Avernum: Escape From the Pit HD  9.99   5.0                  166.4   https://is4-ssl.mzstatic.com/image/thumb/Purpl...
2147   "Baldur's Gate II: EE"           9.99   4.0                  3574.1  https://is1-ssl.mzstatic.com/image/thumb/Purpl...
4473   Banner Saga                      9.99   4.5                  1936.5  https://is2-ssl.mzstatic.com/image/thumb/Purpl...
3415   FTL: Faster Than Light           9.99   4.5                  171.5   https://is4-ssl.mzstatic.com/image/thumb/Purpl...
4425   Icewind Dale                     9.99   4.0                  2857.2  https://is3-ssl.mzstatic.com/image/thumb/Purpl...

At $19.99 each, SmartGo Kifu and Panzer Corps are the most expensive games in the filtered dataset.

from PIL import Image

plt.figure(figsize=(6,3))
plt.subplot(121)
image = Image.open(BytesIO(requests.get(price.iloc[0,-1]).content))
plt.imshow(image)
plt.axis('off')

plt.subplot(122)
image = Image.open(BytesIO(requests.get(price.iloc[1,-1]).content))
plt.imshow(image)
plt.axis('off')

plt.show()

[Figure: icons of SmartGo Kifu and Panzer Corps]

4.2 Most popular games

fig,[ax1,ax2] = plt.subplots(1,2,figsize=(12,6))

sns.countplot(data = data, x ='Average User Rating', palette = 'gray', alpha = 0.7, linewidth=4, edgecolor= 'black',ax = ax1)
ax1.set_ylabel('Count')
ax1.set_xlabel('Average User Rating')

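# fill=True replaces shade=True, which newer seaborn releases deprecate
sns.kdeplot(data['User Rating Count'], fill = True, linewidth = 5, color = 'k',ax = ax2)
ax2.set_xlabel('User Rating Count')

plt.tight_layout()
plt.show()

[Figure: count plot of Average User Rating and kernel density estimate of User Rating Count]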

review = data.sort_values(by='User Rating Count', ascending=False)[['Name', 'Price', 'Average User Rating', 'Size', 'User Rating Count', 'Icon URL']].head(10)
review
       Name                            Price  Average User Rating  Size    User Rating Count  Icon URL
1378   Clash of Clans                  0.0    4.5                  153.8   3032734.0          https://is2-ssl.mzstatic.com/image/thumb/Purpl...
7187   Clash Royale                    0.0    4.5                  138.4   1277095.0          https://is3-ssl.mzstatic.com/image/thumb/Purpl...
13414  PUBG MOBILE                     0.0    4.5                  2273.6  711409.0           https://is3-ssl.mzstatic.com/image/thumb/Purpl...
1921   Plants vs. Zombies™ 2           0.0    4.5                  115.2   469562.0           https://is2-ssl.mzstatic.com/image/thumb/Purpl...
8139   Pokémon GO                      0.0    3.5                  268.5   439776.0           https://is3-ssl.mzstatic.com/image/thumb/Purpl...
2410   Boom Beach                      0.0    4.5                  193.4   400787.0           https://is1-ssl.mzstatic.com/image/thumb/Purpl...
12473  Cash, Inc. Fame & Fortune Game  0.0    5.0                  234.6   374772.0           https://is3-ssl.mzstatic.com/image/thumb/Purpl...
8632   Idle Miner Tycoon: Cash Empire  0.0    4.5                  423.4   283035.0           https://is5-ssl.mzstatic.com/image/thumb/Purpl...
38     TapDefense                      0.0    3.5                  7.4     273687.0           https://is2-ssl.mzstatic.com/image/thumb/Purpl...
3550   Star Wars™: Commander           0.0    4.5                  117.4   259030.0           https://is2-ssl.mzstatic.com/image/thumb/Purpl...

The three games with the most ratings are:

  1. Clash of Clans
  2. Clash Royale
  3. PUBG MOBILE

All three are also widely recognized titles on the App Store.

plt.figure(figsize=(6,3))
plt.subplot(131)
image = Image.open(BytesIO(requests.get(review.iloc[0,-1]).content))
plt.imshow(image)
plt.axis('off')

plt.subplot(132)
image = Image.open(BytesIO(requests.get(review.iloc[1,-1]).content))
plt.imshow(image)
plt.axis('off')

plt.subplot(133)
image = Image.open(BytesIO(requests.get(review.iloc[2,-1]).content))
plt.imshow(image)
plt.axis('off')

plt.show()

[Figure: icons of Clash of Clans, Clash Royale, and PUBG MOBILE]

4.3 Best overall games

best = data.sort_values(by=['Average User Rating','User Rating Count'], ascending=False)[['Name', 'Average User Rating', 'User Rating Count', 'Size', 
                                                                                         'Price', 'Icon URL']].head(10)
best
       Name                            Average User Rating  User Rating Count  Size    Price  Icon URL
12473  Cash, Inc. Fame & Fortune Game  5.0                  374772.0           234.6   0.00   https://is3-ssl.mzstatic.com/image/thumb/Purpl...
6089   Egg, Inc.                       5.0                  174591.0           71.4    0.00   https://is1-ssl.mzstatic.com/image/thumb/Purpl...
14155  AFK Arena                       5.0                  156766.0           215.3   0.00   https://is3-ssl.mzstatic.com/image/thumb/Purpl...
8388   South Park: Phone Destroyer™    5.0                  156044.0           124.2   0.00   https://is2-ssl.mzstatic.com/image/thumb/Purpl...
13261  From Zero to Hero: Cityman      5.0                  146729.0           282.9   0.00   https://is2-ssl.mzstatic.com/image/thumb/Purpl...
15266  Sushi Bar Idle                  5.0                  123606.0           245.4   0.00   https://is1-ssl.mzstatic.com/image/thumb/Purpl...
10418  Fire Emblem Heroes              5.0                  120283.0           167.5   0.00   https://is3-ssl.mzstatic.com/image/thumb/Purpl...
1649   Bloons TD 5                     5.0                  97776.0            127.2   2.99   https://is2-ssl.mzstatic.com/image/thumb/Purpl...
872    Naval Warfare                   5.0                  90214.0            41.2    0.00   https://is1-ssl.mzstatic.com/image/thumb/Purpl...
16434  Idle Roller Coaster             5.0                  88855.0            223.5   0.00   https://is2-ssl.mzstatic.com/image/thumb/Purpl...

  • Cash, Inc. Fame & Fortune Game ranks as the best overall game, with a 5.0 rating from 374,772 ratings.
  • Many other games also combine a 5.0 rating with a large number of ratings.

plt.figure(figsize=(3,3))
image = Image.open(BytesIO(requests.get(best.iloc[0,-1]).content))
plt.axis('off')
plt.imshow(image)
plt.show()

[Figure: icon of Cash, Inc. Fame & Fortune Game]
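The ranking in section 4.3 sorts by Average User Rating first and uses User Rating Count only to break ties. Because ratings are rounded to the nearest 0.5, every 5.0 game lands in one large tie. A common alternative, which is not what this analysis uses, is a damped (Bayesian-style) average that pulls each rating toward the overall mean unless it is backed by many ratings. The sketch below assumes a prior weight `m = 1000`, a hypothetical choice, and uses made-up rows.

```python
import pandas as pd

# Hypothetical games: A has a perfect score but few ratings.
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Average User Rating': [5.0, 4.5, 5.0],
    'User Rating Count': [300.0, 300000.0, 90000.0],
})

m = 1000                                  # prior weight (assumed, tune as needed)
C = toy['Average User Rating'].mean()     # overall mean rating
v = toy['User Rating Count']
R = toy['Average User Rating']

# Weighted rating: close to R when v is much larger than m, close to C otherwise.
toy['Weighted Rating'] = (v * R + m * C) / (v + m)

ranked = toy.sort_values('Weighted Rating', ascending=False)
print(ranked[['Name', 'Weighted Rating']])
```

Here game C outranks game A even though both show 5.0, because C's score rests on far more ratings. The choice of `m` controls how strongly small samples are discounted.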

4.4 Top developers

fig = plt.figure(figsize = (12,8))

data.Developer.value_counts()[:20].plot(kind='bar',color = 'gray', alpha = 0.7, linewidth=4, edgecolor= 'black')

plt.xlabel("Developers")
plt.ylabel("Count")
plt.title("TOP 20 Most Common Developers")
plt.xticks(rotation=90) 

plt.show()

[Figure: bar chart of the 20 developers with the most games]