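The mobile games industry is worth billions of dollars, and companies spend heavily on developing and marketing these games. The App Store games dataset offers a look at one segment of that market: strategy games. This segment includes titles such as Clash of Clans, Plants vs. Zombies, and Pokémon GO. In this case study, a game's number of ratings can serve as a proxy for its overall success, which makes it possible to look for the factors behind a successful game; alternatively, one can measure the state of the market over time and try to predict where these games are heading.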

Content
This dataset contains data on 17,007 game apps from the Apple App Store. It was collected on 3 August 2019 using the iTunes API and the App Store sitemap.
- File: appstore_games.csv
- Fields:
Field | Description |
---|---|
URL | The URL (web page address) |
ID | The assigned ID |
Name | The game name |
Subtitle | The secondary text under the name |
Icon URL | URL of the icon, a 512px × 512px jpg |
Average User Rating | Rounded to the nearest 0.5; requires at least 5 ratings |
User Rating Count | Number of ratings internationally; null means fewer than 5 |
Price | Price in USD |
In-app Purchases | Prices of available in-app purchases |
Description | App description |
Developer | App developer |
Age Rating | Either 4+, 9+, 12+ or 17+ |
Languages | ISO2A language codes |
Size | Size of the app in bytes |
Primary Genre | Main genre (e.g. Games, Entertainment) |
Genres | Genres of the app (e.g. Games, Puzzle) |
Original Release Date | When it was released |
Current Version Release Date | When it was last updated |
1. Data loading and preprocessing
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.offline as py
import plotly.graph_objs as go
import plotly.express as px
py.init_notebook_mode(connected=True)
%matplotlib inline
data = pd.read_csv('./input/appstore_games.csv', parse_dates=['Original Release Date', 'Current Version Release Date'])
data.head()
| | URL | ID | Name | Subtitle | Icon URL | Average User Rating | User Rating Count | Price | In-app Purchases | Description | Developer | Age Rating | Languages | Size | Primary Genre | Genres | Original Release Date | Current Version Release Date |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | https://apps.apple.com/us/app/sudoku/id284921427 | 284921427 | Sudoku | NaN | https://is2-ssl.mzstatic.com/image/thumb/Purpl... | 4.0 | 3553.0 | 2.99 | NaN | Join over 21,000,000 of our fans and download ... | Mighty Mighty Good Games | 4+ | DA, NL, EN, FI, FR, DE, IT, JA, KO, NB, PL, PT... | 15853568.0 | Games | Games, Strategy, Puzzle | 2008-11-07 | 2017-05-30 |
1 | https://apps.apple.com/us/app/reversi/id284926400 | 284926400 | Reversi | NaN | https://is4-ssl.mzstatic.com/image/thumb/Purpl... | 3.5 | 284.0 | 1.99 | NaN | The classic game of Reversi, also known as Oth... | Kiss The Machine | 4+ | EN | 12328960.0 | Games | Games, Strategy, Board | 2008-11-07 | 2018-05-17 |
2 | https://apps.apple.com/us/app/morocco/id284946595 | 284946595 | Morocco | NaN | https://is5-ssl.mzstatic.com/image/thumb/Purpl... | 3.0 | 8376.0 | 0.00 | NaN | Play the classic strategy game Othello (also k... | Bayou Games | 4+ | EN | 674816.0 | Games | Games, Board, Strategy | 2008-11-07 | 2017-05-09 |
3 | https://apps.apple.com/us/app/sudoku-free/id28... | 285755462 | Sudoku (Free) | NaN | https://is3-ssl.mzstatic.com/image/thumb/Purpl... | 3.5 | 190394.0 | 0.00 | NaN | Top 100 free app for over a year.\nRated "Best... | Mighty Mighty Good Games | 4+ | DA, NL, EN, FI, FR, DE, IT, JA, KO, NB, PL, PT... | 21552128.0 | Games | Games, Strategy, Puzzle | 2008-07-23 | 2017-05-30 |
4 | https://apps.apple.com/us/app/senet-deluxe/id2... | 285831220 | Senet Deluxe | NaN | https://is1-ssl.mzstatic.com/image/thumb/Purpl... | 3.5 | 28.0 | 2.99 | NaN | "Senet Deluxe - The Ancient Game of Life and A... | RoGame Software | 4+ | DA, NL, EN, FR, DE, EL, IT, JA, KO, NO, PT, RU... | 34689024.0 | Games | Games, Strategy, Board, Education | 2008-07-18 | 2018-07-22 |
First, drop the fields that are not needed for the analysis: the game link (the URL column) and the assigned ID (the ID column).
data.drop(['URL', 'ID'], axis = 1, inplace = True)
The Icon URL column holds a link to each game's icon, so the images have to be downloaded from those links. Here the first 20 are fetched and displayed.
import requests
from PIL import Image
from io import BytesIO
# Download the first 20 icons and show them in a 4 x 5 grid
fig, ax = plt.subplots(4, 5, figsize=(8, 8))
for i in range(20):
    r = requests.get(data['Icon URL'][i])
    im = Image.open(BytesIO(r.content))
    ax[i//5][i%5].imshow(im)
    ax[i//5][i%5].axis('off')
plt.show()
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17007 entries, 0 to 17006
Data columns (total 16 columns):
Name 17007 non-null object
Subtitle 5261 non-null object
Icon URL 17007 non-null object
Average User Rating 7561 non-null float64
User Rating Count 7561 non-null float64
Price 16983 non-null float64
In-app Purchases 7683 non-null object
Description 17007 non-null object
Developer 17007 non-null object
Age Rating 17007 non-null object
Languages 16947 non-null object
Size 17006 non-null float64
Primary Genre 17007 non-null object
Genres 17007 non-null object
Original Release Date 17007 non-null datetime64[ns]
Current Version Release Date 17007 non-null datetime64[ns]
dtypes: datetime64[ns](2), float64(4), object(10)
memory usage: 2.1+ MB
In this dataset, the Subtitle, Average User Rating, User Rating Count, and In-app Purchases fields contain many null values: each is populated in fewer than half of the 17,007 rows.
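The non-null counts printed by data.info() can also be read as a share of missing rows per column. The following is a minimal sketch on a small made-up frame; the `toy` values are illustrative assumptions, not rows from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (values are made up)
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Subtitle': [np.nan, 'Fun', np.nan, np.nan],
    'Average User Rating': [4.0, np.nan, 3.5, np.nan],
})

# isna() marks missing cells; the column mean is the missing share
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)
```

On the real frame, the same expression applied to `data` would list the columns from most to least incomplete.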
data.describe(include='all')
| | Name | Subtitle | Icon URL | Average User Rating | User Rating Count | Price | In-app Purchases | Description | Developer | Age Rating | Languages | Size | Primary Genre | Genres | Original Release Date | Current Version Release Date |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 17007 | 5261 | 17007 | 7561.000000 | 7.561000e+03 | 16983.000000 | 7683 | 17007 | 17007 | 17007 | 16947 | 1.700600e+04 | 17007 | 17007 | 17007 | 17007 |
unique | 16847 | 5010 | 16847 | NaN | NaN | NaN | 3803 | 16473 | 8693 | 4 | 990 | NaN | 21 | 1004 | 3084 | 2512 |
top | Checkers (Draughts) | Emoji Stickers | https://is1-ssl.mzstatic.com/image/thumb/Purpl... | NaN | NaN | NaN | 0.99 | #NAME? | Tapps Tecnologia da Informação Ltda. | 4+ | EN | NaN | Games | Games, Strategy, Puzzle | 2016-02-09 00:00:00 | 2019-01-08 00:00:00 |
freq | 2 | 14 | 2 | NaN | NaN | NaN | 943 | 17 | 123 | 11806 | 12467 | NaN | 16286 | 778 | 75 | 118 |
first | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2008-01-08 00:00:00 | 2008-01-08 00:00:00 |
last | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2019-12-07 00:00:00 | 2019-12-07 00:00:00 |
mean | NaN | NaN | NaN | 4.060905 | 3.306531e+03 | 0.813419 | NaN | NaN | NaN | NaN | NaN | 1.157064e+08 | NaN | NaN | NaN | NaN |
std | NaN | NaN | NaN | 0.751428 | 4.232256e+04 | 7.835732 | NaN | NaN | NaN | NaN | NaN | 2.036477e+08 | NaN | NaN | NaN | NaN |
min | NaN | NaN | NaN | 1.000000 | 5.000000e+00 | 0.000000 | NaN | NaN | NaN | NaN | NaN | 5.132800e+04 | NaN | NaN | NaN | NaN |
25% | NaN | NaN | NaN | 3.500000 | 1.200000e+01 | 0.000000 | NaN | NaN | NaN | NaN | NaN | 2.295014e+07 | NaN | NaN | NaN | NaN |
50% | NaN | NaN | NaN | 4.500000 | 4.600000e+01 | 0.000000 | NaN | NaN | NaN | NaN | NaN | 5.676895e+07 | NaN | NaN | NaN | NaN |
75% | NaN | NaN | NaN | 4.500000 | 3.090000e+02 | 0.000000 | NaN | NaN | NaN | NaN | NaN | 1.330271e+08 | NaN | NaN | NaN | NaN |
max | NaN | NaN | NaN | 5.000000 | 3.032734e+06 | 179.990000 | NaN | NaN | NaN | NaN | NaN | 4.005591e+09 | NaN | NaN | NaN | NaN |
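One detail in this output may be worth checking: the latest date is 2019-12-07, which is later than the stated collection date of 3 August 2019. A possible explanation, which is an assumption here and not confirmed by the source, is that the CSV stores dates in day/month/year order and they were parsed month-first. The sketch below uses hypothetical raw strings to show how the two readings differ.

```python
import pandas as pd

# Hypothetical raw strings, assumed to be in day/month/year order
raw = pd.Series(['11/07/2008', '12/07/2019', '01/08/2008'])

# Month-first reading of the same strings
month_first = pd.to_datetime(raw, format='%m/%d/%Y')
# Day-first reading of the same strings
day_first = pd.to_datetime(raw, format='%d/%m/%Y')

print(month_first.max().date(), day_first.max().date())
```

If the raw file does turn out to be day-first, passing `dayfirst=True` to `pd.read_csv` is one way to handle it. The later analysis only uses the release year, which is the same under both readings.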
The describe() output gives the maximum, minimum, mean, and other statistics for columns such as Price, Average User Rating, and User Rating Count. It also shows that Primary Genre takes 21 distinct values, which deserves a closer look.
fig = plt.figure(figsize = (12,6))
# Count apps by primary genre and draw a bar chart
data['Primary Genre'].value_counts().plot(kind = 'bar',color = 'deepskyblue')
plt.show()
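Because the data collection was not fully accurate, a small number of non-game apps are mixed in, such as education, entertainment, utilities, and sports apps, and these need to be removed. To simplify the later analysis, very niche apps are dropped as well, keeping only games with more than 200 user ratings.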
# Keep games with more than 200 ratings; .copy() avoids chained-assignment warnings later
data = data.loc[(data['User Rating Count'] > 200) & (data['Primary Genre'] == 'Games')].copy()
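The filter also removes games whose rating count is null, because a comparison against NaN evaluates to False. A minimal sketch on a made-up frame (the `toy` rows are illustrative assumptions):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'User Rating Count': [5000.0, np.nan, 150.0, 201.0],
    'Primary Genre': ['Games', 'Games', 'Games', 'Education'],
})

# NaN > 200 is False, so row B is dropped along with the small and non-game rows
mask = (toy['User Rating Count'] > 200) & (toy['Primary Genre'] == 'Games')
kept = toy.loc[mask].copy()
print(kept)
```

Only row A survives: B has no rating count, C has too few ratings, and D is not a game.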
2. Analysis by game genre
top_10 = data['Genres'].value_counts().sort_values(ascending=False).head(10)
trace = go.Pie(labels = top_10.index,
values = top_10.values,
title = 'Genres',
hoverinfo = 'percent+value',
textinfo = 'percent',
textposition = 'inside',
hole = 0.7,
showlegend = True,
marker = dict(colors = ['cyan'],line = dict(color = '#000000',width = 2)))
py.iplot([trace])
The chart shows that the games in this dataset are mostly combinations of genres such as strategy, simulation, and action.
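The pie chart counts each full Genres string (for example "Games, Strategy, Puzzle") as one category. To count individual genres instead, the strings can be split and exploded. This is a sketch of one alternative view, not the approach used above; the `toy` rows are taken from the sample output shown earlier.

```python
import pandas as pd

toy = pd.DataFrame({'Genres': ['Games, Strategy, Puzzle',
                               'Games, Strategy, Board',
                               'Games, Board, Strategy']})

genre_counts = (
    toy['Genres']
    .str.split(', ')                 # one list of genres per game
    .explode()                       # one row per (game, genre) pair
    .loc[lambda s: s != 'Games']     # 'Games' appears on every row, so drop it
    .value_counts()
)
print(genre_counts)
```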
# Convert size from bytes to MB
data['Size'] = round(data['Size']/1024/1024,1)
plt.figure(figsize = (12,6))
ax = sns.kdeplot(data['Size'], shade = True, linewidth = 5, color = 'teal')
ax.set_xlabel('Size')
plt.show()
fig = px.scatter(data,x = "Average User Rating",y = "User Rating Count",size = "Size",color = "Genres",
log_y = True,size_max = 80)
fig.show()
This plot shows which genres have the highest user ratings, the most ratings, and the largest game sizes.
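The same comparison can be made numerically by aggregating per genre combination. A minimal sketch, assuming sizes are already in MB; the `toy` values and the output column names (`mean_rating`, `total_ratings`, `median_size_mb`) are made up for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    'Genres': ['Games, Strategy, Puzzle', 'Games, Strategy, Puzzle',
               'Games, Strategy, Board'],
    'Average User Rating': [4.0, 5.0, 3.5],
    'User Rating Count': [300.0, 700.0, 250.0],
    'Size': [15.1, 30.3, 11.8],
})

# One row per genre combination, ordered by total number of ratings
summary = toy.groupby('Genres').agg(
    mean_rating=('Average User Rating', 'mean'),
    total_ratings=('User Rating Count', 'sum'),
    median_size_mb=('Size', 'median'),
).sort_values('total_ratings', ascending=False)
print(summary)
```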
3. Analysis by game attribute
3.1 Trends over time
data['Release Year'] = data['Original Release Date'].dt.year
fig, ax = plt.subplots(1, 2, figsize=(15, 8))
sns.lineplot(x='Release Year', y='Price', data=data, palette='Wistia', ax=ax[0])
ax[0].set_title('Release Year vs Price')
sns.lineplot(x='Release Year', y='Size', data=data, palette='Wistia', ax=ax[1])
ax[1].set_title('Release Year vs Size')
plt.tight_layout()
plt.show()
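Game prices have dropped sharply over the years while game sizes have grown. A plausible explanation is that most people now have fast internet access and can easily download a 1-2 GB game.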
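By default, sns.lineplot aggregates all games released in the same year into a mean. The same yearly means can be tabulated directly, as in this minimal sketch on a made-up frame (the `toy` values are illustrative assumptions, with sizes in MB):

```python
import pandas as pd

toy = pd.DataFrame({
    'Original Release Date': pd.to_datetime(
        ['2008-11-07', '2008-07-23', '2018-05-17', '2018-07-22']),
    'Price': [2.99, 0.99, 0.0, 0.0],
    'Size': [15.0, 21.0, 200.0, 400.0],
})
toy['Release Year'] = toy['Original Release Date'].dt.year

# Mean price and mean size per release year
yearly = toy.groupby('Release Year')[['Price', 'Size']].mean()
print(yearly)
```

A table like this makes it easier to quote concrete numbers for the price drop and the size growth than reading them off the plot.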
3.2 Free vs. paid games
paid = data[data['Price']>0]
free = data[data['Price']==0]
fig, ax = plt.subplots(1, 2, figsize=(15,8))
sns.countplot(data=paid, y='Average User Rating', ax=ax[0], palette='plasma')
ax[0].set_title('Paid Games')
ax[0].set_xlim([0, 1000])
sns.countplot(data=free, y='Average User Rating', ax=ax[1], palette='viridis')
ax[1].set_title('Free Games')
ax[1].set_xlim([0,1000])
plt.tight_layout()
plt.show()
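- 1. As expected, there are fewer paid games than free games
- 2. Even so, the rating distributions of the two groups look much the same
- 3. Most games are well rated, at around 4.0 to 5.0
- 4. Price therefore does not appear to affect ratings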
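Because the two groups differ in size, comparing raw counts can be misleading; comparing the share of each rating within each group is one way to check the points above. A minimal sketch on made-up data (the `toy` values and the `Type` column are illustrative assumptions):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Price': [0.0, 0.0, 0.0, 2.99, 4.99],
    'Average User Rating': [4.5, 4.0, 4.5, 4.5, 4.0],
})
toy['Type'] = np.where(toy['Price'] > 0, 'Paid', 'Free')

# Share of each rating within the free and paid groups (each row sums to 1)
rating_share = pd.crosstab(toy['Type'], toy['Average User Rating'],
                           normalize='index')
print(rating_share)
```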
3.3 Age rating distribution
age = data['Age Rating'].value_counts()
trace = go.Pie(labels = age.index,
values = age.values,
title = 'Age Rating',
hoverinfo = 'percent+value',
textinfo = 'percent',
textposition = 'inside',
hole = 0.7,
showlegend = True,
marker = dict(colors = ['cyan', 'gold', 'red'],line = dict(color = '#000000',width = 2)))
py.iplot([trace])
- 1. Most games are rated 4+ or 9+
- 2. This suggests that game developers are aiming for a broad audience
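The shares behind the pie chart can be computed directly with value_counts. A minimal sketch on a made-up series (the `toy` values are illustrative assumptions, not the real distribution):

```python
import pandas as pd

toy = pd.Series(['4+', '4+', '9+', '12+', '4+', '17+', '9+', '4+'],
                name='Age Rating')

# Fraction of games in each age rating
age_share = toy.value_counts(normalize=True)
# Combined share of the two broadest ratings
broad = age_share[['4+', '9+']].sum()
print(age_share, broad)
```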
4. Top charts
4.1 Most expensive games
plt.figure(figsize=(12,6))
ax = sns.kdeplot(data['Price'], shade = True, linewidth = 5, color = 'm')
ax.set_ylabel('Density')
ax.set_xlabel('Price')
plt.show()
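Most games are free. In the filtered data the highest price is 19.99 USD, as the table below shows; the 179.99 maximum in the earlier describe() output belongs to the unfiltered data.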
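A density plot of prices is dominated by the spike at zero, so a few summary numbers can complement it. A minimal sketch on a made-up price series (the `prices` values are illustrative assumptions):

```python
import pandas as pd

prices = pd.Series([0.0, 0.0, 0.0, 0.0, 0.99, 2.99, 9.99, 19.99], name='Price')

free_share = (prices == 0).mean()                        # fraction of free games
paid_quantiles = prices[prices > 0].quantile([0.5, 0.9])  # spread among paid games
top_price = prices.max()
print(free_share, top_price)
print(paid_quantiles)
```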
price = data.sort_values(by='Price', ascending=False)[['Name', 'Price', 'Average User Rating', 'Size', 'Icon URL']].head(10)
price
| | Name | Price | Average User Rating | Size | Icon URL |
---|---|---|---|---|---|
276 | SmartGo Kifu | 19.99 | 4.5 | 61.2 | https://is4-ssl.mzstatic.com/image/thumb/Purpl... |
2863 | Panzer Corps | 19.99 | 4.5 | 1456.2 | https://is4-ssl.mzstatic.com/image/thumb/Purpl... |
12362 | Tropico | 11.99 | 4.5 | 2429.5 | https://is2-ssl.mzstatic.com/image/thumb/Purpl... |
8111 | SteamWorld Heist | 9.99 | 4.5 | 231.4 | https://is5-ssl.mzstatic.com/image/thumb/Purpl... |
4868 | Six Ages: Ride Like the Wind | 9.99 | 5.0 | 730.8 | https://is4-ssl.mzstatic.com/image/thumb/Purpl... |
1129 | Avernum: Escape From the Pit HD | 9.99 | 5.0 | 166.4 | https://is4-ssl.mzstatic.com/image/thumb/Purpl... |
2147 | "Baldur's Gate II: EE" | 9.99 | 4.0 | 3574.1 | https://is1-ssl.mzstatic.com/image/thumb/Purpl... |
4473 | Banner Saga | 9.99 | 4.5 | 1936.5 | https://is2-ssl.mzstatic.com/image/thumb/Purpl... |
3415 | FTL: Faster Than Light | 9.99 | 4.5 | 171.5 | https://is4-ssl.mzstatic.com/image/thumb/Purpl... |
4425 | Icewind Dale | 9.99 | 4.0 | 2857.2 | https://is3-ssl.mzstatic.com/image/thumb/Purpl... |
SmartGo Kifu and Panzer Corps, at 19.99 USD each, are the most expensive games in the filtered data.
from PIL import Image
plt.figure(figsize=(6,3))
plt.subplot(121)
image = Image.open(BytesIO(requests.get(price.iloc[0,-1]).content))
plt.imshow(image)
plt.axis('off')
plt.subplot(122)
image = Image.open(BytesIO(requests.get(price.iloc[1,-1]).content))
plt.imshow(image)
plt.axis('off')
plt.show()
4.2 Most popular games
fig,[ax1,ax2] = plt.subplots(1,2,figsize=(12,6))
sns.countplot(data = data, x ='Average User Rating', palette = 'gray', alpha = 0.7, linewidth=4, edgecolor= 'black',ax = ax1)
ax1.set_ylabel('Count')
ax1.set_xlabel('Average User Rating')
sns.kdeplot(data['User Rating Count'], shade = True, linewidth = 5, color = 'k',ax = ax2)
ax2.set_xlabel('User Rating Count')
plt.tight_layout()
plt.show()
review = data.sort_values(by='User Rating Count', ascending=False)[['Name', 'Price', 'Average User Rating', 'Size', 'User Rating Count', 'Icon URL']].head(10)
review
| | Name | Price | Average User Rating | Size | User Rating Count | Icon URL |
---|---|---|---|---|---|---|
1378 | Clash of Clans | 0.0 | 4.5 | 153.8 | 3032734.0 | https://is2-ssl.mzstatic.com/image/thumb/Purpl... |
7187 | Clash Royale | 0.0 | 4.5 | 138.4 | 1277095.0 | https://is3-ssl.mzstatic.com/image/thumb/Purpl... |
13414 | PUBG MOBILE | 0.0 | 4.5 | 2273.6 | 711409.0 | https://is3-ssl.mzstatic.com/image/thumb/Purpl... |
1921 | Plants vs. Zombies™ 2 | 0.0 | 4.5 | 115.2 | 469562.0 | https://is2-ssl.mzstatic.com/image/thumb/Purpl... |
8139 | Pokémon GO | 0.0 | 3.5 | 268.5 | 439776.0 | https://is3-ssl.mzstatic.com/image/thumb/Purpl... |
2410 | Boom Beach | 0.0 | 4.5 | 193.4 | 400787.0 | https://is1-ssl.mzstatic.com/image/thumb/Purpl... |
12473 | Cash, Inc. Fame & Fortune Game | 0.0 | 5.0 | 234.6 | 374772.0 | https://is3-ssl.mzstatic.com/image/thumb/Purpl... |
8632 | Idle Miner Tycoon: Cash Empire | 0.0 | 4.5 | 423.4 | 283035.0 | https://is5-ssl.mzstatic.com/image/thumb/Purpl... |
38 | TapDefense | 0.0 | 3.5 | 7.4 | 273687.0 | https://is2-ssl.mzstatic.com/image/thumb/Purpl... |
3550 | Star Wars™: Commander | 0.0 | 4.5 | 117.4 | 259030.0 | https://is2-ssl.mzstatic.com/image/thumb/Purpl... |
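- 1. Clash of Clans
- 2. Clash Royale
- 3. PUBG MOBILE
are the most popular games by number of ratings, which matches the titles most people would name as popular on the App Store.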
plt.figure(figsize=(6,3))
plt.subplot(131)
image = Image.open(BytesIO(requests.get(review.iloc[0,-1]).content))
plt.imshow(image)
plt.axis('off')
plt.subplot(132)
image = Image.open(BytesIO(requests.get(review.iloc[1,-1]).content))
plt.imshow(image)
plt.axis('off')
plt.subplot(133)
image = Image.open(BytesIO(requests.get(review.iloc[2,-1]).content))
plt.imshow(image)
plt.axis('off')
plt.show()
4.3 Best overall games
best = data.sort_values(by=['Average User Rating','User Rating Count'], ascending=False)[['Name', 'Average User Rating', 'User Rating Count', 'Size',
'Price', 'Icon URL']].head(10)
best
| | Name | Average User Rating | User Rating Count | Size | Price | Icon URL |
---|---|---|---|---|---|---|
12473 | Cash, Inc. Fame & Fortune Game | 5.0 | 374772.0 | 234.6 | 0.00 | https://is3-ssl.mzstatic.com/image/thumb/Purpl... |
6089 | Egg, Inc. | 5.0 | 174591.0 | 71.4 | 0.00 | https://is1-ssl.mzstatic.com/image/thumb/Purpl... |
14155 | AFK Arena | 5.0 | 156766.0 | 215.3 | 0.00 | https://is3-ssl.mzstatic.com/image/thumb/Purpl... |
8388 | South Park: Phone Destroyer™ | 5.0 | 156044.0 | 124.2 | 0.00 | https://is2-ssl.mzstatic.com/image/thumb/Purpl... |
13261 | From Zero to Hero: Cityman | 5.0 | 146729.0 | 282.9 | 0.00 | https://is2-ssl.mzstatic.com/image/thumb/Purpl... |
15266 | Sushi Bar Idle | 5.0 | 123606.0 | 245.4 | 0.00 | https://is1-ssl.mzstatic.com/image/thumb/Purpl... |
10418 | Fire Emblem Heroes | 5.0 | 120283.0 | 167.5 | 0.00 | https://is3-ssl.mzstatic.com/image/thumb/Purpl... |
1649 | Bloons TD 5 | 5.0 | 97776.0 | 127.2 | 2.99 | https://is2-ssl.mzstatic.com/image/thumb/Purpl... |
872 | Naval Warfare | 5.0 | 90214.0 | 41.2 | 0.00 | https://is1-ssl.mzstatic.com/image/thumb/Purpl... |
16434 | Idle Roller Coaster | 5.0 | 88855.0 | 223.5 | 0.00 | https://is2-ssl.mzstatic.com/image/thumb/Purpl... |
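- 1. Cash, Inc. Fame & Fortune Game ranks as the best overall game, with a 5.0 rating and 374,772 ratings
- 2. Many other games also combine a 5.0 rating with a large number of ratings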
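Sorting by rating first and rating count second always puts a 5.0 game ahead of a 4.5 game, however few ratings the 5.0 game has. One common alternative, shown here as a sketch and not as the method used above, is a weighted rating that pulls games with few ratings toward the overall mean. The prior weight `m` and the `toy` values are hypothetical and chosen only for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    'Name': ['Big hit', 'Small gem', 'Mid title'],
    'Average User Rating': [4.5, 5.0, 3.5],
    'User Rating Count': [300000.0, 250.0, 5000.0],
})

# Ranking as done above: rating first, rating count as tie-breaker
plain = toy.sort_values(['Average User Rating', 'User Rating Count'],
                        ascending=False)

C = toy['Average User Rating'].mean()  # overall mean rating
m = 1000                               # hypothetical prior weight, in ratings
v = toy['User Rating Count']
R = toy['Average User Rating']

# Weighted rating: close to R when v is large, close to C when v is small
toy['Weighted Rating'] = (v * R + m * C) / (v + m)
ranked = toy.sort_values('Weighted Rating', ascending=False)
print(ranked[['Name', 'Weighted Rating']])
```

With these toy numbers the plain sort puts "Small gem" first, while the weighted rating puts "Big hit" first because its 4.5 is backed by far more ratings. The result depends on the choice of `m`.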
plt.figure(figsize=(3,3))
image = Image.open(BytesIO(requests.get(best.iloc[0,-1]).content))
plt.axis('off')
plt.imshow(image)
plt.show()
4.4 Top developers
fig = plt.figure(figsize = (12,8))
data.Developer.value_counts()[:20].plot(kind='bar',color = 'gray', alpha = 0.7, linewidth=4, edgecolor= 'black')
plt.xlabel("Developers")
plt.ylabel("Count")
plt.title("Top 20 Most Common Developers")
plt.xticks(rotation=90)
plt.show()
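The bar chart above ranks developers by the number of games they have in the filtered data. A developer with many games is not necessarily the one with the most ratings, so it can help to tabulate both. A minimal sketch on made-up data (developer names and counts are illustrative assumptions):

```python
import pandas as pd

toy = pd.DataFrame({
    'Developer': ['Dev A', 'Dev A', 'Dev B', 'Dev C', 'Dev A'],
    'User Rating Count': [300.0, 500.0, 90000.0, 250.0, 400.0],
})

# Number of games and total ratings per developer
dev_summary = toy.groupby('Developer').agg(
    games=('User Rating Count', 'size'),
    total_ratings=('User Rating Count', 'sum'),
).sort_values('games', ascending=False)
print(dev_summary)
```

Here "Dev A" has the most games while "Dev B" has the most ratings from a single title.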