MongoDB分片实战

共计 32394 个字符，预计需要花费 81 分钟才能阅读完成。

主机	OS	备注
192.168.32.13	CentOS6.3 64 位	普通 PC
192.168.71.43	CentOS6.2 64 位	服务器，NUMA CPU 架构

MongoDB 版本：mongodb-linux-x86_64-2.4.1，下载地址：www.mongodb.org/downloads.

MongoDB 安装：分别在两台机器上安装好 mongodb 2.4.1，安装路径都为 /url/local/mongodb-2.4.1/

cd /usr/local/src/
wget http://fastdl.mongodb.org/linux/mongodb-linux-x86_64-2.4.1.tgz
tar -zxvf mongodb-linux-x86_64-2.4.1.tgz
cp -r mongodb-linux-x86_64-2.4.1 /usr/local/mongodb-2.4.1
cd /usr/local/mongodb-2.4.1/bin/
ll

可以看到 mongodb 安装成功有如下模块：

MongoDB 分片实战

mongodb启动和关闭 等在后面集群搭建中有详细说明，在此不再赘述。

Mongodb 一共有三种集群搭建的方式：Replica Set（副本集）、Sharding（切片）和 Master-Slaver（主从）。下面要搭建的是Sharding，Sharding 集群也是三种集群中最复杂的。

配置服务器 启动(192.168.32.13:10000)：

1.    ./bin/mongod --fork --dbpath data/config/ --logpath log/config.log –port 10000

路由服务器 启动(192.168.32.13:20000)：

1.    ./bin/mongos --port 20000 --configdb 192.168.32.13:10000 --logpath log/mongos.log  --fork

注意 1 ：配置 –conigdb 的时候 ip 地址不能填 localhost 或 127.0.0.1 否则 添加分片 时会返回如下错误信息：

1.    {2.            "ok" : 0,
3.            "errmsg" : "can't use localhost as a shard since all shards need to communicate. either use all shards and configdbs in localhost or all in actual IPs  host: 192.168.71.43:27017 isLocalHost:0"
4.    }

启动分片1(192.168.32.13:27019)：

1.    ./bin/mongod --dbpath data/shard3/ --logpath log/shard3.log  --fork --port 27019

启动分片 2(192.168.32.13:27020)：

1.    ./bin/mongod --dbpath data/shard3/ --logpath log/shard3.log  --fork --port 27020

启动分片 3(192.168.71.43:27017)：

1.    numactl --interleave=all ./bin/mongod --dbpath data/shard1/ --logpath log/shard1.log  --fork --port 27017

启动分片 4(192.168.71.43:27018)：

1.    numactl --interleave=all ./bin/mongod --dbpath data/shard2/ --logpath log/shard2.log  --fork --port 27018

说明：关于这里为什么加 numactl –interleave=all，后面有详细说明。

Note: 在生产环境可以将启动的配置信息写入文件，然后启动的时候指定配置文件，这样便于管理：

1.    vi conf/config.conf
2.    bpath=data/config/
3.    logpath=log/config.log
4.    port=10000
5.    fork=true

6.    ./bin/mongod -f conf/config.conf

添加分片1(192.168.32.13:27019)：

1.    ./bin/mongo --port 20000
2.    mongos> use admin
3.    switched to db admin
4.    mongos> db.runCommand({addshard:"192.168.32.13:27019",allowLocal:true })

注意 2 ：同样这里的 192.168.32.13 不能用 localhost 或 127.0.0.1 代替，并且当路由进程和分片在同一台机器上要指定 allowLocal 为 true, 因为 MongoDB 尽量避免错误的配置，将集群配置在本地，所以这个配置指明当前仅仅是用于开发。

添加分片 3(192.168.71.43:27017)：

1.    mongos> db.runCommand({addshard:"192.168.71.43:27017" })

类似的添加分片 2,4。

分片添加成功返回类似下面的信息(当前 mongodb 版本为 2.4.1)：

1.    {"shardAdded" : "shard0000", "ok" : 1 }

删除分片：如果要删除分片的话可以 removeshard 命令：

1.    mongos> use admin
2.    switched to db admin
3.    mongos> db.runCommand({"removeshard":"192.168.32.13:27020"})
4.    {5.            "msg" : "draining started successfully",
6.            "state" : "started",
7.            "shard" : "shard0001",
8.            "note" : "you need to drop or movePrimary these databases",
9.            "dbsToMove" : [10.                    "test3"
11.            ],
12.            "ok" : 1
13.    }

移除分片需要一个过程，MongoDB 会把移除的片上的数据（块）挪到其他片上，移动中会显示进度：

1.    mongos> db.runCommand({"removeshard":"192.168.32.13:27020"})
2.    {3.            "msg" : "draining ongoing",
4.            "state" : "ongoing",
5.            "remaining" : {6.                    "chunks" : NumberLong(0),
7.                    "dbs" : NumberLong(1)
8.            },
9.            "note" : "you need to drop or movePrimary these databases",
10.            "dbsToMove" : [11.                    "test3"
12.            ],
13.            "ok" : 1
14.    }

注意：如果删除的片是数据库的大本营(基片)， 必须手动移动或删除数据库，用 moveprimary 命令，上面的示例中就提示 192.168.32.13:27020 是 test3 库的大本营(primary)，这个信息可以通过查看 config.databases 看到：

1.    mongos> use config
2.    switched to db config
3.    mongos> db.databases.find()
4.    {"_id" : "test3", "partitioned" : false, "primary" : "shard0001" }

这里 shard0001 就是 192.168.32.13:27020，下面通过 moveprimary 命令移除 test3:

1.    mongos> use admin 
2.    switched to db admin
3.    mongos> db.runCommand({"moveprimary":"test3","to":"192.168.32.13:27019"})
4.    {"primary " : "shard0000:192.168.32.13:27019", "ok" : 1 }

这时再查看 config.databases 会发现 test3 的大本营变成了 shard0000(192.168.32.13:27019)

这时再执行 removeshard 命令则能成功移除分片了：

1.    mongos> db.runCommand({"removeshard":"192.168.32.13:27020"})
2.    {3.            "msg" : "removeshard completed successfully",
4.            "state" : "completed",
5.            "shard" : "shard0001",
6.            "ok" : 1
7.    }

进入 mongos 进程 config 库可以看到目前分片的情况：

1.    ./bin/mongo –port 20000
2.    use config
3.    db.shards.find()
1.    mongos> use config
2.    switched to db config
3.    mongos> db.shards.find()
4.    {"_id" : "shard0000", "host" : "192.168.32.13:27019" }
5.    {"_id" : "shard0001", "host" : "192.168.71.43:27017" }
6.    {"_id" : "shard0002", "host" : "192.168.71.43:27018" }

注意 3 ：如果配置过程中发生过上面 注意 1 中出现的情况，即配置 configdb 的时候用了 localhost 或 127.0.0.1, 则运行 db.shards.find()可能会出现如下错误：

1.    mongos> db.shards.find()
2.    error: {3.            "$err" : "could not initialize sharding on connection 192.168.32.13:10000 :: caused by :: mongos specified a different config database string : stored : localhost:10000 vs given : 192.168.32.13:10000",
4.            "code" : 15907
5.    }

解决方法是重新启动 config 进程。

查看分片后的数据库：

1.    ./bin/mongo –port 20000
2.    use test
3.    db.test.user.insert({“test”:“test”})
4.    ……
5.    use config
6.    db.databases.find()
7.    {"_id" : "admin", "partitioned" : false, "primary" : "config" }
8.    {"_id" : "test", "partitioned" : false, "primary" : "shard0000" }
9.    {"_id" : "test2", "partitioned" : false, "primary" : "shard0000" }
10.    {"_id" : "test3", "partitioned" : false, "primary" : "shard0001" }

“_id”, 字符串。表示数据库名。

“partioned”, 布尔型。如果为 true 则表示开启了分片功能。

“primary”，字符串，这个值与“_id”对应表示这个数据库的“大本营“在哪里，不论分片与否，数据库总是会有个“大本营”，创建数据库时会随机选择一个片，也就是说大本营是开始创建数据库文件的位置。虽然分片的时候数据库也会用到很多别的服务器，但是从这分片开始。

至此整个 mongodb 分片集群基本搭建完成，但是想让分片正常、高效、稳定的运行还有很多工作要做，下一节将在此基础上做一些简单的测试。

更多 MongoDB 相关教程见以下内容：

CentOS 编译安装 MongoDB 与 mongoDB 的 php 扩展 http://www.linuxidc.com/Linux/2012-02/53833.htm

CentOS 6 使用 yum 安装 MongoDB 及服务器端配置 http://www.linuxidc.com/Linux/2012-08/68196.htm

Ubuntu 13.04 下安装 MongoDB2.4.3 http://www.linuxidc.com/Linux/2013-05/84227.htm

MongoDB 入门必读(概念与实战并重) http://www.linuxidc.com/Linux/2013-07/87105.htm

Ubunu 14.04 下 MongoDB 的安装指南 http://www.linuxidc.com/Linux/2014-08/105364.htm

《MongoDB 权威指南》(MongoDB: The Definitive Guide)英文文字版[PDF] http://www.linuxidc.com/Linux/2012-07/66735.htm

Nagios 监控 MongoDB 分片集群服务实战 http://www.linuxidc.com/Linux/2014-10/107826.htm

基于 CentOS 6.5 操作系统搭建 MongoDB 服务 http://www.linuxidc.com/Linux/2014-11/108900.htm

更多详情见请继续阅读下一页的精彩内容：http://www.linuxidc.com/Linux/2016-04/130008p2.htm

上节搭建的分片集群从逻辑上看如下图所示：

MongoDB 分片实战

片：可以普通的 mongod 进程，也可以是副本集。但是即使一片内有多台服务器，也只能有一个主服务器，其他的服务器保存相同的数据。

mongos路由进程: 它路由所有请求，然后将结果聚合。它不保存存储数据或配置信息。

配置服务器：存储集群的配置信息。

整个分布式的集群通过 mongos 对客户端提供了一个透明统一的接口，客户端不需要关系具体的分片细节，所有分片的动作都是自动执行的，那是如何做到透明和自动的。

上节中建立好集群后，默认的是不会将存储的每条数据进行分片处理，需要在数据库和集合的粒度上都开启分片功能。开启 test 库的分片功能：

1.    ./bin/mongo –port 20000
2.    mongos> use admin 
3.    switched to db admin
4.    mongos> db.runCommand({"enablesharding":"test"})
5.    {"ok" : 1 }

开启 user 集合分片功能：

1.    mongos> db.runCommand({"shardcollection":"test.user","key":{"_id":1}})
2.    {"collectionsharded" : "test.user", "ok" : 1 }

注意：需要切换到 admin 库执行命令。

片键：上面的 key 就是所谓的片键（shard key）。MongoDB 不允许插入没有片键的文档。但是允许不同文档的片键类型不一样，MongoDB 内部对不同类型有一个排序：

MongoDB 分片实战

片键的选择至关重要，后面会进行详细的说明。

这时再切换到 config 库如下查看:

1.    mongos> use config
2.    mongos> db.databases.find()
3.    {"_id" : "admin", "partitioned" : false, "primary" : "config" }
4.    {"_id" : "OSSP10", "partitioned" : false, "primary" : "shard0000" }
5.    {"_id" : "test", "partitioned" : true, "primary" : "shard0000" }
6.    {"_id" : "test2", "partitioned" : false, "primary" : "shard0000" }
7.    {"_id" : "test3", "partitioned" : false, "primary" : "shard0001" }
8.    mongos> db.chunks.find()
9.    {"_id" : "test.user-_id_MinKey", "lastmod" : {"t" : 1, "i" : 0 }, "lastmodEpoch" : ObjectId("515a3797d249863e35f0e3fe"), "ns" : "test.user", "min" : {"_id" : {"$minKey" : 1 } }, "max" : {"_id" : {"$maxKey" : 1 } }, "shard" : "shard0000" }

Chunks：理解 MongoDB 分片机制的关键是理解 Chunks。mongodb 不是一个分片上存储一个区间，而是每个分片包含多个区间，这每个区间就是一个块。

1.    mongos> use config
2.    mongos> db.settings.find()
3.    {"_id" : "chunksize", "value" : 64 }
4.    ……

平衡：如果存在多个可用的分片，只要块的数量足够多，MongoDB 就会把数据迁移到其他分片上，这个迁移的过程叫做平衡。Chunks 默认的大小是 64M200M，查看 config.settings 可以看到这个值：

只有当一个块的大小超过了 64M200M，MongoDB 才会对块进行分割（但根据我的实践 2.4 版本块分割的算法好像并不是等到超过 chunkSize 大小就分割，也不是一分为二, 有待继续学习），并当最大分片上的块数量超过另一个最少分片上块数量达到一定阈值会发生所谓的 chunk migration 来实现各个分片的平衡 (如果启动了 balancer 进程的话)。这个阈值随着块的数量不同而不同：

MongoDB 分片实战

这带来一个问题是我们测开发测试的时候，往往希望通过观察数据在迁移来证明分片是否成功，但 64M200M显然过大，解决方法是我们可以在启动 mongos 的时候用—chunkSize 来制定块的大小，单位是 MB。

1.    ./bin/mongos --port 20000 --configdb 192.168.32.13:10000 --logpath log/mongos.log  --fork --chunkSize 1

我指定 1MB 的块大小重新启动了一下 mongos 进程。

或者向下面这要修改 chunkSize 大小：

1.    mongos> use config
2.    mongos> db.settings.save({ _id:"chunksize", value: 1 } )

2.4 版本之前 MongoDB 基于范围进行分片的（2.2.4 会介绍基于哈希的分片），对一个集合分片时，一开始只会创建一个块，这个块的区间是（-∞，+∞），-∞表示 MongoDB 中的最小值，也就是上面 db.chunks.find()我们看到的$minKey，+∞表示最大值即$maxKey。Chunck 的分割是自动执行的，类似于细胞分裂，从区间的中间分割成两个。

现在对上面我们搭建的 MongoDB 分片集群做一个简单的测试。目前我们的集群情况如下（为了方面展示我采用了可视化的工具 MongoVUE）：

MongoDB 分片实战

像上面提到那样为了尽快能观察到测试的效果，我么启动 mongos 时指定的 chunkSize 为 1MB。

我们对 OSSP10 库和 bizuser 集合以 Uid 字段进行分片：

1.    mongos> use admin
2.    switched to db admin
3.    mongos> db.runCommand({"enablesharding":"OSSP10"})
4.    {"ok" : 1 }
5.    mongos> db.runCommand({"shardcollection":"OSSP10.bizuser","key":{"Uid":1}})
6.    {"collectionsharded" : "OSSP10.bizuser", "ok" : 1 }

在插入数据之前查看一下 config.chunks, 通过查看这个集合我们可以了解数据是怎么切分到集群的：

1.    mongos> db.chunks.find()
2.    {"_id" : "OSSP10.bizuser-Uid_MinKey", "lastmod" : {"t" : 1, "i" : 0 }, "lastmodEpoch" : ObjectId("515a8f1dc1de43249d8ce83e"), "ns" : "OSSP10.bizuser", "min" : {"Uid" : {"$minKey" : 1 } }, "max" : {"Uid" : {"$maxKey" : 1 } }, "shard" : "shard0000" }

开始只有一个块，区间（-∞，+∞）在 shard0000(192.168.32.13:27019)上，下面我们循环的插入 100000 条数据：

1.    mongos> use OSSP10
2.    switched to db OSSP10
3.    mongos> for(i=0;i<100000;i++){db.bizuser.insert({"Uid":i,"Name":"zhanjindong","Age":13,"Date":new Date()}); }

完成后我们在观察一下 config.chunks 的情况：

1.    mongos> db.chunks.find()
2.    {"_id" : "OSSP10.bizuser-Uid_MinKey", "lastmod" : {"t" : 3, "i" : 0 }, "lastmodEpoch" : ObjectId("515a8f1dc1de43249d8ce83e"), "ns" : "OSSP10.bizuser", "min" : {"Uid" : {"$minKey" : 1 } }, "max" : {"Uid" : 0 }, "shard" : "shard0002" }
3.    {"_id" : "OSSP10.bizuser-Uid_0.0", "lastmod" : {"t" : 3, "i" : 1 }, "lastmodEpoch" : ObjectId("515a8f1dc1de43249d8ce83e"), "ns" : "OSSP10.bizuser", "min" : {"Uid" : 0 }, "max" : {"Uid" : 6747 }, "shard" : "shard0000" }
4.    {"_id" : "OSSP10.bizuser-Uid_6747.0", "lastmod" : {"t" : 2, "i" : 0 }, "lastmodEpoch" : ObjectId("515a8f1dc1de43249d8ce83e"), "ns" : "OSSP10.bizuser", "min" : {"Uid" : 6747 }, "max" : {"Uid" : {"$maxKey" : 1 } }, "shard" : "shard0001" }

我们可以看见刚开始的块分裂成了三块分别是（-∞,0）在 shard0002 上,[0,6747）在 shard0000 上和[6747,+ ∞）在 shard0001 上。

说明：这里需要说明的是 MongoDB 中的区间是左闭右开的。这样说上面第一个块不包含任何数据，至于为什么还不清楚，有待继续调研。

我们持续上面的插入操作（uid 从 0 到 100000）我们发现块也在不停的分裂：

1.    mongos> db.chunks.find()
2.    {"_id" : "OSSP10.bizuser-Uid_MinKey", "lastmod" : {"t" : 3, "i" : 0 }, "lastmodEpoch" : ObjectId("515a8f1dc1de43249d8ce83e"), "ns" : "OSSP10.bizuser", "min" : {"Uid" : {"$minKey" : 1 } }, "max" : {"Uid" : 0 }, "shard" : "shard0002" }
3.    {"_id" : "OSSP10.bizuser-Uid_0.0", "lastmod" : {"t" : 3, "i" : 1 }, "lastmodEpoch" : ObjectId("515a8f1dc1de43249d8ce83e"), "ns" : "OSSP10.bizuser", "min" : {"Uid" : 0 }, "max" : {"Uid" : 6747 }, "shard" : "shard0000" }
4.    {"_id" : "OSSP10.bizuser-Uid_6747.0", "lastmod" : {"t" : 3, "i" : 4 }, "lastmodEpoch" : ObjectId("515a8f1dc1de43249d8ce83e"), "ns" : "OSSP10.bizuser", "min" : {"Uid" : 6747 }, "max" : {"Uid" : 45762 }, "shard" : "shard0001" }
5.    {"_id" : "OSSP10.bizuser-Uid_99999.0", "lastmod" : {"t" : 3, "i" : 3 }, "lastmodEpoch" : ObjectId("515a8f1dc1de43249d8ce83e"), "ns" : "OSSP10.bizuser", "min" : {"Uid" : 99999 }, "max" : {"Uid" : {"$maxKey" : 1 } }, "shard" : "shard0001" }
6.    {"_id" : "OSSP10.bizuser-Uid_45762.0", "lastmod" : {"t" : 3, "i" : 5 }, "lastmodEpoch" : ObjectId("515a8f1dc1de43249d8ce83e"), "ns" : "OSSP10.bizuser", "min" : {"Uid" : 45762 }, "max" : {"Uid" : 99999 }, "shard" : "shard0001" }

分片的规则正是上面提到的块的自动分裂和平衡，可以发现不同的块是分布在不同的分片上。

注意：这里使用 Uid 作为片键通常是有问题的，2.3 对递增片键有详细说明。

Note: 通过 sh.status()可以很直观的查看当前整个集群的分片情况，类似如下：

MongoDB 分片实战

面的演示都是从零开始构建分片集群，但是实际中架构是一个演进的过程，一开始都不会进行分片，只有当数据量增长到一定程序才会考虑分片，那么对已有的海量数据如何处理。我们在上节的基础上继续探索。《深入学习 MongoDB》中有如下描述：

MongoDB 分片实战

我们现在来尝试下（也许我使用的最新版本会有所有变化），先在 192.168.71.43:27179 上再启一个 mongod 进程：

1.    numactl --interleave=all ./bin/mongod --dbpath data/shard3/ --logpath log/shard3.log  --fork --port 27019

现在我们像这个 mongod 进程中的 OSSP10 库的 bizuser 集合中插入一些数据：

1.    for(i=0;i<10;i++){db.bizuser.insert({"Uid":i,"Name":"zhanjindong2","Age":13,"Date":new Date()}); }

我们这个时候尝试将这个 mongod 加入到之前的集群中去：

1.    mongos> db.runCommand({addshard:"192.168.71.43:27019" })
2.    {3.            "ok" : 0,
4.            "errmsg" : "can't add shard 192.168.71.43:27019 because a local database 'OSSP10' exists in another shard0000:192.168.32.13:27019"
5.    }

果然最新版本依旧不行。我们删除 OSSP10 库，新建个 OSSP20 库再添加分片则可以成功。

1.    {"shardAdded" : "shard0003", "ok" : 1 }

那么看来我们只能在之前已有的数据的 mongod 进程基础上搭建分片集群，那么这个时候添加切片，MongoDB 对之前的数据又是如何处理的呢？我们现在集群中移除上面添加的切片 192.168.71.43:27019，然后在删除集群中的 OSSP10 库。

这次我们尽量模拟生产环境进行测试，重新启动 mongos 并不指定 chunksize，使用默认的最优的大小（64M200M）。然后在 192.168.71.43:27019 中新建 OSSP10 库并向集合 bizuser 中插入 100W 条数据：

1.    for(i=0;i<1000000;i++){db.bizuser.insert({"Uid":i,"Name":"zhanjindong2","Age":13,"Date":new Date()}); }

然后将此节点添加到集群中，并仍然使用递增的 Uid 作为片键对 OSSP10.bizuser 进行分片：

1.    mongos> use admin
2.    switched to db admin
3.    mongos> db.runCommand({"enablesharding":"OSSP10"})
4.    {"ok" : 1 }
5.    mongos> db.runCommand({"shardcollection":"OSSP10.bizuser","key":{"Uid":1}})
6.    {"collectionsharded" : "OSSP10.bizuser", "ok" : 1 }

观察一下 config.chunks 可以看见对新添加的切片数据进行切分并进行了平衡（每个分片上一个块），基本是对 Uid（0~1000000）进行了四等分，当然没那精确：

1.    mongos> use config
2.    switched to db config
3.    mongos> db.chunks.find()
4.    {"_id" : "OSSP10.bizuser-Uid_MinKey", "lastmod" : {"t" : 2, "i" : 0 }, "lastmodEpoch" : ObjectId("515b8f3754fde3fbab130f92"), "ns" : "OSSP10.bizuser", "min" : {"Uid" : {"$minKey" : 1 } }, "max" : {"Uid" : 0 }, "shard" : "shard0000" }
5.    {"_id" : "OSSP10.bizuser-Uid_381300.0", "lastmod" : {"t" : 4, "i" : 1 }, "lastmodEpoch" : ObjectId("515b8f3754fde3fbab130f92"), "ns" : "OSSP10.bizuser", "min" : {"Uid" : 381300 }, "max" : {"Uid" : 762601 }, "shard" : "shard0003" }
6.    {"_id" : "OSSP10.bizuser-Uid_762601.0", "lastmod" : {"t" : 1, "i" : 2 }, "lastmodEpoch" : ObjectId("515b8f3754fde3fbab130f92"), "ns" : "OSSP10.bizuser", "min" : {"Uid" : 762601 }, "max" : {"Uid" : {"$maxKey" : 1 } }, "shard" : "shard0003" }
7.    {"_id" : "OSSP10.bizuser-Uid_0.0", "lastmod" : {"t" : 3, "i" : 0 }, "lastmodEpoch" : ObjectId("515b8f3754fde3fbab130f92"), "ns" : "OSSP10.bizuser", "min" : {"Uid" : 0 }, "max" : {"Uid" : 250000 }, "shard" : "shard0001" }
8.    {"_id" : "OSSP10.bizuser-Uid_250000.0", "lastmod" : {"t" : 4, "i" : 0 }, "lastmodEpoch" : ObjectId("515b8f3754fde3fbab130f92"), "ns" : "OSSP10.bizuser", "min" : {"Uid" : 250000 }, "max" : {"Uid" : 381300 }, "shard" : "shard0002" }

MongoDB 这种自动分片和平衡的能力使得在迁移老数据的时候变得非常简单，但是如果数据特别大话则可能会非常的慢。

MongoDB2.4 以上的版本支持基于哈希的分片，我们在上面的基础上继续探索。

交代一下 因为环境变动这部分示例操作不是在之前的那个集群上，但是系统环境都是一样，逻辑架构也一样，只是 ip 地址不一样（只搭建在一台虚拟机上）, 后面环境有改动不再说明：

配置服务器：192.168.129.132:10000

路由服务器：192.168.129.132:20000

分片 1:192.168.129.132:27017

分片 2:192.168.129.132:27018

……其他分片端口依次递增。

MongoDB 分片实战

我们重建之前的 OSSP10 库，我们仍然使用 OSSP10.bizuser 不过这次启动哈希分片，选择_id 作为片键：

1.    mongos> use admin
2.    switched to db admin
3.    mongos> db.runCommand({"enablesharding":"OSSP10"})
4.    {"ok" : 1 }
5.    mongos> db.runCommand({"shardcollection":"OSSP10.bizuser","key":{"_id":"hashed"}})
6.    {"collectionsharded" : "OSSP10.bizuser", "ok" : 1 }

我们现在查看一下 config.chunks

1.    mongos> use config
2.    switched to db config
3.    mongos> db.chunks.find()
4.    {"_id" : "OSSP10.bizuser-_id_MinKey", "lastmod" : {"t" : 2, "i" : 2 }, "lastmodEpoch" : ObjectId("515e47ab56e0b3341b76f145"), "ns" : "OSSP10.bizuser", "min" : {"_id" : {"$minKey" : 1 } }, "max" : {"_id" : NumberLong("-4611686018427387902") }, "shard" : "shard0000" }
5.    {"_id" : "OSSP10.bizuser-_id_-4611686018427387902", "lastmod" : {"t" : 2, "i" : 3 }, "lastmodEpoch" : ObjectId("515e47ab56e0b3341b76f145"), "ns" : "OSSP10.bizuser", "min" : {"_id" : NumberLong("-4611686018427387902") }, "max" : {"_id" : NumberLong(0) }, "shard" : "shard0000" }
6.    {"_id" : "OSSP10.bizuser-_id_0", "lastmod" : {"t" : 2, "i" : 4 }, "lastmodEpoch" : ObjectId("515e47ab56e0b3341b76f145"), "ns" : "OSSP10.bizuser", "min" : {"_id" : NumberLong(0) }, "max" : {"_id" : NumberLong("4611686018427387902") }, "shard" : "shard0001" }
7.    {"_id" : "OSSP10.bizuser-_id_4611686018427387902", "lastmod" : {"t" : 2, "i" : 5 }, "lastmodEpoch" : ObjectId("515e47ab56e0b3341b76f145"), "ns" : "OSSP10.bizuser", "min" : {"_id" : NumberLong("4611686018427387902") }, "max" : {"_id" : {"$maxKey" : 1 } }, "shard" : "shard0001" }

MongoDB 的哈希分片使用了哈希索引:

MongoDB 分片实战

哈希分片仍然是基于范围的，只是将提供的片键散列成一个非常大的长整型作为最终的片键。官方文中描述如下：

不像普通的基于范围的分片，哈希分片的片键只能使用一个字段。

选择哈希片键最大的好处就是保证数据在各个节点分布基本均匀，下面使用_id 作为哈希片键做个简单的测试：

1.    mongos> db.runCommand({"enablesharding":"mydb"})
2.    db.runCommand({"shardcollection":"mydb.mycollection","key":{"_id":"hashed"}})
3.    mongos> use mydb
4.    mongos> for(i=0;i<333333;i++){db.mycollection.insert({"Uid":i,"Name":"zhanjindong2","Age":13,"Date":new Date()}); }

通过 MongoVUE 观察三个切片上的数据量非常均匀：

MongoDB 分片实战

上面是使用官方文档中推荐的 Objectid 作为片键，情况很理想。如果使用一个自增长的 Uid 作为片键呢：

1.    db.runCommand({"shardcollection":"mydb.mycollection","key":{"Uid":"hashed"}})
2.    for(i=0;i<333333;i++){db.mycollection.insert({"Uid":i,"Name":"zhanjindong2","Age":13,"Date":new Date()}); }

先不考虑集群中每个分片是副本集复杂的情况，只考虑了每个分片只有一个 mongod 进程，这种配是只是不够健壮还是非常脆弱。我们在 testdb.testcollection 上启动分片，然后向其中插入一定量的数据（视你的 chunkSize 而定），通过观察 config.chunks 确保 testdb.testcollection 上的数据被分布在了不同的分片上。

1.    mongos> db.runCommand({"enablesharding":"testdb"})
2.    mongos> db.runCommand({"shardcollection":"testdb.testcollection","key":{"Uid":1}})
3.    use testdb
4.    for(i=0;i<1000000;i++){db.testcollection.insert({"Uid":i,"Name":"zhanjindong","Age":13,"Date":new Date()}); }
5.    mongos> use config
6.    switched to db config
7.    mongos> db.chunks.find()
8.    ……
9.    {"_id" : "testdb.testcollection-Uid_747137.0", "lastmod" : {"t" : 4, "i" : 1 }, "lastmodEpoch" : ObjectId("515fc5f0365c860f0bf8e0cb"), "ns" : "testdb.testcollection", "min" : {"Uid" : 747137 }, "max" : {"Uid" : 962850 }, "shard" : "shard0000" }
10.    ……
11.    {"_id" : "testdb.testcollection-Uid_0.0", "lastmod" : {"t" : 1, "i" : 3 }, "lastmodEpoch" : ObjectId("515fc5f0365c860f0bf8e0cb"), "ns" : "testdb.testcollection", "min" : {"Uid" : 0 }, "max" : {"Uid" : 6757 }, "shard" : "shard0001" }
12.    ……
13.    {"_id" : "testdb.testcollection-Uid_531424.0", "lastmod" : {"t" : 2, "i" : 4 }, "lastmodEpoch" : ObjectId("515fc5f0365c860f0bf8e0cb"), "ns" : "testdb.testcollection", "min" : {"Uid" : 531424 }, "max" : {"Uid" : 747137 }, "shard" : "shard0002" }

这时我们强制杀掉 shard2 进程：

1.    [root@localhost mongodb-2.4.1]# ps -ef |grep mongo
2.    root      6329     1  0 22:52 ?        00:00:08 ./bin/mongod --dbpath data/shard3/ --logpath log/shard3.log --fork --port 27019
3.    kill -9 6329

我们尝试用属于不同范围的 Uid 对 testdb.testcollection 进行写入操作(这里插入一条记录应该不会导致新的块迁移)：

1.    [root@localhost mongodb-2.4.1]# ps -ef |grep mongo
2.    use testdb
3.    switched to db testdb
4.    #### 写
5.    mongos> db.testcollection.insert({"Uid":747138,"Name":"zhanjindong","Age":13,"Date":new Date()})
6.    ## 向 shard0000 插入没有问题
7.    mongos> db.testcollection.insert({"Uid":6756,"Name":"zhanjindong","Age":13,"Date":new Date()})
8.    ## 向 shard0001 插入没有问题
9.    mongos> db.testcollection.insert({"Uid":531425,"Name":"zhanjindong","Age":13,"Date":new Date()})
10.    socket exception [CONNECT_ERROR] for 192.168.129.132:27019
11.    ## 向 shard0002 插入出问题
12.    #### 读
13.    mongos> db.testcollection.find({Uid:747139})
14.    ## 从 shard0000 读取没有问题
15.    mongos> db.testcollection.find({Uid: 2})
16.    ## 从 shard0001 读取没有问题
17.    mongos> db.testcollection.find({Uid: 531426})
18.    error: {19.        "$err" : "socket exception [SEND_ERROR] for 192.168.129.132:27019",
20.        "code" : 9001,
21.        "shard" : "shard0002"
22.    }
23.    ## 从 shard0002 读取有问题
24.    b.testcollection.count()
25.    Sat Apr  6 00:23:19.246 JavaScript execution failed: count failed: {26.        "code" : 11002,
27.        "ok" : 0,
28.        "errmsg" : "exception: socket exception [CONNECT_ERROR] for 192.168.129.132:27019"
29.    } at src/mongo/shell/query.js:L180

可以看到插入操作 涉及到分片 shard0002 的操作都无法完成。这是顺理成章的。

下面我们重新启动 shard3 看集群是否能自动恢复正常操作：

1.    ./bin/mongod --dbpath data/shard3/ --logpath log/shard3.log --fork --port 27019
2.    mongos> use config
3.    switched to db config
4.    mongos> db.shards.find()
5.    {"_id" : "shard0000", "host" : "192.168.129.132:27017" }
6.    {"_id" : "shard0001", "host" : "192.168.129.132:27018" }
7.    {"_id" : "shard0002", "host" : "192.168.129.132:27019" }

重复上面的插入和读取 shard0002 的操作：

1.    mongos> db.testcollection.insert({"Uid":531425,"Name":"zhanjindong","Age":13,"Date":new Date()})
2.    ## 没有问题
3.    db.testcollection.find({Uid: 531426})
4.    {"_id" : ObjectId("515fc791b86c543aa1d7613e"), "Uid" : 531426, "Name" : "zhanjindong", "Age" : 13, "Date" : ISODate("2013-04-06T06:58:25.516Z") }
5.    ## 没有问题

总结：当集群中某个分片宕掉以后，只要不涉及到该节点的操纵仍然能进行。当宕掉的节点重启后，集群能自动从故障中恢复过来。

这一节在前一节搭建的集群基础上做了一些简单的测试，下一节的重点是性能和优化相关。

主机	OS	备注
192.168.32.13	CentOS6.3 64 位	普通 PC
192.168.71.43	CentOS6.2 64 位	服务器，NUMA CPU 架构

MongoDB 版本：mongodb-linux-x86_64-2.4.1，下载地址：www.mongodb.org/downloads.

MongoDB 安装：分别在两台机器上安装好 mongodb 2.4.1，安装路径都为 /url/local/mongodb-2.4.1/

cd /usr/local/src/
wget http://fastdl.mongodb.org/linux/mongodb-linux-x86_64-2.4.1.tgz
tar -zxvf mongodb-linux-x86_64-2.4.1.tgz
cp -r mongodb-linux-x86_64-2.4.1 /usr/local/mongodb-2.4.1
cd /usr/local/mongodb-2.4.1/bin/
ll

可以看到 mongodb 安装成功有如下模块：

MongoDB 分片实战

mongodb启动和关闭 等在后面集群搭建中有详细说明，在此不再赘述。

配置服务器 启动(192.168.32.13:10000)：

1.    ./bin/mongod --fork --dbpath data/config/ --logpath log/config.log –port 10000

路由服务器 启动(192.168.32.13:20000)：

1.    ./bin/mongos --port 20000 --configdb 192.168.32.13:10000 --logpath log/mongos.log  --fork

注意 1 ：配置 –conigdb 的时候 ip 地址不能填 localhost 或 127.0.0.1 否则 添加分片 时会返回如下错误信息：

1.    {2.            "ok" : 0,
3.            "errmsg" : "can't use localhost as a shard since all shards need to communicate. either use all shards and configdbs in localhost or all in actual IPs  host: 192.168.71.43:27017 isLocalHost:0"
4.    }

启动分片1(192.168.32.13:27019)：

1.    ./bin/mongod --dbpath data/shard3/ --logpath log/shard3.log  --fork --port 27019

启动分片 2(192.168.32.13:27020)：

1.    ./bin/mongod --dbpath data/shard3/ --logpath log/shard3.log  --fork --port 27020

启动分片 3(192.168.71.43:27017)：

1.    numactl --interleave=all ./bin/mongod --dbpath data/shard1/ --logpath log/shard1.log  --fork --port 27017

启动分片 4(192.168.71.43:27018)：

1.    numactl --interleave=all ./bin/mongod --dbpath data/shard2/ --logpath log/shard2.log  --fork --port 27018

说明：关于这里为什么加 numactl –interleave=all，后面有详细说明。

Note: 在生产环境可以将启动的配置信息写入文件，然后启动的时候指定配置文件，这样便于管理：

1.    vi conf/config.conf
2.    bpath=data/config/
3.    logpath=log/config.log
4.    port=10000
5.    fork=true

6.    ./bin/mongod -f conf/config.conf

添加分片1(192.168.32.13:27019)：

1.    ./bin/mongo --port 20000
2.    mongos> use admin
3.    switched to db admin
4.    mongos> db.runCommand({addshard:"192.168.32.13:27019",allowLocal:true })

添加分片 3(192.168.71.43:27017)：

1.    mongos> db.runCommand({addshard:"192.168.71.43:27017" })

类似的添加分片 2,4。

分片添加成功返回类似下面的信息(当前 mongodb 版本为 2.4.1)：

1.    {"shardAdded" : "shard0000", "ok" : 1 }

删除分片：如果要删除分片的话可以 removeshard 命令：

1.    mongos> use admin
2.    switched to db admin
3.    mongos> db.runCommand({"removeshard":"192.168.32.13:27020"})
4.    {5.            "msg" : "draining started successfully",
6.            "state" : "started",
7.            "shard" : "shard0001",
8.            "note" : "you need to drop or movePrimary these databases",
9.            "dbsToMove" : [10.                    "test3"
11.            ],
12.            "ok" : 1
13.    }

移除分片需要一个过程，MongoDB 会把移除的片上的数据（块）挪到其他片上，移动中会显示进度：

1.    mongos> db.runCommand({"removeshard":"192.168.32.13:27020"})
2.    {3.            "msg" : "draining ongoing",
4.            "state" : "ongoing",
5.            "remaining" : {6.                    "chunks" : NumberLong(0),
7.                    "dbs" : NumberLong(1)
8.            },
9.            "note" : "you need to drop or movePrimary these databases",
10.            "dbsToMove" : [11.                    "test3"
12.            ],
13.            "ok" : 1
14.    }

1.    mongos> use config
2.    switched to db config
3.    mongos> db.databases.find()
4.    {"_id" : "test3", "partitioned" : false, "primary" : "shard0001" }

这里 shard0001 就是 192.168.32.13:27020，下面通过 moveprimary 命令移除 test3:

1.    mongos> use admin 
2.    switched to db admin
3.    mongos> db.runCommand({"moveprimary":"test3","to":"192.168.32.13:27019"})
4.    {"primary " : "shard0000:192.168.32.13:27019", "ok" : 1 }

这时再查看 config.databases 会发现 test3 的大本营变成了 shard0000(192.168.32.13:27019)

这时再执行 removeshard 命令则能成功移除分片了：

1.    mongos> db.runCommand({"removeshard":"192.168.32.13:27020"})
2.    {3.            "msg" : "removeshard completed successfully",
4.            "state" : "completed",
5.            "shard" : "shard0001",
6.            "ok" : 1
7.    }

进入 mongos 进程 config 库可以看到目前分片的情况：

1.    ./bin/mongo –port 20000
2.    use config
3.    db.shards.find()
1.    mongos> use config
2.    switched to db config
3.    mongos> db.shards.find()
4.    {"_id" : "shard0000", "host" : "192.168.32.13:27019" }
5.    {"_id" : "shard0001", "host" : "192.168.71.43:27017" }
6.    {"_id" : "shard0002", "host" : "192.168.71.43:27018" }

1.    mongos> db.shards.find()
2.    error: {3.            "$err" : "could not initialize sharding on connection 192.168.32.13:10000 :: caused by :: mongos specified a different config database string : stored : localhost:10000 vs given : 192.168.32.13:10000",
4.            "code" : 15907
5.    }

解决方法是重新启动 config 进程。

查看分片后的数据库：

1.    ./bin/mongo –port 20000
2.    use test
3.    db.test.user.insert({“test”:“test”})
4.    ……
5.    use config
6.    db.databases.find()
7.    {"_id" : "admin", "partitioned" : false, "primary" : "config" }
8.    {"_id" : "test", "partitioned" : false, "primary" : "shard0000" }
9.    {"_id" : "test2", "partitioned" : false, "primary" : "shard0000" }
10.    {"_id" : "test3", "partitioned" : false, "primary" : "shard0001" }