SolrCloud: how to add a shard to an existing collection

If a shard is becoming too big, you cannot simply add a new shard to the collection; the correct procedure is to split the shard into two parts. The original shard remains unchanged, and Solr creates two new shards, each containing part of the documents of the original shard.
The two new shards start receiving the requests directed to the original shard, which must then be deactivated manually. If needed, you can move the new shards to other cluster nodes.

Split the shard in two parts


The command to split a shard into two parts is:

curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=examplecoll&shard=shard1&async=true'

Here "shard1" is the id of the shard to be split and "examplecoll" is the name of the collection to which the shard belongs.
Note the async parameter: it's better to run this command asynchronously because it can take a long time, depending on the number of documents in the shard. To check the status of the task, use the command:

curl 'http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=request-id'

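where "request-id" is the id of the asynchronous request, as reported in the response to the split command. As far as I know, Solr uses the value passed in the async parameter as the request id, so you can pass a recognizable id of your own instead of "true".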
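The split-and-poll sequence above could be scripted roughly as follows. This is only a sketch under some assumptions: the function names, the SOLR_URL variable and the request id "split-shard1-001" are made up for illustration, the responses are requested as JSON with wt=json, and the state values handled here ("completed", "failed", "notfound") should be checked against the responses of your Solr version.

```shell
#!/bin/sh
# Sketch: start an asynchronous SPLITSHARD and poll REQUESTSTATUS until it ends.
# SOLR_URL and all function names are hypothetical helpers, not part of Solr.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Build the SPLITSHARD URL. $1 = collection, $2 = shard, $3 = request id.
split_url() {
  printf '%s/admin/collections?action=SPLITSHARD&collection=%s&shard=%s&async=%s\n' \
    "$SOLR_URL" "$1" "$2" "$3"
}

# Build the REQUESTSTATUS URL. $1 = request id.
status_url() {
  printf '%s/admin/collections?action=REQUESTSTATUS&requestid=%s&wt=json\n' \
    "$SOLR_URL" "$1"
}

# Read a JSON response on stdin and print the value of its "state" field.
# This is a plain text match, not a real JSON parser.
extract_state() {
  tr -d '\n' | sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Poll until the request ends. $1 = request id, $2 = seconds between polls.
# Returns 0 on "completed", 1 on "failed" or "notfound", 2 if no state was read.
wait_for_request() {
  while :; do
    state=$(curl -s "$(status_url "$1")" | extract_state)
    case "$state" in
      completed) return 0 ;;
      failed|notfound) return 1 ;;
      '') return 2 ;;
    esac
    sleep "${2:-10}"
  done
}
```

A possible use: run curl -s "$(split_url examplecoll shard1 split-shard1-001)" and then wait_for_request split-shard1-001 30, going on with the unload step only if the second command returns 0.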

When the shard has been split, you should unload the old shard's core on every node that hosts one of its replicas (here I assume that there are only two replicas):

curl 'http://localhost:7574/solr/admin/cores?action=UNLOAD&core=examplecoll_shard1_replica1'

curl 'http://localhost:7500/solr/admin/cores?action=UNLOAD&core=examplecoll_shard1_replica2'

As used here, the UNLOAD command doesn't remove the data from disk: you have to remove it by hand.
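With more than a couple of replicas, the unload calls can be driven from a list. The following is a minimal sketch: the REPLICAS list holds the two node and core pairs used in this post, while the function names and the DRY_RUN switch are hypothetical and only there to let you inspect the URLs before anything is unloaded.

```shell
#!/bin/sh
# Sketch: unload the old shard's core on every node that hosts it.
# One "host:port core_name" pair per line; these are the values from this post.
REPLICAS='localhost:7574 examplecoll_shard1_replica1
localhost:7500 examplecoll_shard1_replica2'

# Build the CoreAdmin UNLOAD URL. $1 = host:port, $2 = core name.
unload_url() {
  printf 'http://%s/solr/admin/cores?action=UNLOAD&core=%s\n' "$1" "$2"
}

# Walk the list and unload each core.
# With DRY_RUN=1 (a hypothetical switch) the URLs are printed, not called.
unload_all() {
  printf '%s\n' "$REPLICAS" | while read -r node core; do
    url=$(unload_url "$node" "$core")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf '%s\n' "$url"
    else
      curl -s "$url"
    fi
  done
}
```

Running DRY_RUN=1 unload_all first prints the two URLs; unload_all without the switch sends the requests.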

Move a shard to another node


When you split a shard, you often also need to move part of the documents (and of the workload) to other cluster nodes. At the time of writing there is no automated way to accomplish this, so you have to run the commands by hand.

Moving a shard replica to another node takes two steps:

1) create a new replica of the shard on the destination node:

curl 'http://localhost:9900/solr/admin/cores?action=CREATE&name=examplecoll_shard1_0_replica1&collection=examplecoll&shard=shard1_0'

Here the shard to be moved has id "shard1_0" because it's one of the two pieces into which "shard1" has been split. The name of the core that will be the new replica of "shard1_0" looks convoluted because it follows Solr's naming pattern, but you can choose any name you like.

At this point you must check with a query that the shard has been fully copied to the new replica. Then you can go ahead with the next step:

2) switch off the old replica on the source node:

curl 'http://localhost:7574/solr/admin/cores?action=UNLOAD&core=examplecoll_shard1_0_replica1'

Note that the node on which we created the new replica listens on port 9900, while the node on which the old replica has been deactivated listens on port 7574.

These two steps should be executed for each replica of the shard you are moving.
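The check between step 1 and step 2 could be done by comparing the document counts of the old and the new core. The sketch below assumes that each core answers a select query restricted to itself (distrib=false) and that the JSON response contains a numFound field; the function names are hypothetical. Equal counts are a reasonable sign that the copy is complete, not a proof, especially if documents are still being indexed.

```shell
#!/bin/sh
# Sketch: create the new replica, then compare document counts before unloading.

# Build the CoreAdmin CREATE URL.
# $1 = host:port, $2 = core name, $3 = collection, $4 = shard.
create_url() {
  printf 'http://%s/solr/admin/cores?action=CREATE&name=%s&collection=%s&shard=%s\n' \
    "$1" "$2" "$3" "$4"
}

# Build a count-only query for a single core. $1 = host:port, $2 = core name.
# rows=0 returns no documents; distrib=false keeps the query on this core.
count_url() {
  printf 'http://%s/solr/%s/select?q=*:*&rows=0&distrib=false&wt=json\n' "$1" "$2"
}

# Read a JSON query response on stdin and print its numFound value.
# This is a plain text match, not a real JSON parser.
extract_numfound() {
  tr -d '\n' | sed -n 's/.*"numFound"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Succeed only if both cores report the same, non-empty document count.
# $1 = source host:port, $2 = source core, $3 = destination host:port, $4 = destination core.
same_count() {
  src=$(curl -s "$(count_url "$1" "$2")" | extract_numfound)
  dst=$(curl -s "$(count_url "$3" "$4")" | extract_numfound)
  [ -n "$src" ] && [ "$src" = "$dst" ]
}
```

With the values used in this post, same_count localhost:7574 examplecoll_shard1_0_replica1 localhost:9900 examplecoll_shard1_0_replica1 should succeed before you run the UNLOAD of step 2.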
