
SolrCloud: how to add a new collection to a running cluster

It's easy to find examples about how to start a SolrCloud cluster with a default collection, but it's not as easy to find examples that tell you how to add a new collection to an already running cluster. Here I'm going to describe the main steps.

Load the collection configuration in ZooKeeper


The collection configuration is nothing other than the "conf" directory of a core in a "non Cloud" Solr setup. In a Cloud setup the files contained in the "conf" directory cannot be stored on a single node: they must be loaded into ZooKeeper so that they are distributed to all cluster nodes.

To accomplish this task there's the zkcli.sh script, contained in the Solr distribution in the example/scripts/cloud-scripts directory. Despite the similar name, it's a Solr-specific tool and not the same as the zkCli.sh script shipped with ZooKeeper.

./zkcli.sh --zkhost localhost:2181 --cmd upconfig --confdir /node1/solr/examplecoll/conf --confname examplecollcfg

Here /node1/solr/examplecoll/conf is the directory with all the core configuration files (for a "non Cloud" Solr), while "examplecollcfg" is the name that the configuration will have in ZooKeeper. When the copy is done, ZooKeeper holds a sort of directory (it's actually a path) named "examplecollcfg" containing the files that have been copied from /node1/solr/examplecoll/conf.
Of course you have to change the address to the one your ZooKeeper is listening on. In my setup this is localhost:2181.
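Before uploading, it can help to check that the directory really looks like a Solr configuration directory. The following is only a minimal sketch: check_confdir is a helper I made up for illustration (it is not part of Solr), and it assumes that the presence of solrconfig.xml is a good enough sign. The script only prints the upconfig command, it doesn't run it.

```shell
#!/bin/sh
# Illustrative pre-flight check, not part of the Solr distribution.
# Succeeds when the directory exists and contains solrconfig.xml.
check_confdir() {
    [ -d "$1" ] && [ -f "$1/solrconfig.xml" ]
}

# Values taken from the article; adjust them to your own setup.
ZKHOST="localhost:2181"
CONFDIR="/node1/solr/examplecoll/conf"
CONFNAME="examplecollcfg"

if check_confdir "$CONFDIR"; then
    # Dry run: show the command instead of executing it.
    echo "ok: ./zkcli.sh --zkhost $ZKHOST --cmd upconfig --confdir $CONFDIR --confname $CONFNAME"
else
    echo "skip: $CONFDIR does not look like a Solr conf directory"
fi
```

Once the check passes you can copy the printed command and run it from the cloud-scripts directory.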

Bind the collection to its configuration


The second step is to bind the (to be created) collection name to its configuration. This command must be executed before creating the collection.

./zkcli.sh --zkhost localhost:2181 --cmd linkconfig --collection examplecoll --confname examplecollcfg

Here "examplecoll" is the name of the collection we're going to create.

You can skip this step if you specify the configuration name in the command you use to create the collection. An example appears in a later section.

Create the collection on the cluster leader nodes


The next command creates the cores representing the collection shards on the leader nodes.

curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=examplecoll&numShards=2&replicationFactor=1&maxShardsPerNode=2'

The pattern Solr uses to name the cores is as follows:

<collection>_shard<shardnumber>_replica<replicanumber>

and for the leaders <replicanumber> is 1.
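As a quick illustration of the naming pattern, here is a small sketch that prints the core names to expect for a given collection. The core_names function is a hypothetical helper of mine; it just applies the pattern described above and doesn't talk to Solr at all.

```shell
#!/bin/sh
# Print the core names for a collection, following the
# <collection>_shard<shardnumber>_replica<replicanumber> pattern.
# Arguments: collection name, number of shards, replication factor.
core_names() {
    collection=$1
    num_shards=$2
    replication_factor=$3
    shard=1
    while [ "$shard" -le "$num_shards" ]; do
        replica=1
        while [ "$replica" -le "$replication_factor" ]; do
            echo "${collection}_shard${shard}_replica${replica}"
            replica=$((replica + 1))
        done
        shard=$((shard + 1))
    done
}

# The collection created above: 2 shards, replicationFactor=1.
core_names examplecoll 2 1
# prints:
# examplecoll_shard1_replica1
# examplecoll_shard2_replica1
```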

The parameters in detail:
  • numShards: number of shards to split the collection index into;
  • replicationFactor: number of copies of each shard that are created together with the collection. The collection ends up with numShards × replicationFactor cores, so this value also bounds the number of cluster nodes (Solr servers) used for the collection. For example the collection can use only 10 or 20 nodes in a cluster composed of 100 nodes;
  • maxShardsPerNode: maximum number of shard cores of the collection (leaders or replicas) that can be hosted on a single node.
In the example above replicationFactor is 1: this way each shard gets a single core, which acts as its leader, and no replicas are associated with it. In the next section we'll see how to manually add replicas to the shards. I prefer to create replicas by hand because I have much more control over which nodes are used and I can choose how to name the individual cores.
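To see how the three parameters interact, here is a small arithmetic sketch. It assumes that the CREATE command has to place numShards × replicationFactor cores and puts at most maxShardsPerNode of them on each node; total_cores and min_nodes are names I made up for the example.

```shell
#!/bin/sh
# Number of cores created by CREATE: one per shard copy.
total_cores() {
    echo $(( $1 * $2 ))
}

# Minimum number of nodes needed to host them (ceiling division).
# Arguments: numShards, replicationFactor, maxShardsPerNode.
min_nodes() {
    cores=$(total_cores "$1" "$2")
    echo $(( (cores + $3 - 1) / $3 ))
}

# Values from the CREATE command above.
echo "cores to place: $(total_cores 2 1)"
echo "minimum nodes needed: $(min_nodes 2 1 2)"
# prints:
# cores to place: 2
# minimum nodes needed: 1
```

With numShards=2, replicationFactor=1 and maxShardsPerNode=2 both cores fit on a single node, which is why the command works even on a very small cluster.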

The complete reference for this and other commands related to the Collections API is here:

https://cwiki.apache.org/confluence/display/solr/Collections+API

Bind the collection to its configuration + Create the collection on the cluster leader nodes


As I said before, it's possible to create a collection and bind it to its configuration in a single step by adding the collection.configName parameter to the command used to create the collection.

curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=examplecoll&numShards=2&replicationFactor=1&maxShardsPerNode=2&collection.configName=examplecollcfg'

Create the collection on the replicas


Since the collection was created with replicationFactor=1, the cores representing the shard replicas have to be created by hand on the nodes that will host them.

Note: the command to create a collection (and all other commands belonging to the Collections API) can be executed on whatever cluster node you choose. Commands related to single cores or replicas must be executed on the node hosting the core/replica.

Having created a two-shard collection (numShards=2), we're going to create one replica for each of the shards:

curl 'http://localhost:7500/solr/admin/cores?action=CREATE&name=examplecoll_shard1_replica2&collection=examplecoll&shard=shard1'

curl 'http://localhost:8900/solr/admin/cores?action=CREATE&name=examplecoll_shard2_replica2&collection=examplecoll&shard=shard2'

In my setup the Solr servers are hosted on the same physical machine, so I had to change the ports on which they're listening: the nodes on which we're creating the replicas listen on ports 7500 and 8900.

The core name can be anything: in this example I used the same pattern used by Solr. The mandatory parameters are the name of the collection to which the core is bound and the id of the shard that the core will be a replica of.
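If you have more than a couple of replicas to create, the commands above can be generated instead of typed. This is just a sketch under the assumptions of my setup (all nodes on localhost, cores named with Solr's pattern); replica_url is a hypothetical helper, and the script only prints the curl commands without running them.

```shell
#!/bin/sh
# Build the CREATE URL for one replica core.
# Arguments: node port, collection name, shard number, replica number.
replica_url() {
    port=$1
    collection=$2
    shard_number=$3
    replica_number=$4
    name="${collection}_shard${shard_number}_replica${replica_number}"
    echo "http://localhost:${port}/solr/admin/cores?action=CREATE&name=${name}&collection=${collection}&shard=shard${shard_number}"
}

# Dry run: print the two commands used in this section.
echo "curl '$(replica_url 7500 examplecoll 1 2)'"
echo "curl '$(replica_url 8900 examplecoll 2 2)'"
```

Remember that each command has to be sent to the node that will host the core, so the port in the URL decides where the replica ends up.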

Some of these tasks can also be accomplished with the Collections API instead of the core API I used here. This article shows how:

http://heliosearch.org/solrcloud-assigning-nodes-machines

