Saturday, 21 September 2013

New OrientDB replication engine and plans for the releasing of v. 1.6.

Hi all,
we're glad to announce in the "develop" branch (1.6.0-SNAPSHOT) the new distirbuted configuration engine that doesn't use Hazelcast's Executor Service but rather Queues. This made the entire work easier (we dropped 50% of code of previous implementation) achieving better performances.
This is the new configuration in orientdb-dserver-config.xml:
<handler
    class="com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin">
    <parameters>
        <parameter value="europe0" name="nodeName" />
        <parameter value="true" name="enabled" />
        <parameter value="hazelcast-0.xml" name="configuration.hazelcast" />
        <parameter name="conflict.resolver.impl"
            value="com.orientechnologies.orient.server.distributed.conflict.ODefaultReplicationConflictResolver" />
        <parameter name="configuration.db.default" value="default-distributed-db-config.json" />

        <parameter name="sharding.strategy.round-robin"
            value="com.orientechnologies.orient.server.hazelcast.sharding.strategy.ORoundRobinPartitioninStrategy" />
    </parameters>
</handler>
As you can see we don't use realignment tasks anymore: when a node comes up it get the pending messages from Hazelcast queue. This is good when a server goes up & down in a reasonable time like for a temporary network failure or an upgrade of a server. In case you plan to stop and restart the server after days you'd need to re-deploy the entire database on the server.
This is the default configuration in the new default-distributed-db-config.json file, put it under $ORIENTDB_HOME/config and remove that contained under the database if any. This configuration comes with only one partition where new joining nodes auto register themselves. So this means all the nodes have the entire database (no partitioning). To achieve better performance avoid to use a writeQuorum bigger than 2. I think having 2 servers that synchronously have the data it's very secure. However all the server in the partition are updated, just the client receives the response when writeQuorum is reached. This is the new default configuration file:
{
    "replication": true,
    "clusters": {
        "internal": {
            "replication": false
        },
        "index": {
            "replication": false
        },
        "ODistributedConflict": {
            "replication": false
        },
        "*": {
            "replication": true,
            "writeQuorum": 2,
            "partitioning": {
                "strategy": "round-robin",
                "default": 0,
                "partitions": [
                    [ "<NEW_NODE>" ]
                ]
            }
        }
    }
}

Partitions contains an array of partitions as node names. The keyword "<NEW_NODE>" means that new node that joins the cluster are automatically added in that partition. 

So if you start X nodes the replication works out of the box. Thanks to the partitioning you can shard your database across multiple nodes avoiding to have a symmetric replica. So you can use your 6 servers as before:

[
  [ "node0", "node1", "node2",  "node3", "node4", "node5" ]
]

or in this way:

[
  [ "node0", "node1", "node2" ],
  [ "node3", "node4", "node5" ]
]

It's like RAID. With this configuration you've 2 partitions (0 and 1) with 3 replica each. So the database is spanned across 2 partitions automatically that means each partition owns about half database. By default we provide the "round-robin" strategy but you can already plug your one to better split the graph following your domain.

This is part of the Sharding feature, so consider it as a preview. It will be available in the next release (1.7 or 2.0).

Furthermore we've introduced variable timeouts that change based on the runtime configuration (number of nodes, type of replicated command, etc.)

We've also introduced new settings to tune the replication engine (OGlobalConfiguration class):

DISTRIBUTED_THREAD_QUEUE_SIZE("distributed.threadQueueSize", "Size of the queue for internal thread dispatching", Integer.class,
      1000),

DISTRIBUTED_CRUD_TASK_TIMEOUT("distributed.crudTaskTimeout", "Maximum timeout in milliseconds to wait for CRUD remote tasks",
      Integer.class, 3000),
  DISTRIBUTED_COMMAND_TASK_TIMEOUT("distributed.commandTaskTimeout",
      "Maximum timeout in milliseconds to wait for Command remote tasks", Integer.class, 5000),

DISTRIBUTED_QUEUE_TIMEOUT("distributed.queueTimeout", "Maximum timeout in milliseconds to wait for the response in replication",
      Integer.class, 3000),
  DISTRIBUTED_SYNCH_RESPONSES_TIMEOUT("distributed.synchResponsesTimeout",
      "Maximum timeout in milliseconds to collect all the synchronous responses from replication", Integer.class, 5000),
  DISTRIBUTED_ASYNCH_RESPONSES_TIMEOUT("distributed.asynchResponsesTimeout",
      "Maximum timeout in milliseconds to collect all the asynchronous responses from replication", Integer.class, 15000),

  DISTRIBUTED_PURGE_RESPONSES_TIMER_DELAY("distributed.purgeResponsesTimerDelay",
      "Maximum timeout in milliseconds to collect all the asynchronous responses from replication", Integer.class, 15000);


I suggest to everybody is using the distributed architecture to upgrade to 1.6.0-SNAPSHOT and:
- update the file orientdb-dserver-config.xml and !
- update the file default-distributed-db-config.json
- remove the previous file default-distributed-db-config.json under the databases/<yourdb> directory

We successfully tested the new engine against 10 servers and 200 connected clients with a writeQuorum=2 (2 synchronous copy before to give the OK to the client).

Next week we'll release OrientDB 1.6, so you've last chance to vote your bugs to be in 1.6 roadmap!


Luca Garulli
CEO at Orient Technologies
the Company behind OrientDB
http://about.me/luca.garulli

No comments:

Post a Comment

Note: only a member of this blog may post a comment.