Exporting and Importing Elasticsearch Indices

In my project I need to run some local tests with data from a production Elasticsearch cluster, so I exported the data from the production server and imported it into my local cluster. The same procedure can be used for backing up and restoring data. Here are the instructions.

Before you start, check out the official documentation: Snapshot and Restore.

Backing up/exporting data:

  1. Modify your Elasticsearch configuration file (normally elasticsearch.yml) and add a path.repo line pointing at the directory that will hold the snapshots. Restart Elasticsearch so the setting takes effect.
  2. Make sure this path has the correct permissions so that Elasticsearch can read and write it.
  3. Register the path as a snapshot repository and create a snapshot.
  4. Copy the files in the configured location to your local machine.
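
The export steps above can be sketched as follows. The original commands are not preserved here, so this is an illustration only: the cluster address, the repository name my_backup, the snapshot name snapshot_1 and the path /mount/backups are all assumptions to replace with your own values.

```shell
# Assumed elasticsearch.yml entry (step 1), the path is only an example:
#   path.repo: ["/mount/backups"]

ES="${ES:-http://localhost:9200}"   # assumed cluster address
REPO="my_backup"                    # hypothetical repository name
SNAP="snapshot_1"                   # hypothetical snapshot name

# Register a shared-filesystem repository. The location ($1) must lie
# inside a directory listed under path.repo.
register_repo() {
  curl -s -X PUT "$ES/_snapshot/$REPO" \
    -H 'Content-Type: application/json' \
    -d "{\"type\": \"fs\", \"settings\": {\"location\": \"$1\"}}"
}

# Snapshot all indices and wait until the snapshot has finished.
create_snapshot() {
  curl -s -X PUT "$ES/_snapshot/$REPO/$SNAP?wait_for_completion=true"
}
```

With those definitions loaded, something like `register_repo /mount/backups/my_backup && create_snapshot` would run step 3.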

Restoring/importing data:

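  1. Modify your local Elasticsearch configuration in the same way as in step 1 of the backup.
  2. Place the snapshot files in the repo path and register the repository on the local cluster.
  3. Close the indices you are going to restore.
  4. Import the data by restoring the snapshot.
  5. Reopen your indices.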
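
A possible shape for the import steps, again as a sketch under assumptions rather than the original commands: the index name my_index is hypothetical, and the repository and snapshot names are assumed to match the ones used when exporting.

```shell
ES="${ES:-http://localhost:9200}"   # assumed local cluster address
REPO="my_backup"                    # hypothetical repository name
SNAP="snapshot_1"                   # hypothetical snapshot name
INDEX="my_index"                    # hypothetical index name

# Step 3: an existing index has to be closed before it can be restored.
close_index() {
  curl -s -X POST "$ES/$INDEX/_close"
}

# Step 4: restore the snapshot and wait for the restore to finish.
restore_snapshot() {
  curl -s -X POST "$ES/_snapshot/$REPO/$SNAP/_restore?wait_for_completion=true"
}

# Step 5: reopen the index so it can be searched again.
open_index() {
  curl -s -X POST "$ES/$INDEX/_open"
}
```

Running `close_index && restore_snapshot && open_index` would then carry out steps 3 to 5 in order.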

It is important that the Elasticsearch version on the importing side is compatible with the one that exported the data, i.e., in this case your local machine has to run the same version or a newer one. If not, you need to upgrade Elasticsearch first. The official documentation says:

The information stored in a snapshot is not tied to a particular cluster or a cluster name. Therefore it’s possible to restore a snapshot made from one cluster into another cluster. All that is required is registering the repository containing the snapshot in the new cluster and starting the restore process. The new cluster doesn’t have to have the same size or topology. However, the version of the new cluster should be the same or newer than the cluster that was used to create the snapshot.

Solution: dd too slow on Mac OS X

When I was cloning SD cards on Mac OS X using dd, it took ages to get things done. I was reading from the buffered device node, /dev/disk2.

It takes much less time when using /dev/rdisk2 instead of /dev/disk2:
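
The exact commands are not preserved here, so the following is only a sketch of the comparison. The device nodes come from the text above; the block size bs=1m and the image file name sdcard.img are assumptions.

```shell
# clone_sd DEVICE IMAGE: copy a whole device into an image file.
# bs=1m is an assumed block size (BSD dd syntax, as shipped with Mac OS X).
clone_sd() {
  sudo dd if="$1" of="$2" bs=1m
}

# Slow, buffered block device:
#   clone_sd /dev/disk2 sdcard.img
# Fast, raw character device:
#   clone_sd /dev/rdisk2 sdcard.img
```

Double-check the disk number (for example with diskutil list) before running either form, since dd does not ask for confirmation.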

The reason is that rdisks are "raw", which results in a higher R/W speed, according to man hdiutil [1]:

/dev/rdisk nodes are character-special devices, but are “raw” in the BSD sense and force block-aligned I/O. They are closer to the physical disk than the buffer cache. /dev/disk nodes, on the other hand, are buffered block-special devices and are used primarily by the kernel’s filesystem code.

[1] http://superuser.com/questions/631592/mac-osx-why-is-dev-rdisk-20-times-faster-than-dev-disk

A “normal” sed on Mac

The sed program on Mac is the BSD version, not the GNU one that most Linux systems ship. To get GNU sed, install the gnu-sed package with brew.

After this, alter PATH so that GNU sed is found before the system one, for example by adding a line to your ~/.bash_profile.
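
One way this could look with a current Homebrew, as a sketch rather than the original commands: the gnu-sed formula installs the binary as gsed and keeps an unprefixed sed in its libexec/gnubin directory. The helper names below are hypothetical.

```shell
# Install GNU sed through Homebrew.
install_gnu_sed() {
  brew install gnu-sed
}

# Print the line to add to ~/.bash_profile so that the unprefixed
# GNU sed comes first in PATH.
gnubin_path_line() {
  printf 'export PATH="%s/libexec/gnubin:$PATH"\n' "$(brew --prefix gnu-sed)"
}
```

For example, `install_gnu_sed && gnubin_path_line >> ~/.bash_profile`, then open a new terminal and check with `sed --version`.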

And now you have a normal sed!


GitHub couples

I’m feeling good today because of these things:

  1. My mobile phone ran out of battery and the alarm clock didn’t ring this morning, but I still managed to get up just in time, caught the bus at the last minute, and arrived at the company at my usual time.
  2. My manager told me it looks positive for renewing my contract, hopefully for one and a half years. He also said he will try to get it done before the summer vacation, which makes my life a lot easier. He also said it’s possible to save my holidays until winter, so I’ll be back in China for some time this winter.
  3. A very old lady managed to stop the bus and get on even though she waved her hand to the driver a bit late. The bus driver was polite, and that is what I like about Finland: people generally don’t get angry.
  4. Here’s a very funny and geeky picture I saw on xda-developers. In case the link gets invalidated later, the picture reads: “So, where did you two meet?” “Windows users: at the office” “Mac users: at Starbucks” “Linux users: GitHub”.



The code looks as follows.

Multiple Sessions

A linked list of all RUDP sockets is maintained. When rudp_socket() is called, an RUDP socket is created and added to the linked list. An RUDP socket keeps a record of the peers/sessions it talks with. When RUDP receives a packet from an unknown socket address, or receives a request to send a packet to an unknown socket address, a new session is created. For each session, a linked list of all buffered packets is kept.

Session Establishment and Tearing Down

When rudp_sendto() is called, the protocol first checks whether a session exists between the sender and the receiver. If not, the protocol tries to set up a session by sending RUDP_SYN messages, and the packet the application wants to send is buffered in the newly created session. After an RUDP_ACK message is received, the server-side socket starts sending out packets. The Go back N protocol is used to control the sending process. After the protocol receives a rudp_close() signal, it first checks whether there are still active sessions and packets in the sending buffer. If not, the protocol sends out RUDP_FIN messages and, after receiving the RUDP_ACKs, the session is torn down.
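
The sender-side life cycle described above can be pictured as a small state table. This is only an illustrative sketch, not the project's code: the state names (CLOSED, SYN_SENT, OPEN, FIN_SENT) and event names are hypothetical, and only RUDP_SYN, RUDP_ACK, RUDP_FIN, rudp_sendto() and rudp_close() come from the description.

```shell
# next_state STATE EVENT: print the sender-side state after EVENT.
# State and event names are made up for this illustration.
next_state() {
  case "$1:$2" in
    CLOSED:sendto)    echo SYN_SENT ;;  # no session yet: send RUDP_SYN, buffer the packet
    SYN_SENT:ack)     echo OPEN ;;      # RUDP_ACK received: start sending buffered packets
    OPEN:drained)     echo FIN_SENT ;;  # rudp_close() seen and buffer empty: send RUDP_FIN
    FIN_SENT:ack)     echo CLOSED ;;    # RUDP_ACK for the FIN: session torn down
    *)                echo "$1" ;;      # any other event leaves the state unchanged
  esac
}
```

Feeding the events sendto, ack, drained, ack in that order walks a session from CLOSED through the whole cycle and back to CLOSED.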

RUDP Overview

RUDP is a protocol that ensures transfer reliability with UDP. A sliding window protocol (Go back N) is used to realize reliability. Using RUDP, applications can send and receive data packets without worrying about lost packets.
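
To make the Go back N idea concrete, here is a small round-based simulation. It is a simplified sketch under stated assumptions, not the RUDP implementation: the function name and its parameters are hypothetical, one packet is lost on its first transmission only, and the window size is assumed to be at least 1.

```shell
# gbn_simulate TOTAL WINDOW LOST
# Packet number LOST is dropped on its first transmission; pass -1 to
# lose nothing. The body runs in a subshell ( ... ) so that its
# variables stay private to the function.
gbn_simulate() (
  total=$1 window=$2 lost=$3
  base=0        # oldest unacknowledged packet
  expected=0    # next in-order packet the receiver will accept
  sent=0 dropped=0
  while [ "$base" -lt "$total" ]; do
    seq=$base
    # Send every packet in the window [base, base + window).
    while [ "$seq" -lt $((base + window)) ] && [ "$seq" -lt "$total" ]; do
      sent=$((sent + 1))
      if [ "$seq" -eq "$lost" ] && [ "$dropped" -eq 0 ]; then
        dropped=1                      # lost in transit
      elif [ "$seq" -eq "$expected" ]; then
        expected=$((expected + 1))     # in order: the receiver accepts it
      fi                               # out of order: the receiver discards it
      seq=$((seq + 1))
    done
    # Cumulative ACK or timeout: go back to the first unacknowledged packet.
    base=$expected
  done
  echo "delivered=$expected transmissions=$sent"
)
```

Calling `gbn_simulate 5 3 1` prints `delivered=5 transmissions=7`: packet 1 is lost, so packet 2 from the first window is discarded by the receiver and both are sent again.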

The lines in red signify state changes for RUDP clients (receiver side), while the black lines signify state changes for RUDP servers (sender side).

CrazyBus Launch Script