If you want to upgrade from a pre-1.2 Cassandra version to the vnodes available in Cassandra 1.2 and later, you will want to run cassandra-shuffle as part of the migration procedure.
This can be… cumbersome.
You should really prepare yourself. Really. Don’t do it without reading. A LOT.
First read and understand this: http://www.datastax.com/dev/blog/upgrading-an-existing-cluster-to-vnodes-2
The procedure will take up a lot of everything. Time, network bandwidth, disk space, you name it.
This picture represents roughly 5 hours of running shuffle on a 450 GB HDD (and the shuffle process is supposed to run for weeks).
I had a cluster whose nodes each had 300 GB of storage at 40% use. My installation ended up crashing when the disks reached 100% use. I added an additional 450 GB volume to each node, and those filled too. I have no idea how much storage you really need, because I ended up running shuffle for a while, stopping it, running cleanup and compact, restarting it, rinse and repeat.
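To put the numbers above in perspective, here is a back-of-the-envelope calculation in Python. It is only a sketch: it assumes both the original 300 GB disk and the added 450 GB volume were completely full at the worst point, and the function name and parameters are made up for illustration. The result is what this one cluster consumed, not a sizing rule, since cleanup and compaction were reclaiming space between runs.

```python
def observed_peak(base_capacity_gb, initial_use_pct, extra_volume_gb):
    """Return (initial data in GB, filled capacity in GB, ratio between them)."""
    # Data on each node before the shuffle started.
    initial_data_gb = base_capacity_gb * initial_use_pct / 100
    # Assumption: both the original disk and the added volume ended up full.
    filled_gb = base_capacity_gb + extra_volume_gb
    return initial_data_gb, filled_gb, filled_gb / initial_data_gb


initial, filled, ratio = observed_peak(300, 40, 450)
print(f"{initial:.0f} GB of data grew to fill {filled} GB ({ratio:.2f}x)")
```

With the figures from this cluster that works out to about 120 GB of data per node filling 750 GB of disk, a bit more than six times the starting size.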
Here’s some stuff I learned from the experience.
If you can avoid it, please do. If you have the hardware to duplicate your cluster on the new version, then iterate over all the data and insert it into the new cluster, please do that. It’s the best option for maximising balance and minimising headaches.
If you can’t, be sure to:
- Minimize write operations: If you can put your cluster in “read-only mode”, do that for as long as shuffle is running. Intensive writes really interfere with the process, making it all but impossible.
- Decrease your RF: You are going to need a lot of disk space. You can temporarily decrease the replication factor (RF) and raise it again when you finish (don’t forget to repair after!). This also minimises data transfer.
- Run cleanup: I did not try to run it while shuffle was active, but keep in mind that when token ownership changes, Cassandra streams the data to the new node but does not delete it from the previous owner. Cleanup helps to reclaim that space.
- Monitor everything closely: I had to course correct a few times, and stop everything when my disks were filling. This depends on your configuration though. But don’t just run the shuffle and go to sleep, because everything can break.
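The monitoring advice can be partly automated. Below is a minimal Python sketch of a watchdog that checks how full the data disk is and pauses or resumes the shuffle scheduler. Everything specific in it is an assumption: the 80% and 60% thresholds are illustrative, the function names are made up, and the `cassandra-shuffle enable` / `cassandra-shuffle disable` invocations should be checked against the help output of your version before you rely on them.

```python
import shutil
import subprocess
import time


def used_fraction(path):
    """Fraction of the filesystem holding `path` that is in use (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def next_action(used, shuffle_enabled, pause_at=0.80, resume_below=0.60):
    """Decide what to do next; the thresholds are illustrative, not recommendations."""
    if shuffle_enabled and used >= pause_at:
        return "disable"
    if not shuffle_enabled and used < resume_below:
        return "enable"
    # Between the two thresholds nothing changes, which avoids flapping.
    return "wait"


def watch(data_dir, poll_seconds=60):
    """Poll disk usage forever and toggle the shuffle scheduler when needed."""
    shuffle_enabled = True
    while True:
        action = next_action(used_fraction(data_dir), shuffle_enabled)
        if action != "wait":
            # Assumed invocation; verify the subcommand on your installation.
            subprocess.run(["cassandra-shuffle", action], check=True)
            shuffle_enabled = (action == "enable")
        time.sleep(poll_seconds)
```

You would start it with something like `watch("/var/lib/cassandra/data")`, where the path is only a guess at where your data directory lives. The gap between the two thresholds is deliberate: usage only drops again after you run cleanup yourself, so the script will not re-enable the shuffle the moment the disk dips just under the pause level. It is a safety net, not a replacement for watching the cluster.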
Remember that disabling the shuffle only disables the scheduler. Everything that’s already running will keep running (streams, for example).
Good luck! And remember: Try not to shuffle!