For one project we use Cassandra as a distributed backend message store (for an email archive, which by its nature is always going to grow in size). We chose Cassandra because it offered the ability to replicate data over a number of servers, giving us scalability and redundancy. Also, for the project in question, we only ever retrieve an email based on its message-id, which happens to be unique (hopefully) and forms a good key 🙂
Anyway, we’ve been using Cassandra 0.6.x for some time, through the Debian packages the project makes available. All was well until this afternoon, when I saw an upgrade to 0.7 was available. Now, I knew 0.7 was a long-awaited upgrade (apparently it would allow us to create new keyspaces etc. on the fly), and I thought:
“No doubt they [the package maintainers] will have either a big warning message, or some automatic migration from 0.6 to 0.7”
I was wrong.
Upon restarting Cassandra 0.7 (and running chown -R cassandra:cassandra /var/lib/cassandra), everything appeared fine, except that it had no idea where our keyspace was. It did, however, log an error message like this in /var/log/cassandra/system.log:
“DatabaseDescriptor.java (line 439) Found table data in data directories. Consider using JMX to call org.apache.cassandra.service.StorageService.loadSchemaFromYaml().”
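If you want to check for this condition without reading the log by eye, something like the following might do. It is only a sketch: the function name needs_schema_load is made up for illustration, and it simply looks for the string loadSchemaFromYaml in the log file mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given Cassandra log contains the
# 0.7 hint about calling loadSchemaFromYaml. The default path is the log
# file quoted above; pass a different path as the first argument if needed.
needs_schema_load() {
    log_file="${1:-/var/log/cassandra/system.log}"
    # -q: no output, just an exit status; -F: match a fixed string, not a regex
    grep -qF 'loadSchemaFromYaml' "$log_file" 2>/dev/null
}

if needs_schema_load; then
    echo "schema not loaded yet: loadSchemaFromYaml still needs to be called via JMX"
else
    echo "no schema-migration hint found in the log"
fi
```

A missing or unreadable log file is treated the same as "no hint found", so run it as a user who can read the log.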
Rummaging through the online docs showed that we’d need to fire up a “jconsole” thing to fix it. Unfortunately Cassandra was running on a remote server, so this wasn’t so easy. The easiest solution seemed to be to download Cassandra locally, copy the remote server’s storage-conf.xml file to the local machine, and then run the included ‘bin/config-converter’:
bin/config-converter conf/storage-conf.xml conf/cassandra.yaml
This YAML file could then be copied to the remote server (/etc/cassandra/cassandra.yaml). After restarting the Cassandra service, you’re ready to connect via jconsole and perform the ‘migration’ to your pre-existing schema:
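Put together, the fetch, convert, copy and restart steps might look roughly like the sketch below, run from the top of the local Cassandra 0.7 download. Several things here are assumptions rather than what we actually typed: the host user@cassandra-host is a placeholder, the old storage-conf.xml is assumed to live in /etc/cassandra on the server, and the restart is assumed to go through /etc/init.d/cassandra (which will probably need root). By default it only prints the commands; set DRY_RUN=0 to really run them.

```shell
#!/bin/sh
# Sketch of the convert-and-copy steps described above.
# REMOTE and REMOTE_CONF_DIR are assumptions; adjust them for your server.
REMOTE="${REMOTE:-user@cassandra-host}"               # placeholder host
REMOTE_CONF_DIR="${REMOTE_CONF_DIR:-/etc/cassandra}"  # assumed config dir
DRY_RUN="${DRY_RUN:-1}"                               # 1 = only print commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

migrate_config() {
    # 1. Fetch the old 0.6 config from the server.
    run scp "$REMOTE:$REMOTE_CONF_DIR/storage-conf.xml" conf/storage-conf.xml
    # 2. Convert it to the 0.7 YAML format.
    run bin/config-converter conf/storage-conf.xml conf/cassandra.yaml
    # 3. Push the generated YAML back to the server.
    run scp conf/cassandra.yaml "$REMOTE:$REMOTE_CONF_DIR/cassandra.yaml"
    # 4. Restart Cassandra (assumed init script path; likely needs root).
    run ssh "$REMOTE" /etc/init.d/cassandra restart
}

migrate_config
```

Printing the commands first is deliberate: it lets you check the paths and host before anything touches the server.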
ssh -L 8080:localhost:8080 email@example.com
<<start jconsole, and point at localhost:8080; no authentication required>>
And click:
MBeans -> org.apache.cassandra.db -> StorageService -> Operations -> loadSchemaFromYaml
Once this was done, we found that running ‘show keyspaces’ from within the ‘cassandra-cli’ client showed what we needed (our well-named ‘Keyspace1’).
Then we just needed to upgrade our pycassa version so the client connected properly, and everything started to work.