Troubleshooting
Clearing the schema mutation lock
Tip
The schema mutation lock is no longer used from Atlas 0.108.0 onwards. These steps are maintained here for reference for users of AtlasDB on older versions.
In versions of AtlasDB prior to 0.108.0, we hold a schema mutation lock while performing schema mutations (e.g. creating or dropping tables) in Cassandra. If an AtlasDB client dies while holding the lock, the lock must be manually cleared or clients will not be able to perform schema mutations. Prior to AtlasDB 0.19, we would always grab the schema mutation lock on startup, and thus would fail to start until the lock had been cleared.
You will see one or both of the following exceptions when the schema mutation lock has been dropped:
// TimeoutException
We have timed out waiting on the current schema mutation lock holder. We have
tried to grab the lock for %d milliseconds unsuccessfully. This indicates
that the current lock holder has died without releasing the lock and will
require manual intervention. Shut down all AtlasDB clients operating on the
%s keyspace and then run the clean-cass-locks-state cli command.
// RuntimeException
The current lock holder has failed to update its heartbeat. We suspect that this
might be due to a node crashing while holding the schema mutation lock. If this
is indeed the case, run the clean-cass-locks-state cli command.
Clear with CLI
One runs the CLIs as a separate distribution. These distributions are published on Maven Central - please make sure to use the corresponding version of the CLIs with your service. The command can then be invoked as
./service/bin/atlasdb-cli --offline -c var/conf/<service>.yml clean-cass-locks-state
Clear with CQL
If you prefer to clear the lock with the Cassandra Query Language (CQL), then you can run commands similar to the below.
Note that on more recent versions of AtlasDB, the _locks
table will have a hexadecimal string suffix; however, the
truncation process remains similar.
cd my/cassandra/service/dir
./bin/cqlsh
cqlsh> use "myKeyspace";
cqlsh:myKeyspace> describe tables;
"myKeyspace__table1"
"_locks"
"myKeyspace__table2"
"_timestamp"
"myKeyspace__table3"
"_transactions"
sweep__priority
"_scrub"
"_punch"
"_metadata"
sweep__progress
cqlsh:myKeyspace> select * from "_locks";
key | column1 | column2 | value
----------------------------------+----------------------------+---------+--------------------
0x476c6f62616c2044444c206c6f636b | 0x69645f776974685f6c6f636b | -1 | 0x11884a8da443f45a
(1 rows)
cqlsh:myKeyspace> truncate table "_locks";
cqlsh:myKeyspace> select * from "_locks";
key | column1 | column2 | value
-----+---------+---------+-------
(0 rows)
cqlsh:myKeyspace>
You should now be able to successfully start your services.