Question
Our service is able to run SELECT and INSERT queries without any issues on our local and deployed Cassandra instances. However, we are having trouble with the following DELETE query:
DELETE FROM config_by_uuid WHERE uuid = record_uuid;
Our service successfully deletes a record on our local instance, but not on our deployed instance. Note that this behavior is consistent for both instances, and that no errors are reported on our deployed instance.
Notably, when the above query is run on our deployed instance through cqlsh, it successfully deletes a record. It only fails when run from our service on our deployed instance. Our service and cqlsh are using the same user to run queries.
At first we suspected that it could be a Cassandra consistency issue, so we tried running the query on cqlsh with consistency levels of both ONE and QUORUM, and for both consistency levels the query succeeded. Note that our service is currently using QUORUM for all operations.
The reason we are discounting the possibility of this being a code issue is that the service works as intended on our local instance. Our reasoning is that if it were a code issue, it should have failed on both instances, so the difference must lie somewhere in our Cassandra installations. Both instances are using Cassandra 3.11.X.
Our keyspace and table details are the same for both instances and are as follows (note that we are only working with a single node for now because we are still in the early stages of development):
CREATE KEYSPACE config WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
CREATE TABLE config.config_by_uuid (
uuid uuid PRIMARY KEY,
config_name text,
config_value text,
service_uuid uuid,
tenant_uuid uuid,
user_uuid uuid
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
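For what it's worth, in Go this table would typically be modelled along the following lines. This is a sketch only; the package, struct, and field names are hypothetical and not taken from our service, and gocql.UUID is the usual Go mapping for the CQL uuid type.

package model

import "github.com/gocql/gocql"

// ConfigByUUID is a hypothetical Go mapping of config.config_by_uuid;
// gocql.UUID is the usual Go type for the CQL uuid columns.
type ConfigByUUID struct {
	UUID        gocql.UUID // partition key
	ConfigName  string
	ConfigValue string
	ServiceUUID gocql.UUID
	TenantUUID  gocql.UUID
	UserUUID    gocql.UUID
}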
We have enabled tracing on our deployed Cassandra instance; below are the details when running the query through cqlsh:
system_traces.sessions:
session_id: 25b48ce0-0491-11ea-ace9-5db0758d00f3
client: node_ip
command: QUERY
coordinator: node_ip
duration: 1875
parameters: {'consistency_level': 'ONE', 'page_size': '100', 'query': 'delete from config_by_uuid where uuid = 96ac4699-5199-4a80-9c59-b592d28ea2b7;', 'serial_consistency_level': 'SERIAL'}
request: Execute CQL3 query
started_at: 2019-11-11 14:40:03.758000+0000
system_traces.events:
session_id | event_id | activity | source | source_elapsed | thread
--------------------------------------+--------------------------------------+---------------------------------------------------------------------------------------+--------------+----------------+-----------------------------
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4b3f0-0491-11ea-ace9-5db0758d00f3 | Parsing delete from config_by_uuid where uuid = 96ac4699-5199-4a80-9c59-b592d28ea2b7; | node_ip | 203 | Native-Transport-Requests-1
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4b3f1-0491-11ea-ace9-5db0758d00f3 | Preparing statement | node_ip | 381 | Native-Transport-Requests-1
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4b3f2-0491-11ea-ace9-5db0758d00f3 | Executing single-partition query on roles | node_ip | 1044 | ReadStage-3
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4b3f3-0491-11ea-ace9-5db0758d00f3 | Acquiring sstable references | node_ip | 1080 | ReadStage-3
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4db00-0491-11ea-ace9-5db0758d00f3 | Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones | node_ip | 1114 | ReadStage-3
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4db01-0491-11ea-ace9-5db0758d00f3 | Key cache hit for sstable 2 | node_ip | 1152 | ReadStage-3
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4db02-0491-11ea-ace9-5db0758d00f3 | Merged data from memtables and 1 sstables | node_ip | 1276 | ReadStage-3
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4db03-0491-11ea-ace9-5db0758d00f3 | Read 1 live rows and 0 tombstone cells | node_ip | 1307 | ReadStage-3
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4db04-0491-11ea-ace9-5db0758d00f3 | Executing single-partition query on roles | node_ip | 1466 | ReadStage-3
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4db05-0491-11ea-ace9-5db0758d00f3 | Acquiring sstable references | node_ip | 1484 | ReadStage-3
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4db06-0491-11ea-ace9-5db0758d00f3 | Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones | node_ip | 1501 | ReadStage-3
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4db07-0491-11ea-ace9-5db0758d00f3 | Key cache hit for sstable 2 | node_ip | 1525 | ReadStage-3
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4db08-0491-11ea-ace9-5db0758d00f3 | Merged data from memtables and 1 sstables | node_ip | 1573 | ReadStage-3
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4db09-0491-11ea-ace9-5db0758d00f3 | Read 1 live rows and 0 tombstone cells | node_ip | 1593 | ReadStage-3
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4db0a-0491-11ea-ace9-5db0758d00f3 | Determining replicas for mutation | node_ip | 1743 | Native-Transport-Requests-1
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4db0b-0491-11ea-ace9-5db0758d00f3 | Appending to commitlog | node_ip | 1796 | MutationStage-3
25b48ce0-0491-11ea-ace9-5db0758d00f3 | 25b4db0c-0491-11ea-ace9-5db0758d00f3 | Adding to config_by_uuid memtable | node_ip | 1827 | MutationStage-3
Below are the details when running the query from our service:
system_traces.sessions:
session_id: 9ed67270-048f-11ea-ace9-5db0758d00f3
client: service_ip
command: QUERY
coordinator: node_ip
duration: 3247
parameters: {'bound_var_0_uuid': '19e12033-5ad4-4376-8293-315a26370d93', 'consistency_level': 'QUORUM', 'page_size': '5000', 'query': 'DELETE FROM config.config_by_uuid WHERE uuid=? ', 'serial_consistency_level': 'SERIAL'}
request: Execute CQL3 prepared query
started_at: 2019-11-11 14:29:07.991000+0000
system_traces.events:
session_id | event_id | activity | source | source_elapsed | thread
--------------------------------------+--------------------------------------+---------------------------------------------------------------------------+--------------+----------------+-----------------------------
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed67271-048f-11ea-ace9-5db0758d00f3 | Executing single-partition query on roles | node_ip | 178 | ReadStage-2
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed67272-048f-11ea-ace9-5db0758d00f3 | Acquiring sstable references | node_ip | 204 | ReadStage-2
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed67273-048f-11ea-ace9-5db0758d00f3 | Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones | node_ip | 368 | ReadStage-2
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed69980-048f-11ea-ace9-5db0758d00f3 | Key cache hit for sstable 2 | node_ip | 553 | ReadStage-2
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed69981-048f-11ea-ace9-5db0758d00f3 | Merged data from memtables and 1 sstables | node_ip | 922 | ReadStage-2
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed69982-048f-11ea-ace9-5db0758d00f3 | Read 1 live rows and 0 tombstone cells | node_ip | 1193 | ReadStage-2
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed6c090-048f-11ea-ace9-5db0758d00f3 | Executing single-partition query on roles | node_ip | 1587 | ReadStage-3
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed6c091-048f-11ea-ace9-5db0758d00f3 | Acquiring sstable references | node_ip | 1642 | ReadStage-3
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed6c092-048f-11ea-ace9-5db0758d00f3 | Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones | node_ip | 1708 | ReadStage-3
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed6c093-048f-11ea-ace9-5db0758d00f3 | Key cache hit for sstable 2 | node_ip | 1750 | ReadStage-3
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed6c094-048f-11ea-ace9-5db0758d00f3 | Merged data from memtables and 1 sstables | node_ip | 1845 | ReadStage-3
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed6c095-048f-11ea-ace9-5db0758d00f3 | Read 1 live rows and 0 tombstone cells | node_ip | 1888 | ReadStage-3
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed6e7a0-048f-11ea-ace9-5db0758d00f3 | Determining replicas for mutation | node_ip | 2660 | Native-Transport-Requests-1
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed6e7a1-048f-11ea-ace9-5db0758d00f3 | Appending to commitlog | node_ip | 3028 | MutationStage-2
9ed67270-048f-11ea-ace9-5db0758d00f3 | 9ed6e7a2-048f-11ea-ace9-5db0758d00f3 | Adding to config_by_uuid memtable | node_ip | 3133 | MutationStage-2
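As a side note, the same server-side trace can also be captured directly from the Go client by attaching a gocql tracer to the query. Below is a sketch, assuming an existing *gocql.Session like the one in the earlier example; the package and function names are ours for illustration only.

package cassandra

import (
	"os"

	"github.com/gocql/gocql"
)

// traceDelete attaches a gocql tracer to the delete so the server-side trace
// events for this exact query are written to stdout.
func traceDelete(session *gocql.Session, id gocql.UUID) error {
	tracer := gocql.NewTraceWriter(session, os.Stdout)
	return session.
		Query("DELETE FROM config.config_by_uuid WHERE uuid = ?", id).
		Trace(tracer).
		Exec()
}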
Below are the steps we used to install our local Cassandra on Windows 10. Note that no configuration files were changed after installation:
Installed Java 8. Both java -version and javac -version are working.
Installed Python 2. python --version is working.
Downloaded the latest Cassandra bin.tar.gz file from http://cassandra.apache.org/download/
Extracted the contents of the archive, renamed it to cassandra, and placed it in C:\.
Added C:\cassandra\bin to our PATH environment variable.
Below are the steps we used to install our deployed Cassandra on CentOS 8:
Update yum:
yum -y update
Install Java:
yum -y install java
java -version
Create the repo file to be used by yum:
nano /etc/yum.repos.d/cassandra.repo
---
[cassandra]
name=Apache Cassandra
baseurl=https://www.apache.org/dist/cassandra/redhat/311x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://www.apache.org/dist/cassandra/KEYS
Install Cassandra:
yum -y install cassandra
Create a service file for Cassandra:
nano /etc/systemd/system/cassandra.service
---
[Unit]
Description=Apache Cassandra
After=network.target

[Service]
PIDFile=/var/run/cassandra/cassandra.pid
User=cassandra
Group=cassandra
ExecStart=/usr/sbin/cassandra -f -p /var/run/cassandra/cassandra.pid
Restart=always

[Install]
WantedBy=multi-user.target
Reload system daemons:
systemctl daemon-reload
Give Cassandra directory permissions:
sudo chown -R cassandra:cassandra /var/lib/cassandra
sudo chown -R cassandra:cassandra /var/log/cassandra
Configure system to run Cassandra at startup:
systemctl enable cassandra
Configure the cassandra.yaml file:
nano /etc/cassandra/conf/cassandra.yaml
---
(TIP: Use Ctrl+W to search for the settings you want to change.)
authenticator: org.apache.cassandra.auth.PasswordAuthenticator
authorizer: org.apache.cassandra.auth.CassandraAuthorizer
role_manager: CassandraRoleManager
roles_validity_in_ms: 0
permissions_validity_in_ms: 0
cluster_name: 'MyCompany Dev'
initial_token: (should be commented-out)
listen_address: node_ip
rpc_address: node_ip
endpoint_snitch: GossipingPropertyFileSnitch
auto_bootstrap: false (add this at the bottom of the file)
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "node_ip"
Configure the cassandra-topology.properties file:
nano /etc/cassandra/conf/cassandra-topology.properties
---
(NOTE: For "Cassandra Node IP=Data Center:Rack", delete all existing values.)
#Cassandra Node IP=Data Center:Rack
[Local IP]=SG:Dev

# default for unknown nodes
default=SG:Dev
Configure the cassandra-rackdc.properties file:
nano /etc/cassandra/conf/cassandra-rackdc.properties
---
dc=SG
rack=Dev
Run the following commands to clean directories:
rm -rf /var/lib/cassandra/data
rm -rf /var/lib/cassandra/commitlog
rm -rf /var/lib/cassandra/saved_caches
rm -rf /var/lib/cassandra/hints
Start Cassandra:
service cassandra start
Install Python 2:
yum -y install python2
python2 --version
Log in as the default user:
cqlsh -u cassandra -p cassandra node_ip --request-timeout=6000
Create new user:
CREATE ROLE adminuser WITH PASSWORD = 'password' AND SUPERUSER = true AND LOGIN = true;
exit;
Log in as new user:
cqlsh -u adminuser -p password node_ip --request-timeout=6000
Disable default user:
ALTER ROLE cassandra WITH PASSWORD = 'cassandra' AND SUPERUSER = false AND LOGIN = false;
REVOKE ALL PERMISSIONS ON ALL KEYSPACES FROM cassandra;
GRANT ALL PERMISSIONS ON ALL KEYSPACES TO adminuser;
exit;
Our service is written in Golang and uses the following third-party libraries to talk to Cassandra (a sketch of a typical delete built with them follows the list):
github.com/gocql/gocql
github.com/scylladb/gocqlx
github.com/scylladb/gocqlx/qb
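For context, this is roughly how such a delete is usually written with qb and gocqlx. It is a sketch of typical usage of these libraries under assumed names, not an excerpt from our service.

package cassandra

import (
	"github.com/gocql/gocql"
	"github.com/scylladb/gocqlx"
	"github.com/scylladb/gocqlx/qb"
)

// deleteConfigByUUID builds the delete statement with qb and executes it
// through gocqlx, binding the partition key by name.
func deleteConfigByUUID(session *gocql.Session, id gocql.UUID) error {
	stmt, names := qb.Delete("config.config_by_uuid").
		Where(qb.Eq("uuid")).
		ToCql()

	return gocqlx.Query(session.Query(stmt), names).
		BindMap(qb.M{"uuid": id}).
		ExecRelease()
}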
UPDATE 1: Below are the permissions for the user that our service and cqlsh are using to run queries (via list all permissions on config.config_by_uuid;):
role | username | resource | permission
----------+-----------+-------------------------------+------------
adminuser | adminuser | <all keyspaces> | CREATE
adminuser | adminuser | <all keyspaces> | ALTER
adminuser | adminuser | <all keyspaces> | DROP
adminuser | adminuser | <all keyspaces> | SELECT
adminuser | adminuser | <all keyspaces> | MODIFY
adminuser | adminuser | <all keyspaces> | AUTHORIZE
adminuser | adminuser | <keyspace config> | CREATE
adminuser | adminuser | <keyspace config> | ALTER
adminuser | adminuser | <keyspace config> | DROP
adminuser | adminuser | <keyspace config> | SELECT
adminuser | adminuser | <keyspace config> | MODIFY
adminuser | adminuser | <keyspace config> | AUTHORIZE
adminuser | adminuser | <table config.config_by_uuid> | ALTER
adminuser | adminuser | <table config.config_by_uuid> | DROP
adminuser | adminuser | <table config.config_by_uuid> | SELECT
adminuser | adminuser | <table config.config_by_uuid> | MODIFY
adminuser | adminuser | <table config.config_by_uuid> | AUTHORIZE
The Cassandra documentation states that MODIFY grants the following permissions: INSERT, DELETE, UPDATE, and TRUNCATE. Because adminuser can insert records without any issues, it seems that our delete issue is not a permission issue.
UPDATE 2: Below are the owner and permissions for key Cassandra directories (via ls -al):
/etc/cassandra:
total 20
drwxr-xr-x 3 root root 4096 Nov 12 22:18 .
drwxr-xr-x. 103 root root 12288 Nov 12 22:18 ..
lrwxrwxrwx 1 root root 27 Nov 12 22:18 conf -> /etc/alternatives/cassandra
drwxr-xr-x 3 root root 4096 Nov 12 22:18 default.conf
/var/lib/cassandra:
total 24
drwxr-xr-x 6 cassandra cassandra 4096 Nov 12 22:38 .
drwxr-xr-x. 43 root root 4096 Nov 12 22:18 ..
drwxr-xr-x 2 cassandra cassandra 4096 Nov 12 22:38 commitlog
drwxr-xr-x 8 cassandra cassandra 4096 Nov 12 22:40 data
drwxr-xr-x 2 cassandra cassandra 4096 Nov 12 22:38 hints
drwxr-xr-x 2 cassandra cassandra 4096 Nov 12 22:38 saved_caches
/var/log/cassandra:
total 3788
drwxr-xr-x 2 cassandra cassandra 4096 Nov 12 22:19 .
drwxr-xr-x. 11 root root 4096 Nov 12 22:18 ..
-rw-r--r-- 1 cassandra cassandra 2661056 Nov 12 22:41 debug.log
-rw-r--r-- 1 cassandra cassandra 52623 Nov 12 23:11 gc.log.0.current
-rw-r--r-- 1 cassandra cassandra 1141764 Nov 12 22:40 system.log
UPDATE 3: We also suspected this being a tombstone or compaction issue, so we tried setting gc_grace_seconds to 0 and running the delete query, but it didn't help.
Running nodetool compact -s config config_by_uuid with gc_grace_seconds set to both 0 and the default 864000 didn't help either.
UPDATE 4: We tried uninstalling and reinstalling Cassandra, but it did not resolve the issue. Below are the steps we used:
Uninstall Cassandra through yum:
yum -y remove cassandra
Remove the following directories:
rm -rf /var/lib/cassandra rm -rf /var/log/cassandra rm -rf /etc/cassandra
Remove any leftover files:
(Note: Run rm -rf on the results of the following commands.)
find / -name 'cassandra'
find / -name '*cassandra*'
e.g.
rm -rf /run/lock/subsys/cassandra
rm -rf /tmp/hsperfdata_cassandra
rm -rf /etc/rc.d/rc3.d/S80cassandra
rm -rf /etc/rc.d/rc2.d/S80cassandra
rm -rf /etc/rc.d/rc0.d/K20cassandra
rm -rf /etc/rc.d/rc6.d/K20cassandra
rm -rf /etc/rc.d/rc5.d/S80cassandra
rm -rf /etc/rc.d/rc4.d/S80cassandra
rm -rf /etc/rc.d/rc1.d/K20cassandra
rm -rf /root/.cassandra
rm -rf /var/cache/dnf/cassandra-e96532ac33a46b7e
rm -rf /var/cache/dnf/cassandra.solv
rm -rf /var/cache/dnf/cassandra-filenames.solvx
rm -rf /run/systemd/generator.late/graphical.target.wants/cassandra.service
rm -rf /run/systemd/generator.late/multi-user.target.wants/cassandra.service
rm -rf /run/systemd/generator.late/cassandra.service
UPDATE 5: This issue was happening on our Server installation of CentOS, so we tried a Minimal Install next. Surprisingly, the issue did not occur on the minimal installation. We are currently investigating what the differences might be.
UPDATE 6: We tried creating one more server, this time also choosing a Server installation of CentOS. Surprisingly, the issue did not occur on this server either, so the type of CentOS installation also had nothing to do with our issue.
With this, we have confirmed that it was our Cassandra installation that was at fault, although we are not yet sure what we did so wrong that even uninstalling and reinstalling could not resolve the issue on the original server.
Perhaps our uninstall steps above were not thorough enough?
UPDATE 7: Turns out that the reason the new servers didn't have the issue is because the original server was using a customized CentOS ISO instead of a vanilla one. One of our team members is looking into what makes the custom ISO different and I will be updating this issue when they get back to us.
UPDATE 8: As it turns out, the issue is also present in the supposedly vanilla CentOS ISO that we used, and since the customized ISO is based on this, all servers currently have the issue.
However, in order for the issue to occur, the server needs to be rebooted with the reboot command. Each reboot alternates whether the issue occurs (reboot 1: no issue; reboot 2: issue occurs; reboot 3: no issue, and so on).
One of our team members is currently investigating if we are using a faulty CentOS ISO. We are also considering the possibility that our ISO is good, but the problem might be on our virtual machine environment.
UPDATE 9: The uncustomized CentOS ISO, CentOS-8-x86_64-1905-dvd1.iso, was downloaded from centos.org. We have verified its checksum and confirmed that the ISO is exactly as it came from the official CentOS website.
With this, we have isolated that the issue is on our virtual machine environment.
We are using VMware ESXi to create the virtual machine that hosts Cassandra.
Our virtual machine details are as follows:
OS Details:
Compatibility: ESXi 6.7 virtual machine
Guest OS family: Linux
Guest OS version: CentOS 8 (64-bit)
Storage Details:
Type: Standard (choices were `Standard` and `Persistent Memory`)
Datastore Details:
Capacity: 886.75 GB
Free: 294.09 GB
Type: VMFS6
Thin provisioning: Supported
Access: Single
Virtual Machine Settings:
CPU: 1
(choices: 1-32)
Memory: 2048 MB
Hard disk 1: 16 GB
Maximum Size: 294.09 GB
Location: [datastore1] virtual_machine_name
Disk Provisioning: Thin Provisioned
(choices: Thin provisioned; Thick provisioned, lazily zeroed; Thick provisioned, eagerly zeroed)
Shares:
Type: Normal
(choices: Low, Normal, High, Custom)
Value: 1000
Limit - IOPs: Unlimited
Controller location: SCSI controller 0
(choices: IDE controller 0; IDE controller 1; SCSI controller 0; SATA controller 0)
Virtual Device Node unit: SCSI (0:0)
(choices: SCSI (0:0) to (0:64))
Disk mode: Dependent
(choices: Dependent; Independent - persistent; Independent - Non-persistent)
Sharing: None
(Disk sharing is only possible with eagerly zeroed, thick provisioned disks.)
SCSI Controller 0: VMware Paravirtual
(choices: LSI Logic SAS; LSI Logic Parallel; VMware Paravirtual)
SATA Controller 0: (no options)
USB controller 1: USB 2.0
(choices: USB 2.0; USB 3.0)
Network Adapter 1: our_domain
Connect: (checked)
CD/DVD Drive 1: Datastore ISO File (CentOS-8-x86_64-1905-dvd1.iso)
(choices: Host device; Datastore ISO File)
Connect: (checked)
Video Card: Default settings
(choices: Default settings; Specify custom settings)
Generated Summary:
Name: virtual_machine_name
Datastore: datastore1
Guest OS name: CentOS 8 (64-bit)
Compatibility: ESXi 6.7 virtual machine
vCPUs: 1
Memory: 2048 MB
Network adapters: 1
Network adapter 1 network: our_domain
Network adapter 1 type: VMXNET 3
IDE controller 0: IDE 0
IDE controller 1: IDE 1
SCSI controller 0: VMware Paravirtual
SATA controller 0: New SATA controller
Hard disk 1:
Capacity: 16GB
Datastore: [datastore1] virtual_machine_name/
Mode: Dependent
Provisioning: Thin provisioned
Controller: SCSI controller 0 : 0
CD/DVD drive 1:
Backing: [datastore1] _Data/ISO/CentOS-8-x86_64-1905-dvd1.iso
Connected: Yes
USB controller 1: USB 2.0
Many thanks to everyone who took the time to read this long issue!
Answer 1:
It could be a permission issue. Check the result of the following command:
cqlsh> list all permissions on config.config_by_uuid;
This blog post from DataStax has some details about authentication and authorization in Cassandra.
Source: https://stackoverflow.com/questions/58798244/cassandra-delete-works-on-local-but-not-on-deployed