问题
It's probably the tenth time I'm implementing something like this, and I've never been 100% happy about solutions I came up with.
The reason using mysql table instead of a "proper" messaging system is attractive is primarily because most application already use some relational database for other stuff (which tends to be mysql for most of the stuff I've been doing), while very few applications use a messaging system. Also - relational databases have very strong ACID properties, while messaging systems often don't.
The first idea is to use:
create table jobs( id auto_increment not null primary key, message text not null, process_id varbinary(255) null default null, key jobs_key(process_id) );
And then enqueue looks like this:
insert into jobs(message) values('blah blah');
And dequeue looks like this:
begin; select * from jobs where process_id is null order by id asc limit 1; update jobs set process_id = ? where id = ?; -- whatever i just got commit; -- return (id, message) to application, cleanup after done
Table and enqueue look nice, but dequeue kinda bothers me. How likely is it to rollback? Or to get blocked? What keys I should use to make it O(1)-ish?
Or is there any better solution that what I'm doing?
回答1:
I've built a few message queuing systems and I'm not certain what type of message you're referring to, but in the case of the dequeuing (is that a word?) I've done the same thing you've done. Your method looks simple, clean and solid. Not that my work is the best, but it's proven very effective for large-monitoring for many sites. (error logging, mass email marketing campaigns, social networking notices)
My vote: no worries!
回答2:
Your dequeue could be more concise. Rather than relying on the transaction rollback, you could do it in one atomic statement without an explicit transaction:
UPDATE jobs SET process_id = ? WHERE process_id IS NULL ORDER BY ID ASC LIMIT 1;
Then you can pull jobs with (brackets [] mean optional, depending on your particulars):
SELECT * FROM jobs WHERE process_id = ? [ORDER BY ID LIMIT 1];
回答3:
Brian Aker talked about a queue engine a while ago. There's been talk about a SELECT table FROM DELETE
syntax, too.
If you're not worried about throughput, you can always use SELECT GET_LOCK() as a mutex. For example:
SELECT GET_LOCK('READQUEUE');
SELECT * FROM jobs;
DELETE FROM JOBS WHERE ID = ?;
SELECT RELEASE_LOCK('READQUEUE');
And if you want to get really fancy, wrap it in a stored procedure.
回答4:
Here is a solution I used, working without the process_id of the current thread, or locking the table.
SELECT * from jobs ORDER BY ID ASC LIMIT 0,1;
Get the result in a $row array, and execute:
DELETE from jobs WHERE ID=$row['ID'];
Then get the affected rows(mysql_affected_rows). If there are affected rows, process the job in the $row array. If there are 0 affected rows, it means some other process is already processing the selected job. Repeat the above steps until there are no rows.
I've tested this with a 'jobs' table having 100k rows, and spawning 20 concurrent processes that do the above. No race conditions happened. You can modify the above queries to update a row with a processing flag, and delete the row after you actually processed it:
while(time()-$startTime<$timeout)
{
SELECT * from jobs WHERE processing is NULL ORDER BY ID ASC LIMIT 0,1;
if (count($row)==0) break;
UPDATE jobs set processing=1 WHERE ID=$row['ID'];
if (mysql_affected_rows==0) continue;
//process your job here
DELETE from jobs WHERE ID=$row['ID'];
}
Needless to say, you should use a proper message queue (ActiveMQ, RabbitMQ, etc) for this kind of work. We had to resort to this solution though, as our host regularly breaks things when updating software, so the less stuff to break the better.
回答5:
I would suggest using Quartz.NET
It has providers for SQL Server, Oracle, MySql, SQLite and Firebird.
回答6:
This thread has design information that should be mappable.
To quote:
Here's what I've used successfully in the past:
MsgQueue table schema
MsgId identity -- NOT NULL
MsgTypeCode varchar(20) -- NOT NULL
SourceCode varchar(20) -- process inserting the message -- NULLable
State char(1) -- 'N'ew if queued, 'A'(ctive) if processing, 'C'ompleted, default 'N' -- NOT NULL
CreateTime datetime -- default GETDATE() -- NOT NULL
Msg varchar(255) -- NULLable
Your message types are what you'd expect - messages that conform to a contract between the process(es) inserting and the process(es) reading, structured with XML or your other choice of representation (JSON would be handy in some cases, for instance).
Then 0-to-n processes can be inserting, and 0-to-n processes can be reading and processing the messages, Each reading process typically handles a single message type. Multiple instances of a process type can be running for load-balancing.
The reader pulls one message and changes the state to "A"ctive while it works on it. When it's done it changes the state to "C"omplete. It can delete the message or not depending on whether you want to keep the audit trail. Messages of State = 'N' are pulled in MsgType/Timestamp order, so there's an index on MsgType + State + CreateTime.
Variations:
State for "E"rror.
Column for Reader process code.
Timestamps for state transitions.
This has provided a nice, scalable, visible, simple mechanism for doing a number of things like you are describing. If you have a basic understanding of databases, it's pretty foolproof and extensible. There's never been an issue with locks roll-backs etc. because of the atomic state transition transactions.
回答7:
You can have an intermediate table to maintain the offset for the queue.
create table scan(
scan_id int primary key,
offset_id int
);
You might have multiple scans going on as well, hence one offset per scan. Initialise the offset_id = 0 at the start of the scan.
begin;
select * from jobs where order by id where id > (select offset_id from scan where scan_id = 0) asc limit 1;
update scan set offset_id = ? where scan_id = ?; -- whatever i just got
commit;
All you need to do is just to maintain the last offset. This would also save you significant space (process_id per record). Hope this sounds logical.
来源:https://stackoverflow.com/questions/423111/whats-the-best-way-of-implementing-a-messaging-queue-table-in-mysql