问题
Similar Questions
Similar questions have been asked before, but always have specific qualities to the data which allow a more targeted "split it up and just sort by this part" approach, which does not work when you don't know the structure of the data in the column - or even the column, frankly. In other words, not a generic, "Natural" sort order - something roughly equivalent to SELECT * FROM [parts] ORDER BY [part_category] DESC, [part_number] NATURAL DESC
My Situation
I have a DataView in C# that has a Sort
parameter for specifying the ORDER BY
that would be used by ADO, and a requirement to sort by a column using a 'natural' sort algorithm. I could in theory do just about anything from creating a different column to sort by (based on the column I'd like to have 'sorted naturally') to not sorting in SQL, but rather sorting the result set in code afterwards. I'm looking for the best balance of flexibility, efficiency, preparation effort, and maintainability. I would benefit somewhat from being able to sort such data after retrieval (in C#) or completely within a stored procedure.
In my mind, and according to customer statements so far, 'Natural' sort order will mean treating upper and lower case letters equivalently, and considering the magnitude of a number, rather than the ASCII value of its digits (that is x90
comes before x100
). Jeff Atwood had a pretty decent discussion of this, but it didn't address SQL sorting. That said, these are my thoughts:
- Incorporating the magnitude awareness while also retaining the ability to sort alpha characters ASCII-betically may also come in handy
- Non-alphanumeric characters would probably have to be sorted ASCII-betically regardless
- Decimal point awareness might be more effort than it's worth, since most of the time periods and commas in alphanumeric fields are treated as merely punctuation/separators, and only denote fractional portions when they're representing a float field
My Question
What is a reasonably flexible, reasonably generic, reasonably efficient, approach to implementing a natural sort algorithm for SQL? Weighing the pros and cons, which is the best approach? Is there another option?
- Is there a native SQL way to
ORDER BY [field] NATURAL DESC
or something? - PURE SQL function to create a 'sort equivalent' - Could be used to create some sort of second, possibly indexed, 'sort value' column, or called from a stored procedure, or specified in an 'ORDER BY' clause - but how to write it efficiently? (loops? is there a set based solution at all??)
- CLR SQL Function - usability benefits of pure SQL function, but using procedural language, like C# (algorithm should be no problem, but can it be made to go faster than a pure SQL sort [set based??] implementation?) Also, could be referenced and utilized in C# if efficient enough.
- Avoid SQL Server - since parsing an arbitrary number of numbers amid all sorts of other characters is really best suited for looping or recursion, and T-SQL is not well suited for looping or recursion (though TECHNICALLY supported, All I see is 'DON'T USE LOOPS!!!' and 'CTE's are even worse!!!')
- Some sort of comparator in SQL(??) - doesn't seem like SQL lends itself to that sort of sorting and I don't see a way to specify a comparator to use - so I guess this won't work...
I have values at least as varied as the following:
100s455t
200s400
d399487
S0000005.2
d400400
d99222
cg9876
D550-9-1
CL2009-3-27
f2g099
f2g100
f2g1000
f2g999
cg 8837
99s1000f
These should be sorted as follows:
99s1000f
100s455t
200s400
cg9876
cg 8837
CL2009-3-27
D550-9-1
d99222
d399487
d400400
f2g099
f2g100
f2g999
f2g1000
S0000005.2
回答1:
Create a sort column. That way you can keep all the usual mechanisms in place that you use today to sort. You can index that column for example.
Split the string into parts. You need to pad number parts with zeroes to the maximum possible number length.
For example CL2009-3
would become CL|000002009|-|000000003
.
This way the usual case-insensitive SQL Server collation sort behavior will create the right order.
Doing a natural sort dynamically prevents indexing, requires the entire data set to move into the app for each query and is resource intensive.
Instead, simply update the sort column whenever you update the base column.
回答2:
OK. Here is something that is almost what you are looking for. The only piece it can't deal with is when there are some characters then a space and then numbers (cg 8837 and cg9876). Would be good if in the future you could post the ddl and sample data so we can work with it.
with Something (SomeValue) as(
select '100s455t' union all
select '200s400' union all
select 'd399487' union all
select 'S0000005.2' union all
select 'd400400' union all
select 'd99222' union all
select 'cg9876' union all
select 'D550-9-1' union all
select 'CL2009-3-27' union all
select 'f2g099' union all
select 'f2g100' union all
select 'f2g1000' union all
select 'f2g999' union all
select 'cg 8837' union all
select '99s1000f'
)
select *
from Something
order by
cast(
case when patindex('%[A_Za-z]%', SomeValue) = 1 then '99999999999'
when patindex('%[A_Za-z]%', SomeValue) = 0 then SomeValue
else substring(SomeValue, 1, patindex('%[A_Za-z]%', SomeValue) - 1)
end as bigint),
SomeValue
回答3:
I would recommend to "stay away from the SQL Server". While technically you can implement everything using t-sql or clr function, SQL server remains a single non scalable unit of the infrastructure. Using its CPU resources to do a heavy computing is going to inevitably impact performance of the system in general. And in the end, SQL server will be performing sort using almost exactly the same algorithm that you will use to sort your array on the application side, i.e. looking at each item in array and comparing it to the others until it finds an appropriate position.
Of course I am assuming that, if you try to implement this type of sort on the SQL server side, you will be copying the data into a temporary table before performing the sort, to avoid data locks etc.
来源:https://stackoverflow.com/questions/28372359/natural-sort-for-sql-server