Question
I have developed many .NET / SQL Server applications but I'm suffering from SQL query timeouts that I can't get to the bottom of. I have lots of experience in this area of finding the offending queries and re-indexing / re-writing them. My web app is hosted on AWS using RDS for SQL Server and EC2 for the Web App. We have 100-200 unique users per day and the database is around 15GB with a couple of tables > 1GB.
I see exceptions throughout the day with the message:
'Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.'
The queries that suffer from timeouts are as random as the time the timeouts occur. It doesn't seem to coincide with anything obvious (backups run overnight etc).
I have tried taking each query from the C# app and running it directly in SQL (with the same SET options, such as ARITHABORT) and they all run just fine. Some are slower queries by nature, but the slowest one runs in about 2 seconds and has ~400k logical reads. However, I also see queries time out that normally run in 15ms and have < 10 logical reads.
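One way to double-check that SSMS and the web app really are running with the same SET options is to compare them per session. This is a minimal sketch, assuming the login has VIEW SERVER STATE permission; it only reads the standard sys.dm_exec_sessions DMV, and the program_name values depend on what each client reports in its connection string.

```sql
-- Compare session-level SET options across user sessions.
-- SSMS defaults to ARITHABORT ON; SqlClient connections default to OFF,
-- which can lead to different cached plans for the same query text.
SELECT s.session_id,
       s.program_name,
       s.arithabort,
       s.ansi_nulls,
       s.quoted_identifier,
       s.ansi_warnings,
       s.ansi_padding,
       s.concat_null_yields_null
FROM sys.dm_exec_sessions AS s
WHERE s.is_user_process = 1
ORDER BY s.program_name, s.session_id;
```

If the rows for the web app and for SSMS show different values, timings taken in SSMS may not reflect the plan the app is actually using.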
The most odd thing I've seen is I've taken a query from the web app and coded it up into a console app which has been running for 24 hours, calling the query once per second. It has not had a single exception / timeout even though I've seen the main system have timeouts for the same query during the time it's been running.
I have recently upgraded the RDS server to an M5 Large and all indexes are rebuilt overnight every day. I have run DBCC FREEPROCCACHE at some point to ensure there are no stale query plans causing the problem.
I suspect it's parameter sniffing; my last thought is hardware / network glitches, but that's really clutching at straws!
The stack trace I get looks like it's mid-query and not during the connection phase.
at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error)
at System.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync()
at System.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket()
at System.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer()
at System.Data.SqlClient.TdsParserStateObject.TryReadByteArray(Byte[] buff, Int32 offset, Int32 len, Int32& totalRead)
at System.Data.SqlClient.TdsParserStateObject.TryReadString(Int32 length, String& value)
at System.Data.SqlClient.TdsParser.TryReadSqlStringValue(SqlBuffer value, Byte type, Int32 length, Encoding encoding, Boolean isPlp, TdsParserStateObject stateObj)
at System.Data.SqlClient.TdsParser.TryReadSqlValue(SqlBuffer value, SqlMetaDataPriv md, Int32 length, TdsParserStateObject stateObj, SqlCommandColumnEncryptionSetting columnEncryptionOverride, String columnName)
at System.Data.SqlClient.SqlDataReader.TryReadColumnInternal(Int32 i, Boolean readHeaderOnly)
at System.Data.SqlClient.SqlDataReader.TryReadColumn(Int32 i, Boolean setTimeout, Boolean allowPartiallyReadColumn)
at System.Data.SqlClient.SqlDataReader.GetValueInternal(Int32 i)
at System.Data.SqlClient.SqlDataReader.GetValue(Int32 i)
Any help with some techniques to get to the bottom of this would be much appreciated as it's unsettling and I fear it's suddenly going to get a lot worse.
Thanks
EDIT 1
I have tried to create the same problem locally by running the test app (as above) once every 10ms and running a slow blocking transaction in SSMS at the same time.
Query From App
SELECT TOP 10 *
FROM MyTable
WHERE LastModifiedBy = 'Stu'
Query in SSMS
BEGIN TRAN
UPDATE TOP (10000) MyTable SET LastModifiedBy = 'Me' where LastModifiedBy = 'Me'
WAITFOR DELAY '00:00:35'
COMMIT
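While the transaction above is holding its locks, a second SSMS window can be used to confirm that the app query really is blocked rather than slow for some other reason. This is a sketch, assuming VIEW SERVER STATE permission; it uses only the standard sys.dm_exec_requests and sys.dm_exec_sql_text DMVs.

```sql
-- List requests that are currently blocked by another session.
SELECT r.session_id,
       r.blocking_session_id,
       r.status,
       r.wait_type,
       r.wait_time          AS wait_time_ms,
       r.total_elapsed_time AS elapsed_ms,
       t.text               AS sql_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;
```

In this repro the app query would be expected to appear with a lock wait type (typically one starting with LCK_M_) and a blocking_session_id equal to the SSMS session running the UPDATE. Running the same check in production when a timeout occurs could help rule blocking in or out.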
When this errors I see what I'd usually expect to see in SQL Profiler, where the app query takes exactly 30000ms and I get an exception in the app. However, the useful output from this is that the stack trace is different from the one I see in production (above).
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData()
at System.Data.SqlClient.SqlDataReader.get_MetaData()
at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)
at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method)
at System.Data.SqlClient.SqlCommand.ExecuteDbDataReader(CommandBehavior behavior)
at System.Data.Common.DbCommand.System.Data.IDbCommand.ExecuteReader(CommandBehavior behavior)
I'm reading this stack trace as the query never started to execute since it's still trying to read meta-data for the query. However, this contrasts with the stack trace from production that (to my eyes) appears to be in the middle of reading data from columns but has a timeout mid execution.
I've also been reading about issues with .NET 4.6.2, which is the version we're using. I'll upgrade everything to 4.7.2 this evening to rule that out. (See the related question "Connection to remote SQL server breaks when upgrading web server to .net framework 4.6.1".)
Answer 1:
After a week of stressful investigation it's fixed!! It's been running now for over 2 hours without a single timeout :-)
Turned out to be some kind of bug or mismatch with .NET v4.6.2.
My Configuration was:
- SQL Server 2017 Web Edition on AWS RDS
- .NET v4.6.2
- Dapper v1.50.5
My Changes are:
- Install .NET 4.7.2 on Web Server
- Upgrade Web App and all DLL projects in Visual Studio to use .NET 4.7.2 (ensuring the web.config was updated to <httpRuntime targetFramework="4.7.2" />)
- Upgrade Dapper via NuGet to the latest v1.60.0 (I don't think Dapper was at fault, I just upgraded it while doing everything else as it's database related)
These questions helped point me in this direction:
- SqlDataReader.GetValue Hangs
- ADO.Net SQLCommand.ExecuteReader() slows down or hangs
- SqlDataReader hangs on GetValue() method and SNIReadSyncOverAsync
THANK YOU INTERNET - HOW ON EARTH DID I CODE BEFORE YOU CAME ALONG
Source: https://stackoverflow.com/questions/55396666/need-help-diagnosing-sql-server-strange-query-timeouts-from-c-sharp