Psycopg2 db connection hangs on lost network connection

痞子三分冷 提交于 2021-02-04 17:10:02

问题


Problem description

I'm using psycopg2 to connect to my PostgreSQL database on a remote host. I open a connection and wait for requests, then for each request I run queries on the connection and return data.

But when the network connection is lost after the connection is already open the next db query hangs and I have to kill the program manually.

Details:

  • it hangs for at least 2 hours (I couldn't wait longer)
  • the "network down" situation is actually VPN going down (db host is accesible only by VPN)
  • I can't use asynchronous connection because I need transactions
  • python 2.6
  • psycopg 2.2.1
  • debian linux 6 and 7, 64bit
  • postgresql 8.4 and 9.1

What I want/need

I need some reliable way to detect a failed connection before running a query, so my program won't hang, or a way to make cursor.execute(..) raise an exception on failed connection.

Example:

import psycopg2
import time

conn = psycopg2.connect("host='dbs' dbname='foo' user='joe' password='x'")
time.sleep(10) # I manually turn VPN off during this sleep..
cu = conn.cursor()
cu.execute('SELECT 1') # <- hangs here
print cu.fetchone()
cu.commit()

What have I tried (and what didn't work):

  • setting TCP timeout "globally" - before psycopg2 import, I added:

    import socket
    socket.setdefaulttimeout(10)
    
  • setting TCP timeout on psycopg.connection's socket:

    ..
    conn = psycopg2.connect(...
    s = socket.fromfd(conn.fileno(), socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(5)
    ..
    
  • enabling keepalive for psycopg.connection's socket:

    ...
    conn = psycopg2.connect(...
    s = socket.fromfd(conn.fileno(), socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(5)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 1)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 3)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)
    ...
    

回答1:


After a long and brutal struggle, I think I fixed this issue by simply doing the strategy others are talking about, but using the psycopg2 connect function itself:


from psycopg2 import connect


conn = connect(
        database=database,
        user=username,
        password=password,
        host=hostname,
        port=port,
        connect_timeout=3,
        # https://www.postgresql.org/docs/9.3/libpq-connect.html
        keepalives=1,
        keepalives_idle=5,
        keepalives_interval=2,
        keepalives_count=2)

I was seeing psycopg2 hang consistently on long-running queries, but now the issue seems to be fully resolved.

Note this may be new functionality, since this question is old.




回答2:


Took a look at the socket timeout and after reading this and this, these settings worked for me

s = socket.fromfd(connection.fileno(),
                  socket.AF_INET, socket.SOCK_STREAM)
# Enable sending of keep-alive messages
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
# Time the connection needs to remain idle before start sending
# keepalive probes
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, int(ceil(time)))
# Time between individual keepalive probes
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 1)
# The maximum number of keepalive probes should send before dropping
# the connection
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)



回答3:


OP's and Gabriel Salla's solutions which configure KEEPALIVE are not complete. This solution only works when the connection is idle (no data sent before network became down) and the network became down.

If some data have already sent over the network that is already down but not yet detected so by the KEEPALIVE feature there will be a hang. This happens because the RTO mechanism is used instead of KEEPALIVE when some data is send.

To set timeout for an RTO you must set TCP_USER_TIMEOUT timeout (in milliseconds) for socket.

The complete solution is (both KEEPALIVE and RTO timeouts configured to 10 seconds):

s = socket.fromfd(conn.fileno(), socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 6)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 2)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 2)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, 10000)



回答4:


In order to make sure a connection is still valid, read the property connection.isolation_level. This will raise an OperationalError with pgcode == "57P01" in case the connection is dead.

try: connection.isolation_level except OperationalError as oe: conn = psycopg2.connect(dsn)



来源:https://stackoverflow.com/questions/26741175/psycopg2-db-connection-hangs-on-lost-network-connection

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!