Linux - Can't open more than 1023 sockets


I'm developing code that simulates network equipment. I need to run several thousand simulated "agents", and each needs to connect to a service. The problem is that after opening 1023 connections, the connects start to time out, and the whole thing comes crashing down.

The main code is in Go, but I've written a trivial Python script that reproduces the problem.

The one unusual thing is that we need to set the local address on the socket when we create it. This is because the equipment the agents are connecting to expects the apparent IP to match what it should be. To achieve this, I have configured 10,000 virtual interfaces (eth0:1 through eth0:10000). These are assigned unique IP addresses in a private network.

The Python script (it only runs 2000 connects):

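```python
import socket

i = 0
for b in range(10, 30):
    for d in range(1, 100):
        i += 1
        # Source address for this agent: 1.10.1.1 ... 1.29.1.99
        ip = "1.%d.1.%d" % (b, d)
        print("conn %i   %s" % (i, ip))
        # Connect to 1.6.1.1:5060 with a 10 s timeout, bound locally to (ip, 5060)
        s = socket.create_connection(("1.6.1.1", 5060), 10, (ip, 5060))
```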

If I remove the last argument to socket.create_connection (the source address), I can get all 2000 connections.

The only thing that differs when using a local address is that a bind must be made before the connection can be set up. Here is what the output looks like with the program running under strace:

```
conn 1023   1.20.1.33
bind(3, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
bind(3, {sa_family=AF_INET, sin_port=htons(5060), sin_addr=inet_addr("1.20.1.33")}, 16) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(5060), sin_addr=inet_addr("1.6.1.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
```

If I run it without the local address, the AF_INET bind goes away, and it works.
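For reference, here is a minimal sketch of the bind-then-connect sequence that socket.create_connection() performs when it is given a source address, written out by hand. The helper name connect_from is hypothetical, and the sketch assumes IPv4 address literals (create_connection() additionally resolves host names with getaddrinfo()).

```python
import socket


def connect_from(src_ip, dst, src_port=0, timeout=10.0):
    """Open a TCP connection to dst (host, port) from a chosen local address.

    Hypothetical helper: mirrors socket.create_connection(dst, timeout,
    (src_ip, src_port)) for an IPv4 literal, minus the name lookup.
    src_port=0 lets the kernel pick an ephemeral local port.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # With a timeout set, Python runs the socket non-blocking internally,
        # which is why strace shows connect() returning EINPROGRESS.
        s.settimeout(timeout)
        s.bind((src_ip, src_port))  # the AF_INET bind() in the strace output
        s.connect(dst)              # the connect() that follows it
    except OSError:
        s.close()
        raise
    return s
```

For example, connect_from("1.20.1.33", ("1.6.1.1", 5060), 5060) corresponds to connection 1023 in the strace excerpt.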

So, it seems there must be some kind of limit on the number of binds that can be made. I've waded through all sorts of links about TCP tuning on Linux: I've tried messing with tcp_tw_reuse/recycle, I've reduced fin_timeout, and I've done other things I can't remember.
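Each dotted sysctl name corresponds to a file under /proc/sys (dots become path separators), so settings like these can be checked or changed from a script without shelling out to sysctl. A minimal sketch; the helper names are hypothetical, and the optional root argument exists only so the functions can be exercised against a scratch directory.

```python
import os

PROC_SYS = "/proc/sys"


def sysctl_path(name, root=PROC_SYS):
    """Map a dotted sysctl name (e.g. 'net.ipv4.tcp_fin_timeout') to its file."""
    return os.path.join(root, *name.split("."))


def read_sysctl(name, root=PROC_SYS):
    """Return the current value of a sysctl as a stripped string."""
    with open(sysctl_path(name, root)) as f:
        return f.read().strip()


def write_sysctl(name, value, root=PROC_SYS):
    """Set a sysctl; writing under the real /proc/sys requires root."""
    with open(sysctl_path(name, root), "w") as f:
        f.write("%s\n" % value)
```

For example, read_sysctl("net.ipv4.tcp_tw_reuse") and read_sysctl("net.ipv4.tcp_fin_timeout") return the current values of the two settings mentioned above. Changes made this way last only until the next reboot.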

This is running on Ubuntu Linux 11.04 (kernel 2.6.38, 64-bit). It's a virtual machine on a VMware ESX cluster.

Just before posting this, I tried running a second instance of the Python script, with the additional one starting at 1.30.1.1. The first script plowed through to 1023 connections, but the second one couldn't get even its first connection done, indicating that the problem is related to the large number of virtual interfaces. Is some internal data structure limited? Is there a max memory setting somewhere?

Can anyone think of a limit in Linux that could cause this?

Update:

This morning I decided to try an experiment. I modified the Python script to use the "main" interface IP as the source IP, with ephemeral ports in the range 10000+. The script now looks like this:

```python
import socket

for i in range(1, 2000):
    print("conn %i" % i)
    # Same source IP every time; only the local port (i + 10000) changes
    s = socket.create_connection(("1.6.1.1", 5060), 10, ("1.1.1.30", i + 10000))
```

This script works fine, which adds to my belief that the problem is related to the large number of aliased IP addresses.

What a doh moment. I was watching the server using netstat, and since I didn't see a large number of connects, I didn't think there was a problem. Then I wised up and checked /var/log/kernel, in which I found this:

```
Mar  8 11:03:52 testserver01 kernel: ipv4: Neighbour table overflow.
```
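One way to confirm that the neighbour table is the bottleneck is to compare the number of IPv4 neighbour entries with the gc_thresh3 hard limit. On kernels of this era the default gc_thresh3 is 1024, which is consistent with failures starting around connection 1023. A minimal sketch, assuming the standard Linux /proc locations; /proc/net/arp is used only as an approximation of the table (it omits some entry types), and the function names are hypothetical.

```python
import os

NEIGH_DEFAULT = "/proc/sys/net/ipv4/neigh/default"
ARP_TABLE = "/proc/net/arp"


def read_gc_thresholds(neigh_dir=NEIGH_DEFAULT):
    """Return (gc_thresh1, gc_thresh2, gc_thresh3) for the IPv4 neighbour table."""
    values = []
    for name in ("gc_thresh1", "gc_thresh2", "gc_thresh3"):
        with open(os.path.join(neigh_dir, name)) as f:
            values.append(int(f.read().split()[0]))
    return tuple(values)


def count_neighbour_entries(arp_path=ARP_TABLE):
    """Approximate entry count: every non-blank line after the header row."""
    with open(arp_path) as f:
        rows = [line for line in f.read().splitlines()[1:] if line.strip()]
    return len(rows)


def neighbour_headroom(neigh_dir=NEIGH_DEFAULT, arp_path=ARP_TABLE):
    """Entries that still fit before reaching the hard limit (gc_thresh3)."""
    _, _, hard_limit = read_gc_thresholds(neigh_dir)
    return hard_limit - count_neighbour_entries(arp_path)
```

Polling neighbour_headroom() while the test script runs should show it approaching zero if this limit is the cause.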

The overflow message led me to this posting, which explains how to increase the limit: http://www.serveradminblog.com/2011/02/neighbour-table-overflow-sysctl-conf-tunning/ Bumping the thresh3 value (net.ipv4.neigh.default.gc_thresh3) solved the problem.
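A minimal sketch of raising the thresholds from Python by writing the files under /proc/sys/net/ipv4/neigh/default (root required). The function name and the example values are assumptions for illustration, not taken from the linked post. Writes to /proc do not survive a reboot; the persistent equivalent is a line such as net.ipv4.neigh.default.gc_thresh3 = 16384 in /etc/sysctl.conf.

```python
import os

NEIGH_DEFAULT = "/proc/sys/net/ipv4/neigh/default"


def set_gc_thresholds(thresh1, thresh2, thresh3, neigh_dir=NEIGH_DEFAULT):
    """Write new IPv4 neighbour-table GC thresholds (hypothetical helper).

    gc_thresh1: below this many entries the garbage collector does not run.
    gc_thresh2: soft maximum; GC runs if the table stays above it.
    gc_thresh3: hard maximum number of entries in the table.
    """
    # Sanity check so the soft and hard limits keep their intended order.
    if not (0 < thresh1 <= thresh2 <= thresh3):
        raise ValueError("expected 0 < gc_thresh1 <= gc_thresh2 <= gc_thresh3")
    for name, value in (("gc_thresh1", thresh1),
                        ("gc_thresh2", thresh2),
                        ("gc_thresh3", thresh3)):
        with open(os.path.join(neigh_dir, name), "w") as f:
            f.write("%d\n" % value)
```

For example, set_gc_thresholds(4096, 8192, 16384) would leave room above the roughly 10,000 aliased addresses, assuming the table grows with the number of source addresses in use; these values are illustrative only.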
