Sunday, April 22, 2012

First try at solving the C10K problem.

I am implementing an epoll() solution as recommended by this page: http://www.kegel.com/c10k.html

The documents and examples are a bit lacking, so I hope that by the end of this series I will have a good example of a general-purpose threaded program that scales to tens of thousands of client connections.  I also want to build a general-purpose network testing tool that can be pointed at various services to exercise their network connections.

I am running my server and 15 client instances:

my-laptop:~$ netstat -an  | grep 50000 | wc -l
30628

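Because the clients and the server run on the same laptop, netstat lists both ends of every connection, so about 30,000 lines means about 15,000 connections to the server (15 clients at 1,000 connections each).  All while running at 80% idle.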

Here is my code:

https://github.com/BuckRogers1965/DataObj/blob/master/c10KNetwork/c10Knet.c

You have to compile this code with:
gcc c10Knet.c -lpthread -o c10Knet
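
For orientation, here is a minimal sketch of the general epoll pattern.  It is not the code in c10Knet.c: the function names watch_fd and serve_once, the batch size of 64 and the 4096-byte buffer are all made up for illustration, and a real server would do something with the data instead of discarding it.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Register fd with the epoll instance for "readable" events.
 * Returns 0 on success, -1 on error. (Hypothetical helper.) */
int watch_fd(int epfd, int fd)
{
    struct epoll_event ev = {0};
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Wait up to timeout_ms for events and handle one batch.
 * Readiness on listen_fd means a new client: accept and register it.
 * Readiness on any other descriptor means data or a hang-up: read it.
 * Returns the number of bytes read in this batch, or -1 on error. */
long serve_once(int epfd, int listen_fd, int timeout_ms)
{
    struct epoll_event events[64];
    char buf[4096];
    long total = 0;

    int n = epoll_wait(epfd, events, 64, timeout_ms);
    if (n < 0)
        return (errno == EINTR) ? 0 : -1;

    for (int i = 0; i < n; i++) {
        int fd = events[i].data.fd;
        if (fd == listen_fd) {
            int client = accept(listen_fd, NULL, NULL);
            if (client >= 0 && watch_fd(epfd, client) < 0)
                close(client);
        } else {
            ssize_t got = read(fd, buf, sizeof buf);
            if (got > 0) {
                total += got;   /* data is read and thrown away here */
            } else if (got == 0 || (errno != EAGAIN && errno != EINTR)) {
                /* Peer closed or hard error: stop watching, then close. */
                epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
                close(fd);
            }
        }
    }
    return total;
}
```

A caller would create the epoll instance with epoll_create1(0), register the listening socket with watch_fd(), and then call serve_once() in a loop.  The point of the pattern is that one thread only touches the descriptors that are actually ready, instead of scanning all of them.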

The client code to run the test is here:

http://www.unix.com/high-level-programming/17611-multi-threaded-server-pthreads-sleep.html
 

It is called test.c.  I modified it a bit for myself, and you can too.  My version opens 1,000 connections, writes to each connection three times with a one-second pause between each series of writes, closes all the connections, then reconnects, endlessly.  I run 10 to 20 of these processes, starting each one with an ampersand so it runs in the background.

The modified version is here:

https://github.com/BuckRogers1965/DataObj/blob/master/c10KNetwork/test.c
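
The connect, write, pause, close cycle described above can be sketched roughly like this.  This is not the code in test.c: the function names open_connections and run_cycle, the "ping" message and the loopback address are assumptions made for the example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open up to `count` TCP connections to 127.0.0.1:port and store the
 * descriptors in fds. Stops at the first failure.
 * Returns how many connections succeeded. (Hypothetical helper.) */
int open_connections(unsigned short port, int *fds, int count)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    int opened = 0;
    for (int i = 0; i < count; i++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            break;
        if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
            close(fd);
            break;
        }
        fds[opened++] = fd;
    }
    return opened;
}

/* One test cycle: write a short message on every connection `rounds`
 * times, sleeping pause_sec seconds between rounds, then close them all.
 * Returns the number of writes that went out in full. */
int run_cycle(int *fds, int count, int rounds, unsigned pause_sec)
{
    static const char msg[] = "ping\n";
    const ssize_t len = (ssize_t)(sizeof msg - 1);
    int ok = 0;

    for (int r = 0; r < rounds; r++) {
        for (int i = 0; i < count; i++)
            if (write(fds[i], msg, sizeof msg - 1) == len)
                ok++;
        if (r + 1 < rounds)
            sleep(pause_sec);
    }
    for (int i = 0; i < count; i++)
        close(fds[i]);
    return ok;
}
```

With these two helpers, the test described above would be an endless loop of open_connections(50000, fds, 1000) followed by run_cycle(fds, n, 3, 1), where n is the number of connections that actually opened.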

The next step is to make the test clients run as threads in the same program as the server, with each thread making and managing 1,000 connections and using the same epoll techniques to manage the connections and I/O.

--

The test code that I found had a lot of issues, and it took me hours to make it work.  There was a problem with the arguments that I was passing to accept().  There was a problem with the flags I was setting on the client connections, which left the data hanging.  During the first day, after a couple of minutes of running, the code was actually killing my X Window desktop: it killed some or all of my running programs and once even made me log in again.  It appears that not picking up the data from the clients was somehow causing something to blow out on the Linux laptop I am running and resetting X.
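
I did not write down exactly what was wrong, so the following is only a sketch of the two places where this kind of code commonly goes wrong, not a record of my actual fix.  The length argument to accept() is an in/out parameter that has to be initialized to the size of the address buffer before the call, and O_NONBLOCK should be added to the existing flags with fcntl() rather than overwriting them.  The helper names set_nonblocking and accept_nonblocking are made up for the example.

```c
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Switch a descriptor to non-blocking mode while keeping its other
 * status flags. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Accept one IPv4 client and make the new socket non-blocking.
 * The peer address is stored in *peer.
 * Returns the client descriptor, or -1 on error. */
int accept_nonblocking(int listen_fd, struct sockaddr_in *peer)
{
    /* accept() reads this value as the buffer size and then overwrites
     * it with the size of the address it actually stored. */
    socklen_t len = sizeof *peer;

    int fd = accept(listen_fd, (struct sockaddr *)peer, &len);
    if (fd < 0)
        return -1;
    if (set_nonblocking(fd) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

On a non-blocking socket, a read() with nothing waiting returns -1 with errno set to EAGAIN instead of hanging, which is what lets a single thread keep cycling through thousands of connections.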

--

I almost forgot, I ran into another issue when I got to 1024 connections in a single process.  Linux by default limits the number of open file descriptors per process to 1024, and every socket counts as one.  I had to follow the directions on the following page to fix the problem:

http://lj4newbies.blogspot.com/2007/04/too-many-open-files.html

I added two new lines to the file /etc/security/limits.conf:

* soft nofile 20000
* hard nofile 20000

Here is another page with even better info:

Maximum Number of open files and file descriptors in Linux

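I had to reboot for the system to see the new limits.  I am sure there is some way of applying the values without completely restarting; limits.conf is read by pam_limits at login, so logging out and back in may be enough.  You can check the current soft limit with ulimit -n.  Remember, all the threads in a single process share this limit, so set the value according to the sum of what all your threads will need.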

You can manage a lot of limits from inside your program with the setrlimit() function.  The important things to remember with setrlimit() are that unless you are root you can only lower the hard limit, and lowering it is irrevocable for the process, and that you cannot set the soft limit higher than the hard limit.
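
As a minimal sketch of that, here is a helper that raises the soft limit on open files as far as the hard limit allows.  The name raise_nofile_limit is made up for the example, and it only touches RLIMIT_NOFILE.

```c
#include <sys/resource.h>

/* Raise the soft RLIMIT_NOFILE to `wanted`, capped at the hard limit.
 * The hard limit is left alone, so this works without root.
 * Never lowers the soft limit.
 * Returns the resulting soft limit, or -1 on error. (Hypothetical helper.) */
long raise_nofile_limit(rlim_t wanted)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    /* The soft limit may not exceed the hard limit. */
    if (rl.rlim_max != RLIM_INFINITY && wanted > rl.rlim_max)
        wanted = rl.rlim_max;

    if (wanted > rl.rlim_cur) {
        rl.rlim_cur = wanted;
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
            return -1;
    }
    return (long)rl.rlim_cur;
}
```

Calling raise_nofile_limit(20000) early in main() would give the process up to 20,000 descriptors if the hard limit is at least that high, and otherwise as many as the hard limit permits.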

lsof is a good tool to use to examine what file handles a process has open.

Finding open files with lsof.

And before we are done, we are going to be pinning threads to processors under Linux to increase the speed of processing. Take charge of processor affinity.
