Sunday, April 22, 2012

Designing next generation gaming servers.

A list of high performance network libraries.
Information about the Object-oriented Graphics Rendering Engine
Using epoll() for Asynchronous Network Programming
The C10K problem

epoll() is used with the Native POSIX Thread Library (NPTL) on Linux to create a listener thread that accepts new connections, a pool of I/O worker threads that process read and write requests, and a pool of threads that handle the initial parsing of incoming requests. These requests are then serialized and sent as a single stream to the gaming server.
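As a rough illustration of that pipeline, here is a minimal sketch in Python. It uses the standard selectors module, which picks epoll on Linux, and it collapses the listener, I/O, and parsing stages into one event loop thread for brevity instead of separate thread pools. The newline-delimited request format and the names serve and start are my own assumptions for the example.

```python
import queue
import selectors
import socket
import threading


def serve(listener, out_queue, stop_event):
    """Accept clients and push every newline-terminated request onto one queue."""
    sel = selectors.DefaultSelector()  # picks epoll on Linux
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ)
    buffers = {}  # client socket -> bytes received but not yet parsed
    try:
        while not stop_event.is_set():
            for key, _events in sel.select(timeout=0.1):
                sock = key.fileobj
                if sock is listener:
                    try:
                        conn, _addr = listener.accept()
                    except BlockingIOError:
                        continue
                    conn.setblocking(False)
                    buffers[conn] = b""
                    sel.register(conn, selectors.EVENT_READ)
                    continue
                try:
                    chunk = sock.recv(4096)
                except BlockingIOError:
                    continue  # spurious wakeup, try again later
                except ConnectionError:
                    chunk = b""
                if not chunk:  # peer closed the connection
                    sel.unregister(sock)
                    sock.close()
                    del buffers[sock]
                    continue
                # Initial parsing: split the byte stream into complete lines.
                parts = (buffers[sock] + chunk).split(b"\n")
                buffers[sock] = parts.pop()  # keep any incomplete tail
                for line in parts:
                    # The single queue stands in for the serialized stream
                    # that is sent on to the gaming server.
                    out_queue.put(line.decode("utf-8", "replace"))
    finally:
        for sock in list(buffers):
            sel.unregister(sock)
            sock.close()
        sel.unregister(listener)
        sel.close()


def start(listener, out_queue):
    """Run the event loop in a background thread; returns (thread, stop_event)."""
    stop_event = threading.Event()
    thread = threading.Thread(
        target=serve, args=(listener, out_queue, stop_event), daemon=True
    )
    thread.start()
    return thread, stop_event
```

In a real server the parsing would probably move to its own worker pool, with the event loop only handing off raw buffers.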

Effective use of threads will allow the gaming server to scale across the CPUs in a box. To scale up to millions of clients you will need many front end server boxes to accept client connections and to send and receive their data.

The back end game servers will only be able to support so many users in a particular part of the gaming world at one time. If you are running a quest you could locate the quest on a set of servers and only allow one group at a time onto any particular quest server.
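One way to picture the "one group per quest server" rule is a small allocator. This is only a sketch: the class name QuestServerPool, the server names, and the group ids are all made up for illustration.

```python
class QuestServerPool:
    """Hands out quest servers so that each one hosts at most one group."""

    def __init__(self, servers):
        self._free = list(servers)  # servers with no group on them
        self._busy = {}             # group id -> server it occupies

    def acquire(self, group_id):
        """Return the group's server, or None if every server is taken."""
        if group_id in self._busy:
            return self._busy[group_id]
        if not self._free:
            return None  # caller can queue the group and retry later
        server = self._free.pop(0)
        self._busy[group_id] = server
        return server

    def release(self, group_id):
        """Free the server once the group finishes or abandons the quest."""
        server = self._busy.pop(group_id, None)
        if server is not None:
            self._free.append(server)
```

A group that gets None back would wait in a queue until some other group releases a server.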

~$ apropos epoll
epoll (4) - I/O event notification facility
epoll (7) - I/O event notification facility
epoll_create (2) - open an epoll file descriptor
epoll_create1 (2) - open an epoll file descriptor
epoll_ctl (2) - control interface for an epoll descriptor
epoll_pwait (2) - wait for an I/O event on an epoll file descriptor
epoll_wait (2) - wait for an I/O event on an epoll file descriptor

I would split the game serving and the requests for discrete files like meshes and textures onto different servers so that your gaming back end doesn't get bogged down serving up large graphical elements. You might want to put these files on a service such as Akamai if it is very important to rapidly send the data to many people at once.

On the client side you want the graphics, networking, and file I/O to be in different threads as well, so that blocking on one doesn't affect the others. The gaming files can come on a distribution disk, but this should just be preloading a cache of files that the server can update on an as-needed basis later. The client should cache any mesh or texture files it downloads from the server. It should be possible to download a small executable that then downloads and caches all the files it needs locally.

The client should also be very error tolerant of missing meshes and textures, dropping back to defaults until it can find or generate needed meshes and textures.
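A minimal sketch of that fallback behaviour might look like the following. The AssetStore class, the default file names, and the pending set are assumptions of mine, not part of any particular engine.

```python
# Hypothetical placeholder assets that ship with the client.
DEFAULT_ASSETS = {
    "mesh": "default_cube.mesh",
    "texture": "default_checker.png",
}


class AssetStore:
    """Returns a cached asset, or a default while the real one is missing."""

    def __init__(self):
        self._cache = {}      # (kind, name) -> asset data
        self.pending = set()  # assets still to be downloaded or generated

    def get(self, kind, name):
        key = (kind, name)
        if key in self._cache:
            return self._cache[key]
        # Remember what is missing so a background thread can fetch it,
        # and keep rendering with the default in the meantime.
        self.pending.add(key)
        return DEFAULT_ASSETS[kind]

    def store(self, kind, name, data):
        """Called once the download or generation has finished."""
        self._cache[(kind, name)] = data
        self.pending.discard((kind, name))
```

The render thread only ever calls get(), so a slow download never blocks a frame.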

Procedural textures and meshes can reduce network load by a huge amount. TheProdukkt is a company that is demoing amazing 3D environments in kilobytes, not hundreds of megabytes. The cost is the processing that needs to be done on the client side to generate the textures and meshes for display. However, if these finished elements are then cached on the filesystem on the client side, that processing is a one-time cost. You may want to give the user a choice as to how large a cache to use locally and then only keep the most common elements, generating other things as needed.
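Here is a rough sketch of a size-limited cache that keeps the most frequently used elements and regenerates the rest on demand. It is an in-memory stand-in for the on-disk cache, and the "texture" is just seeded random bytes, so generate_texture and ProceduralCache are illustrative names and not a real procedural pipeline.

```python
import random


def generate_texture(seed, size=8):
    """Stand-in generator: the same seed always yields the same bytes."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(size * size))


class ProceduralCache:
    """Keeps the most requested items within a user-chosen byte budget."""

    def __init__(self, max_bytes, generator=generate_texture):
        self.max_bytes = max_bytes
        self._generator = generator
        self._items = {}    # seed -> generated bytes
        self._hits = {}     # seed -> how often it was requested
        self.generated = 0  # how many times the generator had to run

    def get(self, seed):
        self._hits[seed] = self._hits.get(seed, 0) + 1
        if seed in self._items:
            return self._items[seed]
        data = self._generator(seed)
        self.generated += 1
        self._items[seed] = data
        self._evict()
        return data

    def _evict(self):
        # Drop the least requested items until the cache fits the budget.
        while self._items and self._size() > self.max_bytes:
            coldest = min(self._items, key=lambda s: self._hits[s])
            del self._items[coldest]

    def _size(self):
        return sum(len(data) for data in self._items.values())
```

Hit counts survive eviction, so an element that becomes popular later can win its place back.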


With a big game that has a million users you really have a massive supercomputer, and much of the data processing could be done on client machines at low priority.

Your infrastructure is only as strong as your weakest link.

The first point of failure is the Domain Name Service (DNS) lookup. You can't hardcode the IP addresses of your servers into your game, so the client has to do a lookup when the game first starts in order to find your servers.

If you have millions of people waking up and loading up your game on the east coast, then you cannot afford for them to take 5 or 10 minutes to all look up your DNS entry. You need the DNS servers to be able to survive a sustained DoS attack from some disgruntled user with a bot army.

So you need a distributed, high performance, high availability DNS service that scales with your customer base. The DNS service also has to give each user the IP address of the server that is geographically closest to them.
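To make the "closest server" idea concrete, here is a small sketch that picks the nearest region by great-circle distance. It assumes the client's latitude and longitude are already known (a real DNS service would estimate them from the resolver's IP address). The three regions, their coordinates, and the 192.0.2.x documentation addresses are all hypothetical.

```python
import math

# Hypothetical regions: name -> ((latitude, longitude), server IP).
REGIONS = {
    "east": ((40.71, -74.01), "192.0.2.10"),      # New York
    "central": ((38.63, -90.20), "192.0.2.20"),   # St Louis
    "west": ((37.77, -122.42), "192.0.2.30"),     # San Francisco
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_region(client_location):
    """Return (region name, server IP) for the geographically closest region."""
    name = min(REGIONS, key=lambda r: haversine_km(client_location, REGIONS[r][0]))
    return name, REGIONS[name][1]
```

Geographic distance is only a proxy for network distance, so a production setup would likely also weigh measured latency and server load.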

You need servers in each geographic region in order to keep the round trip ping times to your servers at acceptable levels. The speed of light is the limiting factor here. If someone is 2000 km away from your server, then the one-way delay is no better than 10 ms, and the round trip no better than 20 ms, even if you had a direct fibre connection with no routing.
  • The speed of light in vacuum is 300 x 10^6 m/s.
  • The speed of light in fibre is roughly 66% of the speed of light in vacuum.
  • The speed of light in fibre is 300 x 10^6 m/s * 0.66 = 198 x 10^6 m/s, or roughly 200 x 10^6 m/s.
  • The one-way delay to Boston is 2000 km / (200 x 10^6 m/s) = 10 ms.
  • The current ping time over 2000 km on today's Internet is about 25 ms.
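The arithmetic in the list above is easy to reproduce for other distances. This sketch uses the same figures (300 x 10^6 m/s and a factor of 0.66); the function names are mine.

```python
C_VACUUM_M_PER_S = 300e6  # speed of light in vacuum, as rounded above
FIBRE_FACTOR = 0.66       # light in fibre travels at roughly 66% of that


def one_way_delay_ms(distance_km, factor=FIBRE_FACTOR):
    """Best-case one-way delay in milliseconds over a direct fibre link."""
    speed_m_per_s = C_VACUUM_M_PER_S * factor
    return distance_km * 1000 / speed_m_per_s * 1000


def round_trip_ms(distance_km, factor=FIBRE_FACTOR):
    """Best-case ping time: there and back again."""
    return 2 * one_way_delay_ms(distance_km, factor)


if __name__ == "__main__":
    for km in (500, 2000, 4000):
        print(f"{km} km: one way {one_way_delay_ms(km):.1f} ms, "
              f"round trip {round_trip_ms(km):.1f} ms")
```

Without the rounding to 200 x 10^6 m/s, 2000 km comes out at about 10.1 ms one way, so the 25 ms seen in practice is not far above the physical floor of about 20 ms.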

Now you have your servers set up on the East coast, the West coast, and in St Louis. You need to make sure that each server has at least 2 high speed Internet connections, and either connection alone should be able to handle the network load at maximum volume. You need this so that your servers do not have a single point of failure. Preferably you want two different companies providing service.

These incoming routers are a choke point and any Denial of Service attacks will have to be mitigated here. Allowing your game world to survive a DoS attack is an art. Here is a list of resources.

Behind each router providing network service you want to have a load balancer. Load balancers are like very smart routers that know how your internal network is set up and can adjust if you add or remove servers from inside your network.

Behind the load balancers you need a pool of front end servers to handle the client connections. Ideally you power these on and off on an as needed basis in order to service your client connections.

These front end servers are just smart enough to accept new connections and at first route to a pool of authentication servers to make sure the connection is who it claims to be, then route to a game server once the client is authorized. If you are using encryption on the network connections, then these front end boxes can have hardware decryption devices to allow them to handle an order of magnitude more connections. Here is one example of that.
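The routing decision those front end boxes make can be sketched as a tiny state machine: unknown connections go to an authentication server, authorized ones go to a game server. The class name, the round-robin choice of servers, and the verify callback (a stand-in for the real call to an authentication server) are assumptions for illustration only.

```python
import itertools


class FrontEnd:
    """Routes a connection to auth first, then pins it to a game server."""

    def __init__(self, auth_servers, game_servers, verify):
        self._auth = itertools.cycle(auth_servers)  # round-robin auth pool
        self._game = itertools.cycle(game_servers)  # round-robin game pool
        self._verify = verify  # verify(auth_server, credentials) -> bool
        self._routes = {}      # connection id -> assigned game server

    def handle(self, conn_id, message):
        """Return (status, server) describing where this message goes."""
        if conn_id in self._routes:
            return "game", self._routes[conn_id]
        # Not yet authorized: treat the message as credentials.
        auth_server = next(self._auth)
        if self._verify(auth_server, message):
            self._routes[conn_id] = next(self._game)
            return "authorized", self._routes[conn_id]
        return "rejected", auth_server

    def disconnect(self, conn_id):
        """Forget the route when the client goes away."""
        self._routes.pop(conn_id, None)
```

A usage example would pass something like verify=lambda server, creds: creds == "secret" while testing, and a real network call in production.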

There are many ways to organize the game servers. I am going to save that for another article.
