Archive for April, 2010

Poll, Push or Pull – Which is best….

Friday, April 23rd, 2010

What do we mean by “Poll”, “Push” and “Pull” in terms of data communication?

Communications systems either “Push” or “Pull” data, but in some cases, when you need to know whether anything is waiting, a “Poll” is performed. Each of these techniques has advantages and disadvantages, which are discussed here.

Polling

Polling is asking whether data is available or can be sent; an example is the POP3 protocol used by mail readers. It is very simple and has the advantage that the server being polled need not know anything about the state of the polling client. The polling client must make periodic requests to the server to determine whether data is ready or can be sent.

The disadvantage of “Polling” is that the polling client will not know exactly when it can send or receive data. To reduce latency the poll interval may need to be quite short, which increases the server overhead, especially if it serves a number of clients.

If the polling interval is set to n, and the data transfer time m, then the average delivery/fetch latency is n/2 + m. This can be a limit in many systems.
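The trade-off between latency and server load can be sketched in a few lines of Python. This is only an illustration: the function names and the example figures (intervals of 60, 10 and 1 seconds, a 2 second transfer and 500 clients) are assumptions, not measurements from a real system.

```python
def average_poll_latency(poll_interval, transfer_time):
    """Average delivery/fetch latency, n/2 + m.

    Data becomes ready at a random point within a poll interval, so on
    average it waits half an interval (n/2) for the next poll and then
    takes the transfer time (m) to move.
    """
    return poll_interval / 2 + transfer_time


def poll_requests_per_second(clients, poll_interval):
    """Requests per second the server must answer, even with no data waiting."""
    return clients / poll_interval


if __name__ == "__main__":
    # Hypothetical figures: 500 clients, 2 second transfer time.
    for interval in (60, 10, 1):
        latency = average_poll_latency(interval, 2)
        load = poll_requests_per_second(500, interval)
        print(interval, latency, load)
```

Shortening the interval lowers the average latency but raises the request rate in direct proportion, which is the overhead described above.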

Computer hardware has historically suffered from similar issues. Some hardware did not use interrupts to indicate data reception, and in the IBM PC the old interrupt controller offered so few interrupt lines that devices had to share an interrupt. This in turn increased interrupt latency, as the PC had to poll all the hardware devices sharing the interrupt to determine its source. Hence the evolution of the APIC.

Polling is a solution to be used only where servers need not know the availability of clients, low latency is unimportant and the host being polled is able to handle the volume of polling requests.

Examples of services using “Poll” are NTP and POP3.

Pushing

Pushing of data is highly efficient; it is where a host pushes data to the receiving host as soon as the data is ready. Many protocols use such a scheme, for example:-

  • CUPS (printed files are pushed to the print server)
  • FTP (files are pushed to the remote server; oddly, it uses “Pull” as well)
  • LPR (printed files are pushed to the print server)
  • SFTP (secure file transfer over SSH)
  • SNMP (Simple Network Management Protocol; traps are pushed to the management station)
  • SMTP (Internet email delivery, very old, very reliable)
  • Hardware Interrupts

“Pushing” is best used where data is ready for delivery and the client can accept data at any time.
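As a minimal in-process sketch of the idea, the sender below delivers each item to its receivers the moment it is ready, so no receiver ever has to ask. The `Publisher` class and the job names are hypothetical and stand in for a real protocol such as those listed above.

```python
class Publisher:
    """Pushes each item to every registered receiver as soon as it is ready."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        # The receiver registers once; it must be able to accept data at any time.
        self._subscribers.append(callback)

    def publish(self, item):
        # Delivery happens immediately, with no polling by the receiver.
        for callback in self._subscribers:
            callback(item)


received = []
source = Publisher()
source.subscribe(received.append)
source.publish("job-1")
source.publish("job-2")
print(received)
```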

Double Buffering

In some cases, such as writing large amounts of data to a “block” based piece of hardware, double buffering can be used to vastly increase throughput.

Consider a network adapter which has an output buffer and interrupts when it has completed a transmit. The OS writes a packet of data to the buffer, then waits for the card to send the data. When the hardware has successfully sent the data it interrupts the OS to signal that it is ready to receive more. However, the time taken for the OS to service the interrupt and refill the transmit buffer may delay the next network transmit, adding an unwanted gap between packets.

The solution is to utilise two transmit buffers in the hardware device, buffer a and buffer b. Following successful transmission of buffer a, the network adapter starts to transmit the data in buffer b (if ready) and also interrupts the computer to instigate a data copy to buffer a. This ensures that the delay between the interrupt and the OS copying data to buffer a does not add a gap between packets. The OS software requires little change to cater for this type of system, but the throughput gains are massive. The overall latency of the system is not reduced, but the throughput is increased.
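The effect can be illustrated with a small timing model. This is a sketch under stated assumptions rather than a model of any particular adapter: the function name, the time units and the example figures (4 packets, 10 units to transmit, 2 units for the OS to service the interrupt and fill a buffer) are all hypothetical.

```python
def total_transmit_time(packets, tx_time, service_time, buffers):
    """Time to send all packets through an adapter with the given number of buffers.

    service_time is the time the OS needs to service the interrupt and
    copy one packet into a free transmit buffer.
    """
    fill_done = []  # time at which packet i is fully copied into a buffer
    tx_end = []     # time at which packet i has finished transmitting
    for i in range(packets):
        # The OS fills buffers one at a time, after finishing the previous fill.
        prev_fill = fill_done[i - 1] if i >= 1 else 0
        # The buffer for packet i was last used by packet i - buffers.
        buffer_free = tx_end[i - buffers] if i >= buffers else 0
        done = max(prev_fill, buffer_free) + service_time
        # The wire is busy until the previous packet has been sent.
        prev_tx = tx_end[i - 1] if i >= 1 else 0
        fill_done.append(done)
        tx_end.append(max(done, prev_tx) + tx_time)
    return tx_end[-1] if tx_end else 0


single = total_transmit_time(4, 10, 2, buffers=1)
double = total_transmit_time(4, 10, 2, buffers=2)
print(single, double)
```

With one buffer every packet pays the service time before it can be sent. With two buffers only the first packet does, because later copies overlap with the transmission already in progress.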

Pulling

Pulling of data is done when a client requires data, and is normally served by fast services. Most client user interfaces use “Pull” type services to achieve the fast response expected by a user. Examples of such services are:-

  • FTP
  • HTTP (driving the internet)

HTTP is the best example of this: users pull content “On Demand”, and this model has driven the last 20 years of Internet development.

Conclusion

Most data communications systems work best when “Pulling” or “Pushing” data. The use of a “Poll” type system should be avoided unless there is a clear business case.

When designing systems it’s often simpler to implement a scheme which works “Sufficiently Well”, but a scheme that is designed inappropriately requires more resources and power. It is often possible to implement systems that utilise very few resources through careful interface design, and for low power embedded devices this is especially important.

High Performance Hosting – Unfinished

Friday, April 9th, 2010

Introduction

Over the years I’ve been asked to produce a number of high reliability, high throughput web hosting solutions. Although there are a number of off the shelf solutions, they are expensive, and true high availability can be realised using standard Linux builds.

In the following article I’ll outline aids to help you build low cost HA solutions using Linux.

Load Balancing

For high throughput applications it is often not possible to host on a single server. In addition, the availability of a single host cannot be guaranteed, so a multi hosted solution is often better.

Developing web applications that can run on multiple servers poses a number of problems, mainly around state maintenance and session management, but these will be covered elsewhere.

Commercial Load Balancers

Commercial hardware load balancers offer a range of features, but you must always consider the “Single Point of Failure” problem. Even if the load balancer has dual power supplies etc, it can fail or require replacement. It is always better to buy two complete units that can work in parallel, offering higher availability. A single unit can then be upgraded without fear of loss of service.

By using an external pair of round robin DNS entries it is possible to spread the load across balancers. In the event of a balancer failing you can move the failed IP address to the remaining balancer.
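The bookkeeping behind that failover can be sketched as follows. This only models which balancer answers for which address; actually moving an IP address between machines is the job of the HA software. The balancer names and the addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical.

```python
def move_failed_addresses(assignments, failed):
    """Return a new mapping with the failed balancer's addresses moved to survivors."""
    survivors = [name for name in assignments if name != failed]
    if not survivors:
        raise RuntimeError("no balancer left to take over")
    result = {name: list(assignments[name]) for name in survivors}
    # Spread the orphaned addresses across whichever balancers remain.
    for i, address in enumerate(assignments.get(failed, [])):
        result[survivors[i % len(survivors)]].append(address)
    return result


# Two round robin DNS entries, one address per balancer.
balancers = {"lb1": ["192.0.2.10"], "lb2": ["192.0.2.11"]}
after = move_failed_addresses(balancers, "lb1")
print(after)
```

Because both addresses stay in DNS, clients need no change; the surviving balancer simply answers for both.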

Commercial load balancers are expensive!

Linux Load Balancers

Using two Linux servers and a high availability heartbeat configuration provides a far cheaper solution, and has been used by Bigmite Hosting Solutions. With two Linux servers in an HA configuration running suitable load balancing software, a high throughput can be achieved.

A small server can saturate a 1Gb/s network link, leaving the back end application servers to do the work. The choice of load balancing software depends on your requirements, such as:-

  1. Session Management
  2. Keep Alives
  3. Monitoring
  4. Latency

If no session management is required (such as for a static site) then the kernel based IPVS (Linux Virtual Server) can be used. It is included in the standard Linux kernel, and is simple to configure and very reliable.

If sessions need to be maintained to a client then layer x based load balancing is required. Packages such as HAProxy and BalanceNG (which is the next generation of the balance software) offer these features.
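One simple way to keep a client on the same back end is to hash the client address, sketched below. This is an assumption chosen for illustration, not a description of how HAProxy or BalanceNG work internally; the function name, server names and client address are hypothetical.

```python
import hashlib


def pick_backend(client_ip, backends):
    """Choose a back end from the client address, so repeat requests land together."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(backends)
    return backends[index]


servers = ["app1", "app2", "app3"]
first = pick_backend("198.51.100.7", servers)
second = pick_backend("198.51.100.7", servers)
print(first == second)
```

The weakness of this approach is that changing the number of back ends remaps most clients, which is one reason cookie based session tracking is often preferred.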

Heartbeat

Heartbeat (www.linuxha.org

Requests for comments

It is NOT finished. It’s just ready for comments. I’ve been too busy to complete it, so please comment and I’ll add your comments.