
Archive for the ‘System Management’ Category

WAN traffic is not a simple matter of Optimisation.

11 Apr

Recently we have been working with a client to try and resolve some performance issues at one of their satellite offices.
The office is on the end of a 2MB line with around 40 staff using the standard sort of corporate applications (email, internet, MS Office, SharePoint, SAP, Intranet, MSSQL, Internal web apps, etc).

As you might have guessed, the staff have been suffering from performance issues and our client has been exploring solutions to improve the WAN performance. When we became involved they had just trialled the popular RiverBed WAFS solution… but it made the situation worse!

Why did it make the situation worse?

The problem with WAN traffic is often not the sheer amount of data going across the WAN; more often it is the response times and “feel” of using the link within specific user applications. In our client’s case, the RiverBed did a good job of compressing the traffic traversing the WAN, reporting a 50%+ “increase” in bandwidth and in effect creating a 3MB virtual pipe where the actual link was only 2MB.

But the users complained MORE bitterly than before… why?

The reason is that user performance is about more than raw data compression or TCP/IP optimisation. It is about how long it takes to open or close a file, how fast an application refreshes, how long it takes to load a SharePoint page. None of these were noticeably improved by the RiverBed as configured; in fact, it made them worse.

It made things worse because it added latency as the RiverBed devices at either end processed the data traffic. It also provided a faster conduit for “chatty” protocols and for “greedy” applications. And, based on the experiences of the staff, it didn’t prioritise the traffic that mattered to them.
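To see why more bandwidth can still feel slower, a rough back-of-envelope model helps. The sketch below is an illustration only, not something measured on the client’s link: it assumes the line runs at 2 megabits per second, and the payload size, number of round trips and latency figures are all hypothetical.

```python
def operation_time_s(payload_bytes, round_trips, rtt_s, bandwidth_bps):
    """Rough time for one user action: time on the wire plus time waiting on round trips."""
    transfer = payload_bytes * 8 / bandwidth_bps
    waiting = round_trips * rtt_s
    return transfer + waiting

# Hypothetical "chatty" file open: 200 kB of data spread over 50 request/response exchanges.
before = operation_time_s(200_000, 50, rtt_s=0.040, bandwidth_bps=2_000_000)

# Assumed effect of the appliances: 50% more effective bandwidth,
# but 20 ms of extra processing delay on every round trip.
after = operation_time_s(200_000, 50, rtt_s=0.060, bandwidth_bps=3_000_000)

print(f"before: {before:.2f}s, after: {after:.2f}s")
```

With these made-up numbers the same file open goes from about 2.8 seconds to about 3.5 seconds, because the time spent waiting on round trips outweighs the time saved on the wire.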

How to improve the situation?

What we would always suggest to a client is that they start by benchmarking the performance users perceive in both subjective and objective terms BEFORE any changes are made. You’ll want to benchmark across a variety of days and times: first thing, last thing and, say, at lunchtime; on a Monday, a Friday, the last day of the month, and so on.

You will want to discover what traffic matters to your users and to your business (they are often different), and have your benchmarks ready to provide metrics on the level of performance of the network.

Then… and only then… do you want to make some changes. Once you have made one change, you will be well served to re-benchmark the performance of the WAN to see how that change has affected the situation. Sometimes (as in our client’s case), a supposed performance improvement actually causes a decrease in performance, often because solution vendors and their resellers do not bother to benchmark with you and so have no idea what their product will do in your environment.
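The objective half of that benchmarking can be as simple as timing the same user action several times before and after each change. The following is a minimal sketch under our own assumptions: `benchmark` and `percent_change` are hypothetical helper names, and `operation` stands in for whatever you choose to measure (opening a file on a share, loading a SharePoint page, and so on).

```python
import statistics
import time

def benchmark(operation, runs=5):
    """Time a callable several times and summarise the samples in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }

def percent_change(before, after):
    """Change in median time as a percentage; a positive value means slower."""
    return (after["median"] - before["median"]) / before["median"] * 100

# Stand-in workload so the sketch runs anywhere; replace with a real user action.
result = benchmark(lambda: sum(range(100_000)), runs=3)
print(result)
```

Run it at the days and times that matter, keep the results, and compare the medians after each individual change rather than after a batch of them.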

So what to change?

Here are some tips that we have used on many organisations’ networks and that, depending on the environment, have worked well.

1. Try traffic shaping before traffic optimisation and compression.
Your WAN is host to a wide variety of applications, from recreational web surfing through to email, database traffic and sales systems. Some applications are “greedy” and will use all the WAN bandwidth they can, and this is a really common cause of performance issues: a web download or upload can quickly kill the SAP traffic, leaving your users tearing out their hair! Traffic shaping can limit how much bandwidth each application is allowed to use, preventing one traffic type from affecting another. You can also prioritise one traffic type over another so that, for example, your internal Intranet can be given priority over external websites, audio streaming, etc.

2. Limit, don’t block.
The temptation is to block bad traffic (like audio and video streaming, IM traffic, Skype, etc.). But this can often be the cause of bigger problems. If you simply restrict the traffic severely (say, down to 25 kb/sec or lower), the users and applications will not detect that the traffic is being blocked, just that it does not work well. This is more likely to stop them using it than blocking the traffic outright, where the application or user finds a way around the block and suddenly the traffic is unlimited again.

3. Protect the important before restricting the bad.
Think about what matters to your users and your business and protect that traffic first. Simply doing this is often enough: forcing less important and bad traffic/applications to fight for bandwidth below your important traffic will often give you all the improvements you need.

4. Think about what users experience.
Our most common suggestion is to severely limit the bandwidth that email is allowed to use, which often comes up against resistance as email is often “mission critical”. The confusion here is about whether email is important in terms of its transport across your WAN. Whether an email takes 30 seconds or 90 seconds to reach a recipient is not noticeable to users, as they are normally only told that it has arrived after the transport has occurred. Also, most email clients only poll for email on a cycle of 5 minutes or more. So limiting email traffic (so it takes longer) normally has no effect on user perception but can have a huge impact on the performance of the applications that need live updates (web apps, SharePoint, etc.).
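The kind of limiting described in the tips above is commonly built on a token bucket. The sketch below shows the idea only; it is not how any particular shaping product is implemented, and in practice this runs on a router or shaping appliance rather than in a script. The class name, the 1500-byte packet size and the reading of “25 kb/sec” as 25 kilobits per second are our assumptions.

```python
class TokenBucket:
    """Allow traffic up to a sustained rate, with a small burst allowance."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last = 0.0            # time of the previous check, in seconds

    def allow(self, packet_bytes, now):
        """Return True if the packet fits within the limit at time `now`."""
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to the time passed, never beyond the burst size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False

# Limit, don't block: streaming is capped at roughly 25 kbit/s (3125 bytes/s).
streaming_cap = TokenBucket(rate_bytes_per_s=3125, burst_bytes=3125)

# Hypothetical stream offering a 1500-byte packet every 100 ms for two seconds.
passed = sum(streaming_cap.allow(1500, now=i * 0.1) for i in range(20))
print(f"{passed} of 20 packets passed")
```

The stream still “works”, just badly, which is the point of limiting rather than blocking. A separate bucket per traffic class (email, web, streaming) gives each its own ceiling.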

Above are four tips that we hope will help you with your thinking about the problems you will encounter working over a WAN. Again, the most important suggestion we have is to benchmark before, during and after implementing changes and solutions. You need subjective and objective reporting on performance to know if the time, energy and money you are investing is actually helping improve your situation. If your vendor(s) are not including some benchmarking, warning bells should be sounding in your head.

If you’d like some assistance in tuning your WAN, please don’t hesitate to contact us!

 

The Issue with Issues.

16 Feb

Recently enVirtua has been providing issue resolution management services on a large IT project.

The problem with large projects is that there are always issues that arise that push back deadlines. As deadlines get pushed back, revenue can be affected as your project fails to deliver on time. This is bad news for any business (or project), so ensuring you have planned for problems and have processes in place to fault find, design solutions and implement the solutions is vital! Key also is having processes in place to document all the issues so that you can identify and understand the root cause of the problems and ensure that these sorts of issues don’t happen again in the future.

Managing all this is difficult and needs quite a bit of effort, which often is not planned for or resourced fully.

On our current project, the client appreciated that they needed to bring in some extra resource to deal with the growing pile of issues and also to help put processes in place to lessen the resource required to deliver their product on time. There is always a balance between “fighting fires” and fixing the underlying causes of the issues.

One of the big issues every business faces is that the “fire fighting” normally takes precedence over the underlying problems. That is not the way it should be: fixing one underlying problem might decrease your “fire fighting” by 25%.

An example is configuration management, a common problem in many organisations. I have lost count of the number of corporate networks I have worked on where the documentation showing how things work is poor or non-existent. It will almost always be out of date (if it exists). This does not cause a problem… until there is a problem. Without an accurate map of the system, it is very VERY difficult for people to try and fix complicated root cause issues.

Creating a documentation process is often a good starting point for any effort to solve issues.

In a previous organisation, we started small with a simple wiki used only by two engineers. Later some of the management started using the wiki, and then all the technical people followed. That initial effort to document via the wiki resulted in a combined effort that delivered a detailed map of the company’s infrastructure.

In another organisation, documentation of the infrastructure was not the issue; rather, it was documenting what issues were occurring. This we resolved by implementing a simple helpdesk application that allowed the IT people to keep track of the problems they were solving day to day. Everything from pulling paper out of printers to finding and fixing bugs in software started being recorded. The upshot of this was that the IT guys were able to identify trends and focus their “spare time” on trying to improve areas that were causing underlying issues. It also gave management visibility of the amount of work being done by the IT team and of which departments’ productivity was being affected the most by problems.
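Spotting trends in that kind of record needs very little tooling. As a minimal sketch, assuming the helpdesk application can export each logged issue as a department and a category (the records below are made up for illustration), a simple count already shows where the recurring problems are:

```python
from collections import Counter

# Hypothetical helpdesk export: one (department, category) pair per logged issue.
tickets = [
    ("Sales", "printer"),
    ("Sales", "printer"),
    ("Finance", "software bug"),
    ("Sales", "network"),
    ("Finance", "printer"),
]

# Which kind of problem keeps coming back?
by_category = Counter(category for _, category in tickets)

# Which department is losing the most time to problems?
by_department = Counter(department for department, _ in tickets)

print(by_category.most_common(1))
print(by_department.most_common(1))
```

Here the most common category is a candidate for root cause work, and the most affected department is the one management will want to hear about first.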

So having ways and means to document structure and issues is key to supporting any technical solution. The final piece of the puzzle is ensuring that you have enough resource (people) and the right resources to solve the fire fighting and underlying issues you will encounter. This is partly an HR problem and partly a process problem. You need useful people, but you also need to ensure that these people have the right tools and the right organisational structure and support to do their job well and in the best way for your business.

Many years ago, I worked on a large support team within a multi-national company. The IT support team at one stage was not balanced properly. By this I mean we had too many people in the team digging deeply into the underlying causes of problems rather than applying quick fixes (band aid solutions) to keep staff working. This happens all the time if management does not carefully select IT people. You need a mix of personalities and competencies: some people who will sit and read logfiles all day to solve one issue, and some people who will reboot a server in a hurry because they know it will get the system working ASAP, though it will not solve the underlying issue.

The other key element in all this is effective management of issues. You need to know when to dig deeper into a problem and when to solve the symptom and move on. In enVirtua’s current engagement, this is primarily what we are getting paid to do. It is vital to the project that issues be resolved quickly so the project milestones are met, but equally root causes need to be identified and solutions to these issues found and applied proactively and retrospectively. This current engagement requires that enVirtua does some quick fault finding on issues that come in via two separate helpdesk systems and does triage, determining how best to resolve each issue. Issues also need to be clustered, and root cause analysis is part of what we are providing, along with finding the resources best suited to making improvements and maintaining a gentle management role over our client’s suppliers and staff, despite our lack of real authority over those people.

It has been a challenging project, which has sharpened our senses when it comes to support structures and how they work well. Of course it has meant looking at enVirtua’s internal support structures and identifying some holes in what we do ourselves. Over the next few months we are making changes to improve our customer care and issue resolution processes as a result of working for this client.

To summarise, the “Issue with Issues” is that too often issue management is not something that is carefully planned and managed. It is left till last or not planned for at all, as people (wrongly) don’t expect many/any issues to occur. If we had one thing to say to people it would be that issues always occur, always! You need to plan on problems happening and not neglect to have processes in place for when things go wrong. It will happen one day, and the amount of preparation you have put in place will determine how much your company’s profits are affected.