Tegola Blog

Report on developing rural broadband

by Peter Buneman
14 Oct 2013

A report has just been released, written and subscribed to by people involved in community broadband, on what is needed to help the substantial fraction of rural Scotland not included in government plans for fast broadband.

Section 36

18 Apr 2013

The European Commission has cleared the disbursement of public funds to telecommunications carriers (read BT) in order to construct network infrastructure in remote areas where it would otherwise not be economically feasible. The EU has placed a condition that bears directly on the ability of others to make use of this subsidised build.

Pretty pictures with Povray

by William Waites
24 Mar 2013

This post is about how to make pictures like this. It’s a little technical and assumes some knowledge about working with UNIX command line tools and working with geographical data. If you’re not interested in those sorts of things, then feel free to just enjoy the pretty pictures!

The first thing is the observation that radio waves behave more or less like light, at least to a first approximation. So if we have a decent 3D model of the terrain and we place a light where a radio is, whatever lights up should be pretty much the same as what is visible from that spot, and it therefore should be possible to build a radio link to there.

This is not quite true, as the technique neglects two important effects. It doesn’t account for Fresnel zones, where objects that are slightly out of the direct line of sight can still cause interference. It also doesn’t account for the curvature of the earth, which can matter for longer links.
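To get a feel for how much these two neglected effects matter, here is a minimal Python sketch using the standard formulas for the first Fresnel zone radius and the earth bulge. The link length (14 km, roughly Portree to Arnish) and the 5.8 GHz frequency are assumptions for illustration, as is the customary 4/3 effective earth radius factor; none of them come from the rendering itself.

```python
import math

C = 299_792_458.0             # speed of light, m/s
EARTH_RADIUS_M = 6_371_000.0  # mean earth radius, m


def fresnel_radius(d1_m, d2_m, freq_hz):
    """Radius of the first Fresnel zone at a point d1 from one end
    of the link and d2 from the other."""
    wavelength = C / freq_hz
    return math.sqrt(wavelength * d1_m * d2_m / (d1_m + d2_m))


def earth_bulge(d1_m, d2_m, k=4.0 / 3.0):
    """Height by which the curved earth rises into the path at that
    point, using an effective earth radius of k times the real one."""
    return d1_m * d2_m / (2.0 * k * EARTH_RADIUS_M)


# Assumed values, for illustration only.
link_m = 14_000.0
freq_hz = 5.8e9
half = link_m / 2.0

print(fresnel_radius(half, half, freq_hz))  # clearance wanted at mid-path, m
print(earth_bulge(half, half))              # extra height lost to curvature, m
```

At mid-path on a link like this the Fresnel radius comes out at roughly 13 m and the bulge at roughly 3 m, so terrain that only just clears the straight line in the rendering may still be a problem in practice.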

So how can we do this? High-quality computer generated 3D graphics are often done using a technique called ray tracing. This works by building up a model of a scene, the shapes of objects and what kinds of surfaces they have, then placing light sources at various points, and a camera. Then the path from the camera back to each of the light sources is calculated, reflecting off of the objects, in order to find out what colour each pixel in the resulting image should be. It’s a pretty compute-intensive process, but can give astonishingly realistic results.

A good piece of software for doing this is Povray. It’s a bit weird so far as its licensing goes, but the source code is available and it does the job nicely. If you’re planning on doing this, and want to do the rendering on a computer with multiple CPU cores, it is important to get the newest beta version to take advantage of that sort of modern hardware.

The other important bit is the data. To build the scene we need elevation data for the place we’re interested in. We also need a map of some sort to drape over it to have a better idea of where we are and what we’re looking at. This data at sufficient resolution is hard to come by. We are lucky that as academics we have access to the Ordnance Survey 1:10,000 scale Digital Terrain Model as well as the MasterMap which is pretty high quality and pleasing to look at.

This data comes in a form that is not immediately useful with Povray so the first step is getting the data into the GRASS GIS software which will let us cut it down and transform it. There is also, incidentally, an explanation of how to calculate viewsheds using GRASS on that page. It is likely to be more accurate but not nearly as pretty.

GRASS knows about Povray’s native format for elevation data, or what it calls height maps. First, though, set up the region so that it matches the elevation data boundaries and resolution by running

% g.region rast=profile_dtm

It’s also just fine to set the extent of the region smaller, which will speed up rendering by quite a lot since Povray will have to read the whole thing into memory to do its work, so it helps to cut out any extra unnecessary data at the beginning.

Now, export the data so that Povray can use it:

% r.out.pov map=profile_dtm tga=elevation.tga hftype=1 bias=10

The last two parameters take a little explaining. The hftype parameter normalises the heights so that by the time they get to Povray they are in the 0..1 range, with the highest point in the data being 1. The bias parameter is needed because there are some negative elevations in the data (the sea is mostly in the -2m to -4m range for some reason), so it shifts everything up by 10m first.

Making sure to keep the region the same, also export the base map that we’re going to drape over the elevation data. It’s fine to use a PNG file for this:

% r.out.png map=raster_250k output=basemap.png

This is using the 1:250,000 version of MasterMap because anything more granular is terribly cluttered at the effective zoom level that we’re interested in.

Ok, that’s enough. We have our data, now let’s get Povray to put it on the screen. We’re going to try to keep using the Ordnance Survey coordinates in Povray to keep things as simple as possible. Fortunately the OSGB36 datum on which grid squares are based makes for pretty close to square pixels at least within the UK – otherwise we may have had to transform the data a bit more first. So to keep the coordinates straight, let’s get some information about the region from GRASS:

% g.region -g
n=880005
s=764995
w=109995
e=205005
nsres=10
ewres=10
rows=11501
cols=9501
cells=109271001

In our case, we only need the (n,s,e,w) values. So let’s start our Povray script with a little bit of boilerplate and then some constants:

#version 3.7;
#include "colors.inc"

#local n=880005;
#local s=764995;
#local w=109995;
#local e=205005;

#local hbias=10;
#local maxheight=1068.0+hbias;
#local hscale=2000;

The hscale value is arbitrary and serves to bring up the mountains and make the relief more obvious. To find out what the maximum height is, use this command:

% r.info -r map=profile_dtm
min=-5
max=1068.8

(In retrospect, it would have been OK to use a height bias of 5.)
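Copying the region bounds by hand is error-prone, so one option is to generate the constants from the key=value output shown earlier. The following Python sketch does that; the function names are made up for this example, and the region text is simply pasted in rather than read from GRASS.

```python
# Region description pasted from the GRASS output shown above.
REGION_OUTPUT = """\
n=880005
s=764995
w=109995
e=205005
nsres=10
ewres=10
rows=11501
cols=9501
cells=109271001
"""


def parse_region(text):
    """Turn key=value lines into a dict of strings."""
    region = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            region[key.strip()] = value.strip()
    return region


def povray_locals(region, keys=("n", "s", "w", "e")):
    """Emit one Povray '#local' declaration per requested key."""
    return "\n".join(f"#local {key}={region[key]};" for key in keys)


region = parse_region(REGION_OUTPUT)
print(povray_locals(region))

# Sanity check: the extent divided by the resolution gives the row count.
rows = (int(region["n"]) - int(region["s"])) // int(region["nsres"])
print(rows == int(region["rows"]))
```

The sanity check at the end is a cheap way to notice that the region was changed between exporting the elevation data and writing the script.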

Now we need to set up our points. What we’re going to do here is place a radio (light) up on the hill behind Portree pointing at Arnish on Raasay. And we’re going to have the camera near the radio and looking out over the Inner Sound towards Tor Mòr on the Applecross peninsula. To do this takes a small amount of arithmetic to figure out the correct heights:

#local portree=<146740, 842710, (302+hbias)*hscale/maxheight>;
#local arnish=<159801, 847786, (110+hbias)*hscale/maxheight>;
#local tormor=<171124, 842971, (110+hbias)*hscale/maxheight>;

The height at a particular point can be found like this:

% r.profile --q input=profile_dtm profile=146740,842710
0.000000 301.100006

and we just round that up to 302m. Then we add the height bias and scale it according to the maximum height and our scaling factor.
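The same arithmetic can be checked outside Povray. This small Python sketch mirrors the expression used for the three points above, with the constants taken from the script; the function name is just for this example.

```python
HBIAS = 10.0                # same bias passed to r.out.pov
MAXHEIGHT = 1068.0 + HBIAS  # highest point in the data plus the bias
HSCALE = 2000.0             # arbitrary vertical exaggeration


def scene_z(elevation_m, hbias=HBIAS, hscale=HSCALE, maxheight=MAXHEIGHT):
    """Convert a real elevation in metres to the z coordinate used in
    the scene: (elevation + hbias) * hscale / maxheight."""
    return (elevation_m + hbias) * hscale / maxheight


print(scene_z(302.0))   # the Portree site
print(scene_z(1068.0))  # the highest point maps to hscale
print(scene_z(-10.0))   # the biased zero level maps to 0
```

The second and third lines are a quick check that the top and bottom of the height field land where the scaled model expects them.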

Next we place the camera, high up, a little behind our site at Portree. The sky and right parameters are to fix Povray’s idiosyncratic use of a left-handed coordinate system. We like our x axis to point to the right, the y axis to point to the front and the z axis to point upwards.

camera {
    sky <0,0,1>
    right <-1.33,0,0>
    location <135000,840000,7*hscale>
    look_at tormor
}

Then the light sources. First, the ambient light. The placement of this is a bit fiddly to get right. It needs to be somewhere where it adequately lights the scene and doesn’t cause too many extraneous shadows. The second vector is the colour of the light. It is a dim white light. It needs to be dim, and fade with distance so that the lights that correspond to the radio stand out clearly, but it needs to be bright enough also so that the scene isn’t too dark.

light_source {
    <e,s,50*hscale>
    <0.1,0.1,0.1>
    fade_power 1
}

The next light sources correspond to the radio. A red omni-directional light will illuminate everything that can be seen from the site. A green spotlight with a 20 degree wide beam will point towards Arnish on Raasay. Where these two lights overlap the result will be yellow.

light_source { portree <1,0,0> } 
light_source {
    portree <0,1,0> spotlight
    point_at arnish
    tightness 0 radius 10 falloff 10
}
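As a rough check on what the green beam should cover, the following Python sketch compares horizontal bearings from the radio site. It assumes that Povray's spotlight radius and falloff are half-angles in degrees (which is what makes radius 10 a 20 degree wide beam), and it ignores height entirely, so it is only a plan-view approximation.

```python
import math

# Easting and northing of the three points used in the scene.
PORTREE = (146740.0, 842710.0)
ARNISH = (159801.0, 847786.0)
TORMOR = (171124.0, 842971.0)


def bearing_deg(origin, target):
    """Direction of target seen from origin, in degrees anticlockwise
    from the x (easting) axis."""
    return math.degrees(math.atan2(target[1] - origin[1],
                                   target[0] - origin[0]))


def off_axis_deg(origin, aim, target):
    """Horizontal angle between the beam axis and the target."""
    off = abs(bearing_deg(origin, target) - bearing_deg(origin, aim))
    return min(off, 360.0 - off)


def in_beam(origin, aim, target, half_angle_deg=10.0):
    """True if the target lies within the beam's half-angle of its axis."""
    return off_axis_deg(origin, aim, target) <= half_angle_deg


print(in_beam(PORTREE, ARNISH, ARNISH))
print(in_beam(PORTREE, ARNISH, TORMOR))
print(off_axis_deg(PORTREE, ARNISH, TORMOR))
```

In plan view Tor Mòr sits a little over 20 degrees off the beam axis, so it should only ever show up red in the rendering, never yellow.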

The next section is the main one. It provides the backdrop for the scene and all of our lights. The image_map needs to be rotated to match Povray’s notion of what an image_map should be, and we add a fairly shiny metallic finish so that we can see the colours properly.

height_field {
    tga "elevation.tga"
    smooth 
    texture {
        pigment {
            image_map {
               png "basemap.png"
               interpolate 2
               once
             }
             rotate <90,0,0>
        }
        finish {
            diffuse  1.0
            specular 0.8
            metallic
        }
    }

    translate <-0.5,0,-0.5>
    rotate <90, 0, 180>
    scale <-1,1,1>
    translate <0.5,0.5,0>
    scale <(e-w),(n-s),hscale>
    translate <w,s,0>
}

The last section is to get our elevation map, which starts off as a 1x1x1 cube, oriented properly in our right-handed coordinate system and placed so as to line up with the native map coordinates. First we move it so that the origin is in the middle. Then we rotate it around so that it is properly in the x,y plane. Then we find that it’s inverted, so we reflect it in the x axis. We put it back so that the bottom left-hand corner is at the origin and scale it up to size. Finally we move it to the right place.

If we were to render the image now, there would be an ominous black sky. To make it nicer, let’s make the sky a nice light blue:

sky_sphere {
    pigment {
        gradient y
        color_map {
          [0.2 color <0.5, 0.75, 1>/2 ]
          [1 color <0, 0, 1>/2 ]
        }
        scale 2
        translate -1
    }
}

And we also dim the ambient light a little bit more:

global_settings {
    ambient_light 0.5
}

Finally we can do the rendering. This may take some time. On a fast computer with 8 processors and a lot of memory the process occupies about 1.5GB of RAM and the whole thing takes about a minute to run. Most of the time, about 40 seconds, is reading in the data files and the rendering itself takes about 20 seconds.

% povray -d +A +W1280 +H1024 +Oportree.png portree.pov

The +A parameter turns on anti-aliasing which makes the result look nice and smooth. It does increase the rendering time significantly, though, roughly doubling it. The -d parameter stops Povray from popping up a window to briefly show the result of a rendering step.

The complete Povray script to render this image is available here.

Thanks to Schuyler Erle, Rich Gibson and Jo Walsh for their excellent book Mapping Hacks for a very useful leg up getting this to work.

Happy Hacking!

Allanton Rural Community goes live

by Peter Buneman
13 Feb 2013

Last week, with help from HUBS, a small rural and farming community near Allanton completed their connection to the Internet. The project is interesting because it shows how a very small community with limited resources can, with a little technical help, “go it alone” in obtaining fast broadband.

Rerouting Fun and Games

by William Waites
03 Jan 2013

The weather was against us over the holidays. Power outages at Sabhal Mòr Ostaig partitioned the network. The wind blew the mast in the Coille Mhialairigh out of alignment with Eigg and the college and dislodged a power connector up on the Sgurr. An upstream router in the UHI network failed, and our old core router from the original Tegola network is still missing in action.

To a great extent, we were able to maintain connectivity through this storm, though it required some manual intervention and trickery. This article is kind of a post mortem examination of what went wrong, and why and how it was fixed together with some thoughts on how some of the transitions might be made to happen more smoothly the next time.

The first indication for me that something was wrong came, naturally, from the monitoring system – which is just the Nagios software running on a computer that connects to the network over a VPN. Mailboxes were filled to overflowing with alerts loudly proclaiming that “everything” was “down”. Oh dear. Looking into this it quickly became apparent that the far end of the VPN connection was unreachable – the far end being a Linux PC that is also the original core router for the old experimental network.

Speaking with Peter, who was on site, we found that the VPN gateway was also unreachable from within the network, and that while our new core router at SMO was functioning, connectivity to the outside world had failed a couple of hops beyond it within the UHI network. This is a bad failure mode because the router can’t detect it directly: it just has a default route pointing out through its neighbour, and as its neighbour was up it kept sending traffic into the void a little way further out. UHI/SMO are not equipped to use a routing protocol like BGP out there at the college that would have detected this condition.

Because our router couldn’t detect the failure, traffic couldn’t automatically fail over to our alternative upstream connection, the bundle of DSL lines that Hebnet has in Mallaig. Ok, simple enough, remove the default route on the core router, and now traffic starts flowing out through Mallaig. But there’s a problem: the DSL lines there are still set up for use on Knoydart, with a policy-based routing arrangement instead of load balancing. This will take a little explaining.

Historically the DSL lines were ordered from BT retail because at the time, it seemed to be the obvious thing to do. BT cannot support any kind of bonding or aggregating or inverse-multiplexing of lines. As clients we can’t really do anything about that directly. To load balance, the choice of which line a given packet is going to go down has to be made before it enters the line. BT controls the equipment on that end and they won’t do this. So the ancient workaround was to have the router in Mallaig assign certain client networks to certain lines and to do address translation so BT couldn’t see this happening.

There are several problems with this approach, but the one biting us in this case was configuration skew – this router needs to know in detail which subnets are present in the network and which line they should be assigned to. But everybody started using the fast connection at the college long ago and the network evolved significantly since then and now the configuration on the Mallaig router was hopelessly out of date.
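The configuration skew is easy to see in a toy model of the policy table. In this Python sketch the subnets and line names are hypothetical, not the real Mallaig configuration; the point is only that a subnet added to the network after the table was written matches nothing and so gets no line at all.

```python
import ipaddress

# Hypothetical policy table: client subnet -> DSL line.
LINE_FOR_SUBNET = {
    ipaddress.ip_network("10.11.1.0/24"): "dsl0",
    ipaddress.ip_network("10.11.2.0/24"): "dsl1",
}


def line_for(source_ip, table=LINE_FOR_SUBNET):
    """Return the line assigned to the subnet containing source_ip,
    or None if the table has never heard of that subnet."""
    addr = ipaddress.ip_address(source_ip)
    for subnet, line in table.items():
        if addr in subnet:
            return line
    return None


print(line_for("10.11.1.7"))  # a subnet the table knows about
print(line_for("10.11.9.7"))  # a newer subnet: no assignment
```

Load balancing avoids this class of problem because the decision is made per packet or per flow, without any list of subnets to keep up to date.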

The first thing was to restore some level of connectivity for everybody. This meant ripping out the policy-based routing cruft and making everyone share a single line. There is no way a single line could handle the volume of traffic normally experienced by the network, but degraded service is better than no service. The next step was then to restore some level of aggregation over the multiple lines.

In the meanwhile, in order to relieve congestion on the Hebnet (Knoydart) line that was now being shared over all the Small Isles and up to Knoydart and Loch Hourn, Eigg and Rum manually switched over to their own backup DSL lines.

Well, not quite the next step. There are still some stragglers that continue to use the old experimental network. But the router that connects that network to the new one and to Hebnet had died. Fortunately there are still a couple of places where the old and new Tegola networks touch. These links, at Beinn Sgritheall and Corran, are normally just used for out of band access for experiments, but in this case, turning on OSPF to leak routes between them resulted in merging the two autonomous systems.

Back to load-balancing. The first tactic was to try to be clever and do things the Right Way. Andrews & Arnold is a friendly and clueful ISP whose staff is often reachable on holidays and at strange hours of the night, and is a reseller of Openreach DSL. I remembered from some early experiences with Bell Canada resellers that the PPP session is directed to one reseller or another based on the “domain” part of the PPP username, that is the part after the at sign (@). Maybe if we could direct the PPP sessions on those DSL lines to A&A they would be able to take care of the far end part of the load-balancing. Unfortunately it turns out that the way the Openreach network is constructed, this will only work for the newer 21CN network and not the first generation 20CN ADSL. And the Mallaig exchange, like most of them outside the big cities in Scotland, has not been upgraded. So much for that idea.

Well, the alternative is tunnels. One tunnel on each line to some place out on the Internet (hopefully nearby) under our control so we can decide how traffic gets distributed. Fortunately we have a FreeBSD router on good bandwidth out on the Internet that could be pressed into service for this. After a little while wrestling with a couple of errors in the Cisco router in Mallaig’s config, all the tunnels were up and all the DSL lines were being used.
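The post does not say how traffic was spread across the tunnels, so the following Python sketch only illustrates one common approach, per-flow hashing, under that assumption. The tunnel names are hypothetical. Hashing the addresses and ports of a flow keeps all of its packets on one tunnel, which avoids reordering within a connection.

```python
import zlib

# Hypothetical tunnel names, one per DSL line.
TUNNELS = ["tun0", "tun1", "tun2"]


def tunnel_for_flow(src, dst, sport, dport, tunnels=TUNNELS):
    """Pick a tunnel for a flow by hashing its addresses and ports.
    The same flow always maps to the same tunnel."""
    key = f"{src}|{dst}|{sport}|{dport}".encode("ascii")
    return tunnels[zlib.crc32(key) % len(tunnels)]


# The same flow is always sent the same way.
first = tunnel_for_flow("10.11.1.7", "192.0.2.10", 51000, 443)
second = tunnel_for_flow("10.11.1.7", "192.0.2.10", 51000, 443)
print(first == second)
```

A real router would do this in its forwarding path rather than in a script, but the decision logic has the same shape.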

But after a little while, some of the users were complaining about intermittent problems, some sites not loading or loading very slowly, without any strong pattern. As well, there were complaints that people couldn’t send email. This last was easy to diagnose. The router is behind a packet filtering router that restricts which hosts can be connected to for sending mail. Usually this makes some sense, the restriction is intended to prevent computers infected by viruses from sending spam. But in this case it was preventing legitimate communication – this is often the case with security measures, that when someone tries to do something perfectly reasonable that happens to be at odds with the assumptions of whoever came up with the policy, things break. In any event the fix was to redirect all mail through the university’s mail servers, which is probably also against policy, but at least it works.

Heisenbugs are a little harder to diagnose. We managed, by using the Firebug plugin for Firefox, to find a URL that would consistently hang when a request was made. Fetching it on the command line, and running tcpdump revealed persistent attempts by the web server to send some data followed by responses from the router that said the packets were too big and to send smaller ones or allow them to be fragmented. This is a mechanism called Path MTU Discovery which is supposed to make it possible for any two computers on the Internet to find out the size of the largest packet that can travel between them without being broken into smaller pieces. The “packet too big” messages are ICMP messages, as are the packets used by ping, for example. ICMP messages are often blocked by misguided network administrators who believe there is some security reason to do so. What it really does is prevent computers from finding out that they have tried to send a packet that is too big and that they should send smaller ones. SSL sessions are particularly badly affected by this because they begin with an exchange of large packets for the cryptographic handshake. There’s an article on the Tegola web site that explains PMTUD and MSS Clamping in some detail.

This happens with tunnels because a few bytes (20) need to be used for the encapsulation. The tunnels need a normal outer IP header with addresses visible on the Internet at large, and then an inner IP header containing the addresses used for the tunnel itself. This lowers the Maximum Transmission Unit from what is usually 1500 (standard for ethernet) to 1480. But this wasn’t sufficient. The “packet too big” messages indicated that the tunnel MTU was in fact 1280 even though it had been set to 1480. Well, I still don’t know where those extra 200 bytes disappeared to, but clamping the segment size in the TCP handshake to something that fits into 1280 bytes worked, and as this was a temporary situation in any case, there things stood, finally working properly and consistently.
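The numbers involved are simple enough to work through. This Python sketch assumes plain IPv4 and TCP headers without options (20 bytes each) and the 20 byte encapsulation overhead mentioned above; the function names are only for illustration.

```python
IP_HEADER = 20   # IPv4 header without options, bytes
TCP_HEADER = 20  # TCP header without options, bytes


def tunnel_mtu(link_mtu, encap_overhead=IP_HEADER):
    """MTU left inside a tunnel after the outer header is added."""
    return link_mtu - encap_overhead


def mss_for_mtu(mtu):
    """Largest TCP segment payload that fits in one packet of this MTU."""
    return mtu - IP_HEADER - TCP_HEADER


print(tunnel_mtu(1500))   # what the tunnel MTU was set to
print(mss_for_mtu(1480))  # the segment size that should have worked
print(mss_for_mtu(1280))  # the segment size that fits the observed MTU
```

Clamping the MSS to the last of these values or below means the endpoints never try to send a segment that would need a “packet too big” message in the first place.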

Now that this is in place, failing over in the future ought to be much smoother. There are some improvements to be made. When the contracts run out on those BT lines they should be moved to a provider that can support load-balancing so we don’t have to do sub-optimal things with tunnels to make it work. Where we do use tunnels, they ought to land on a router outwith departmental firewalls and restrictions intended for desktop computers. Wherever possible, upstreams should use a routing protocol to indicate the presence or absence of connectivity to us so that failures can be detected.

Today the problem with the UHI upstream router was fixed, and it was a simple matter to replace the default route and have everybody back on good bandwidth again, everything back to normal.

Redundant Power in Knoydart

by Davie Newton
13 Nov 2012

Attached a couple of photos of the latest set-up.

The UPS set-up is a Studer XPC 1400-12 inverter/charger with two Yuasa ENL100-12FT batteries attached in series. There is a changeover switch and generator input (no generator although we could find one). There is an easy option (blue plugs) of bypassing the inverter/charger if there happens to be a problem. This whole thing has taken a lot of time and some confusion but I think the end result is good (if anything a bit over the top) and should easily last a whole night given the demand at the top of the hill. There was a power down for maintenance on Sat and there was no down time at the top of the hill.

The dish with the radome is the original that points to Mallaig. The green box came from the original Bridge broadband set-up. Bit of work to do on the platform although the steel at the back takes most of the weight. The ground is a pig here so will have to build up in solid.

Final picture shows green box at transformer site before getting installed. Transformer is at the end of the power line hence the need to put in pretty hefty protection. Transformer is about 250m from the road at Glaschoille. The dish site is 600m further up the hill. Mobile reception at top mast – but not 20m from it. A help might be a HomePlug connector between the bottom and top of the hill. With a cable into the switch at the top this would allow alternative access without going up the hill. On Sunday – during some work at the bottom – the top all seemed to disassociate. Sector wouldn’t show up on a site survey so I had to make the assumption that it had powered off. It hadn’t - was just playing a wee waiting game but still. A phone call from the top to Simon and a wee fiddle about helped to solve!

UPSes (APC 700VA ES-8) have also been installed at three relay points so the system should not need the round of re-setting we currently have to do on a power maintenance day.

A Caution on Satellite Broadband for Remote Communities

29 Oct 2012

Avonline and Tooway are offering satellite broadband at prices that are reasonably close to those available in cities for a superficially similar service. They are extremely attractive for rural areas where there appears to be no real alternative. The purpose of this note is to set out some of the problems with satellite offerings.

Cost: First, these packages come, for good reason, with a monthly limit on the amount of data you can download. A basic package provides 5GB (gigabytes), which is enough to stream 2-3 hours of high-definition TV. Our experience is that households in rural areas – especially households with children – can well exceed that limit. Cisco predicts that the average household will generate over 128GB per month by 2016. In fact, this is a world-wide prediction. If Scotland wants to be ahead of the game, the demand there should be greater.

Latency: It takes at least 1/4 second for a packet to be transmitted from earth and back via geostationary satellite. If two people are using satellite connections the delay is at least 1/2 second and usually more. Using VoIP (Skype, Internet telephones etc) with this kind of delay is difficult. Moreover, applications such as secure banking require extensive “handshaking”, and therefore multiple round-trips. In any demonstration, be sure that you see an example of a VoIP connection with two round-trips.

Long-term prospects: The Hylas-1, Hylas-2 and Eutelsat KA-SAT have recently been launched, so there is substantial additional capacity. The number of consumers that these will serve is estimated at 2.3 million (presumably at the lowest quality of service). However there are also estimated to be some 30 million premises in Europe alone with broadband speeds that make the current offering look attractive. Unless more satellites are launched the demand, and hence the price, are certain to go up. Be sure to check on the maximum length of contract.

Infrastructure: Remote communities need improved infrastructure (fibre and high-speed terrestrial wireless). Using satellite will reduce the pressure on government and telecommunications companies to improve this infrastructure.

Experience: Communities on Knoydart and Eigg have been “burned” by satellite companies failing, and even the government-subsidised Avanti offering has performed below expectations (Avanti is a partner of Avonline). This has led Knoydart and Eigg, for example, to build their own wireless networks that have greatly outperformed, and proved cheaper than, current satellite offerings.
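The latency figures above follow from geometry and the speed of light. As a rough illustration, this Python sketch computes the propagation delay for one trip up to a geostationary satellite and back down. It assumes a user at 57 degrees north with the satellite on the same meridian, and a ground station at the same distance from the satellite; real delays are longer because of processing and queueing.

```python
import math

C_KM_S = 299_792.458        # speed of light, km/s
EARTH_RADIUS_KM = 6371.0    # mean earth radius
GEO_ALTITUDE_KM = 35_786.0  # geostationary altitude above the equator


def slant_range_km(lat_deg, dlon_deg=0.0):
    """Distance from a point on the ground to a geostationary satellite
    that is dlon_deg of longitude away, by the law of cosines."""
    r_orbit = EARTH_RADIUS_KM + GEO_ALTITUDE_KM
    cos_central = (math.cos(math.radians(lat_deg))
                   * math.cos(math.radians(dlon_deg)))
    return math.sqrt(EARTH_RADIUS_KM ** 2 + r_orbit ** 2
                     - 2.0 * EARTH_RADIUS_KM * r_orbit * cos_central)


def hop_delay_s(range_km):
    """Propagation time up to the satellite and back down again."""
    return 2.0 * range_km / C_KM_S


print(hop_delay_s(slant_range_km(0.0)))   # best case, satellite overhead
print(hop_delay_s(slant_range_km(57.0)))  # assumed latitude of 57 degrees N
```

The best case comes out just under a quarter of a second and the 57 degree case just over it; a request and its reply each make this trip, so every round-trip costs at least twice that.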

Further technical documentation shows that Scotland is relatively poorly served by existing satellites. It is on the fringe of coverage and has rather low aggregated available bandwidth.

To summarise: Satellite may be useful as a stop-gap solution, and will be useful for short-term ad hoc connections; but unless terrestrial infrastructure is improved, be prepared to be relatively worse off in five years’ time than you are now. Also, look carefully at the price you will end up paying and the performance you will be getting. The alternative of community broadband projects, although they require an investment of effort, deliver better performance at lower costs with lasting improvements to the infrastructure.

Peter Buneman
University of Edinburgh & Tegola

Michael Fourman
University of Edinburgh & RSE Digital Scotland report

Simon Helliwell
Eigg & HebNet

David Newton
Knoydart & Knoydart Foundation

William Waites
University of Stirling & Network Engineer, HUBS

Community Broadband Meeting -- Preliminary Report

by Peter Buneman
14 Oct 2012

The organisers of the meeting felt that the event was very successful, and hope it will continue to contribute to the Community Broadband Initiative. We were impressed by the breadth, energy, and diversity of community engagement. It illustrated the dynamism of community projects; and we must not lose this momentum. Over 100 people attended, representing more than 20 communities who have built, are building or are considering building their own network infrastructure.

One participant, @cyberdoyle tweeted, “it was scottish government and communities getting together to share knowledge, celebrate and it was mindblowingly awesome, and buzzing with great ideas.”

A big thank you to Sabhal Mòr Ostaig for organising the meeting and to Sabhal Mòr Ostaig, the University of Edinburgh and Scottish Government for funding it.

This site www.tegola.org.uk is intended to help these groups start to work together and help each other and others. We have now put up a Wiki so that communities can keep in touch. We hope it will be a place to document and share experiences and know-how with each other. Please help to fill it out!

In the meantime please post impressions, suggestions and contributions in the comments below.

Over the Sea From Skye

by Peter Buneman
05 Oct 2012

Next week, on Friday 12 October, there will be a community broadband event on Skye at which the Scottish Government will describe its plans to support community broadband. A large number of representatives from communities around Scotland will be present.

More details are to be found on the event page.

Places are still available and some travel funding is available for community representatives. Contact Christine Mackenzie 01471 888 200.

Every Network is Different

by William Waites
24 Sep 2012

Yesterday evening there was an outage on the Tegola / Knoydart / Small Isles network. It was entirely preventable and due to human error. What follows is an explanation of what happened and what was done to prevent it happening again.

We have long been in the habit of assigning loopback addresses to the routers in the network. These addresses are used as router identifiers and provide a way to refer to and reach a router that is independent of any of its physical interfaces.

A few weeks back we sent a Soekris router to Eigg for installation on the Sgurr. It had been assigned the loopback address of 10.127.255.16. It also had essentially been dormant for the past while since the link to the Coille Mhialairigh mast had been disabled due to power problems.

In the meanwhile, much work had been done on the Loch Hourn network, constructing an entirely new production network in parallel with the old experimental one. A clerical error meant that the same address, 10.127.255.16, had been assigned to the edge router for Arnisdale.

Everything was fine until Sunday afternoon, when Davie from the Knoydart Foundation went up the hill to Creagan Dearga and connected the radio facing Eigg – Eigg is meant to have two links, one to Knoydart and one to Loch Hourn. What happens when you have two routers in an OSPF network with the same router id is not pretty. Everything went haywire in a semi-localised way, where Arnisdale was intermittently reachable, flapping up and down.

It was some trouble to track down what the problem was because we didn’t know that the new link had been turned on just then (though it had been planned for some time). It was also hampered by the failure mode. The Quagga routing daemon on the Arnisdale router simply started crashing without doing anything helpful like putting diagnostic information in the logs. This was not exceptionally surprising given that we have experienced stability problems with Quagga, particularly the OSPF daemon, before. And obviously because the problem appeared to be happening in Arnisdale, we spent quite a lot of time looking for the problem there – not realising that the source of the problem was actually 40km and four hops away on the backbone.
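A clerical error like this is cheap to catch before it reaches the network, provided the assignments are written down in one place. The following Python sketch checks an inventory for duplicate router identifiers. The inventory itself is hypothetical: only the address 10.127.255.16 and the two sites that shared it come from this incident.

```python
from collections import defaultdict

# Hypothetical inventory: router name -> loopback address / router id.
ROUTERS = {
    "eigg-sgurr": "10.127.255.16",
    "arnisdale-edge": "10.127.255.16",
    "corran": "10.127.255.17",
}


def duplicate_router_ids(routers):
    """Return each router id that is assigned to more than one router,
    together with the sorted names of the routers that share it."""
    by_id = defaultdict(list)
    for name, router_id in routers.items():
        by_id[router_id].append(name)
    return {router_id: sorted(names)
            for router_id, names in by_id.items()
            if len(names) > 1}


for router_id, names in duplicate_router_ids(ROUTERS).items():
    print(router_id, "is shared by", ", ".join(names))
```

Run against the inventory whenever an address is allocated, a check like this would have flagged the clash weeks before the link was switched on.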

OSPF belongs to a family of routing protocols called Interior Gateway Protocols (IGPs). They are intended for use within a single administrative domain – where all devices are managed by a single organisation or team of people. The facilities for filtering or applying policy are minimal, the basic assumption being that things will be done correctly and consistently throughout. Errors are freely propagated through the network with sometimes mysterious consequences as happened here.

There is another family of routing protocols called Exterior Gateway Protocols (EGPs), the only non-obsolete member of which is the Border Gateway Protocol (BGP). The posture of BGP is much more defensive. It assumes that anything external to an Autonomous System is run by somebody else, and any mistakes that they may make should not necessarily affect the internal functioning of your network.

In this case, there is close cooperation amongst the networks and they could be considered to be part of the same administrative domain. Still, it isn’t a bad idea to partition them and use BGP peering sessions at the borders to exchange routing information. There are important differences because of which this makes sense:

Tegola (old): OSPF
Tegola (new): OSPF
Knoydart: Static routes
Eigg: Flat network (bridged)

Particularly the old experimental Tegola network ought to be strongly separated from the rest which carries production traffic. The Knoydart and Eigg networks are different enough in that they don’t really use an IGP, except inasmuch as static routes or a single flat network can be considered so. And really it is up to them how they build and run their networks and because the Tegola networks are running OSPF doesn’t mean that they should.

So the network is now broken into four pieces, using autonomous systems drawn from the space of private AS numbers (64512 through 65534) and each peers with at least one adjacent network. It is now coherent to draw diagrams using fluffy clouds – since what matters, and what can be managed, are the relationships between the networks, without regard to their internal workings.

This is all done with the BIRD routing daemon – though it could as easily be done with Quagga. The relevant fragment of the configuration file from a new Tegola router is reproduced below, complete with comments.

/**** Section: Utility functions ****/
/*
 * Identify if the given network should never ever
 * be seen in the routing protocols...
 */
function is_bogon(prefix network)
{
        if (network ~ [127.0.0.0/8, 192.0.2.0/24])
                then return true;
        return false;
}

/*
 * Local networks originated by this ASN
 */
function is_local(prefix network)
{
        /* the main netblock (supernet) for this network */
        if (network = 10.11.0.0/16)
                then return true;
        /* a kludge because UHI also uses this range for
           their servers and some people need to see their
           web servers for distance learning */
        if (network = 10.130.1.0/24)
                then return true;
        return false;
}

/**** Section: Route filters ****/
/*
 * To BGP Peers
 */
filter export_BGP {
        /* allow the announcement of a default route */
        if (net = 0.0.0.0/0) then accept;
        /* do not under any circumstances announce things
           that are obviously bogus */
        if is_bogon(net) then reject;
        /* announce local networks */
        if is_local(net) then accept;
        /* announce networks that we have learned from other
           BGP peers */
        if source = RTS_BGP then accept;
        /* do not announce anything else */
        reject;
}

/*
 * From BGP peers
 */
filter import_BGP {
        /* Do not accept obviously bad networks from peers */
        if is_bogon(net) then reject;
        /* Do not allow peers to announce our networks at us */
        if is_local(net) then reject;
        /* Otherwise allow */
        accept;
}

protocol static STATIC {
        /* default route */
        route 0.0.0.0/0 via 194.35.194.1;
        /* kludge for UHI web servers */
        route 10.130.1.0/24 via 194.35.194.1;
        /* our own supernet (more specific routes learned by
           OSPF will override this) */
        route 10.11.0.0/16 reject;
}

/* peering session with the old tegola router */
protocol bgp T1 {
        local as 65533;
        neighbor 194.35.194.250 as 65534;
        export filter export_BGP;
        import filter import_BGP;
        next hop self;
}

/* peering session with knoydart */
protocol bgp KNOYDART {
        local as 65533;
        neighbor 10.11.0.2 as 65532;
        export filter export_BGP;
        import filter import_BGP;
        next hop self;
}

/* peering session (internal) with coille mhialairigh */
protocol bgp MHIALAIRIGH {
        local as 65533;
        neighbor 10.11.0.50 as 65533;
        export filter export_BGP;
        import filter import_BGP;
}
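
For readers who don't speak BIRD, the decision order of the two filters above can be sketched in Python. This is only an illustration, not part of the router configuration: the function names mirror the BIRD ones, the `learned_from_bgp` flag stands in for the `source = RTS_BGP` test, and comparing prefixes for exact equality is an assumption about how the matches behave. The `10.99.0.0/24` and `10.12.0.0/16` prefixes used when trying it out are hypothetical.

```python
import ipaddress

# Prefixes copied from the BIRD fragment above.
BOGONS = {ipaddress.ip_network("127.0.0.0/8"),
          ipaddress.ip_network("192.0.2.0/24")}
LOCAL = {ipaddress.ip_network("10.11.0.0/16"),
         ipaddress.ip_network("10.130.1.0/24")}
DEFAULT = ipaddress.ip_network("0.0.0.0/0")


def is_bogon(prefix):
    """True if the prefix should never be seen in the routing protocols."""
    # Assumption: exact-prefix comparison, not "falls inside the range".
    return ipaddress.ip_network(prefix) in BOGONS


def is_local(prefix):
    """True if the prefix is originated by this ASN."""
    return ipaddress.ip_network(prefix) in LOCAL


def export_bgp(prefix, learned_from_bgp):
    """Return True to announce the prefix to a peer, False to suppress it."""
    if ipaddress.ip_network(prefix) == DEFAULT:
        return True               # allow the default route
    if is_bogon(prefix):
        return False              # never announce obviously bogus networks
    if is_local(prefix):
        return True               # announce our own networks
    if learned_from_bgp:
        return True               # pass on what other BGP peers told us
    return False                  # nothing else leaves


def import_bgp(prefix):
    """Return True to accept a prefix announced by a peer."""
    if is_bogon(prefix):
        return False
    if is_local(prefix):
        return False              # peers may not announce our networks at us
    return True
```

The order of the tests matters: the default route is accepted before anything else is looked at, and a route learned from OSPF that is not one of the local prefixes never reaches a peer.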

The Arduino Voltmeter

by William Waites
07 Sep 2012

When testing the Long Link to Eigg, we ran into a problem with the solar powered mast at Coille Mhialairigh. Whenever we put any particular load on the network there, the switch at the mast would reboot – even if the traffic itself didn’t touch the switch. The suspicion is that the extra load on the power supply causes the voltage to drop outwith the tolerance of the little Netgear switch.

Testing this hypothesis is not easy, however. It might involve a lot of sitting in the cold on top of a hill with a voltmeter while the problem is reproduced. Much better to try to arrange to be able to do it from the comfort of the house, by the warm fire.

Recently at the EMF Camp we were given badges that contained little boards with Arduino micro-controllers on-board. These are fantastically flexible tiny computers, intended primarily for artists making electronic sculptures, and for use in the classroom to help teach basic electronics, and of course for hobbyists who put them to all sorts of uses.

In this case the task is pretty simple – measure the power supply voltage which, in practice, will be the same supply that powers the Arduino board itself. This is done by using an analogue input pin on the board. It has six, but we only need one. These pins can read a voltage in the 0-5V range, which is turned into a number between 0 and 1023.

The power supply cannot be directly plugged into the analogue input because it runs at 12V and would burn out the pin. So a simple voltage divider circuit is used:

The resistors, R1 and R2, are chosen so that they are in a ratio of more or less 3:1. This means that, roughly, the input can range from 0-20V: tapping in between the resistors reduces it by ¾, so what is measured on the A0 pin will be in the correct range.

The actual voltage is reconstructed like so:

\[ V^{+} = A_0 \frac{5}{1024} \frac{R_1 + R_2}{R_2} \]

The chosen values are, \(R_1 = 680 k\Omega\), \(R_2 = 220 k\Omega\). This corresponds at 12V to a current of \(13 \mu A\) (cf. Ohm’s Law) – meaning measuring the voltage won’t appreciably drain the batteries. Of course this needs to be added to the couple of hundred milliamperes that the board itself will draw.
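
As a quick sanity check of the formula and the current figure, here is a minimal sketch in Python (the real program runs on the Arduino; this is only the arithmetic). The constant and function names are made up for illustration, and the sample reading of 600 is hypothetical.

```python
R1_OHMS = 680e3        # upper resistor of the divider
R2_OHMS = 220e3        # lower resistor, across which A0 measures
ADC_REF_VOLTS = 5.0    # full-scale voltage of the analogue input
ADC_STEPS = 1024       # divisor used in the formula above


def supply_voltage(adc_reading):
    """Reconstruct the supply voltage from a raw A0 reading (0..1023)."""
    pin_volts = adc_reading * ADC_REF_VOLTS / ADC_STEPS
    return pin_volts * (R1_OHMS + R2_OHMS) / R2_OHMS


def divider_current(supply_volts):
    """Current through the divider in amperes, by Ohm's Law."""
    return supply_volts / (R1_OHMS + R2_OHMS)


if __name__ == "__main__":
    print(f"reading 600 -> {supply_voltage(600):.2f} V")
    print(f"full scale  -> {supply_voltage(1023):.2f} V")
    print(f"drain at 12V: {divider_current(12.0) * 1e6:.1f} uA")
```

A reading of 600 corresponds to a supply of just under 12V, and the largest reading the pin can report corresponds to a little over 20V, which matches the range described above.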

To make it convenient to retrieve the readings, the Arduino Ethernet Shield is used, together with the Webduino library. The entire program is a tiny 162 lines of code, including comments and blank lines.

It does two things:

Interrupt every second and take a new voltage reading, add the reading to a circular buffer.
Upon receiving an HTTP GET request, serve up the contents of the buffer as a JSON document.

The reason for the circular buffer is so that the readings can be stored while the faulty switch is rebooting and read out afterwards without losing any data.
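
The idea can be sketched in a few lines of Python. This is not the Arduino program, just the shape of it: the class name, the buffer size of 60 and the `readings` key in the JSON document are all assumptions made for illustration.

```python
import json
from collections import deque

BUFFER_SIZE = 60  # hypothetical: keep the last minute of one-second readings


class VoltageLog:
    """Fixed-size circular buffer of voltage readings."""

    def __init__(self, size=BUFFER_SIZE):
        # A deque with maxlen silently drops the oldest entry when full.
        self.readings = deque(maxlen=size)

    def record(self, volts):
        """Called once per second with the latest reading."""
        self.readings.append(volts)

    def to_json(self):
        """What would be served in response to an HTTP GET."""
        return json.dumps({"readings": list(self.readings)})


if __name__ == "__main__":
    log = VoltageLog(size=3)
    for volts in (12.1, 12.0, 11.4, 9.8):
        log.record(volts)
    print(log.to_json())
```

With a buffer of three, the fourth reading pushes out the first, so the dip is still there to be read once the switch has come back.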

Now to head back up North and install it and see what we find!

Allanton Site Survey

by William Waites
24 Aug 2012

Marwan and I went out to the University of West Scotland in Hamilton to see about putting a radio up on one of their buildings to connect a small and relatively sparse community near Allanton.

The proposed setup in this case is somewhat simpler than the networks out on the West coast, being simply a Nanostation M5 at the university, which has a nice wide radiation pattern and then several rather more directional Nanobridge M5 out in the field. This was just an exploratory trip, so everything was done in a very temporary manner, the Nanostation strapped to a 2m length of 40mm plumbing and wedged behind a filing cabinet, and the Nanobridges mounted atop a similar length tube.

After spending the morning with Brian Mullins, head of ICT at UWS, explaining what we proposed to do and putting up the temporary radio next door to the Campus Director’s office, we had a nice lunch and then headed out to Allanton with Anne Graham and Hew Colquhoun to see if we could see the signal.

The initial results were somewhat disappointing, however. Only from outside of Hew’s house were we able to get a link, and even then the signal was very weak, too close to the noise floor to really be usable. On the bright side, there were very few other signals in the 5GHz band that we could see, and none on the particular channel that we had chosen randomly in advance.

There are several things to try next. Obviously the arrangement at the University is very temporary and far from ideal. The best thing would be to mount it on the roof. It might be that the glass in the windows is interfering with the signals as some kinds of glass treatments are known to do this. It is also possible, even likely, that we misjudged the vertical angle of the antenna there and it actually is aimed too far downwards. This last is the easiest thing to change, so obviously the first thing to try – simply adjust the ad-hoc mounting.

If that fails, and this may be a good idea in any event, use a radio with a better antenna, such as the Rocket M5 with a 90° sector – these are the ones we use up North.

The final possibility is, there is a church about mid-way between the school and Allanton, and it is visible from pretty much the whole area. It may well be possible to use it as a relay site.

We’ll have another go next week…

The Long Link to Eigg

by William Waites
12 Aug 2012

Spent the weekend on the Isle of Eigg with Simon Helliwell taking in the Small Isles games on Saturday and then heading most of the way up to the Sgurr of Eigg to a spot with a mast that has a panoramic view over the Sound of Sleat to the Northeast, Knoydart, Mallaig and Arisaig to the East and Glenuig, Ardnamurchan and Coll to the Southeast and South.

We aimed to make a 44km link to the Coille Mhialairigh, to the West of Beinn Sgritheall, in order to connect Hebnet, the network on the Small Isles, to the Loch Hourn network. This was to be our longest link, with most of the others being in the 10-20km range. On either end we used the 5 GHz Ubiquiti Rocket M with parabolic antennæ; the Loch Hourn end had been set up on the previous trip to Arnisdale.

The weather was beautiful and we were fortunate that, as with the previous link from the Creagan Dearga in Southwest Knoydart to the Sabhal Mòr Ostaig college, the link came up immediately, at more or less full signal strength – in the neighbourhood of -65dBm, corresponding to a signal-to-noise ratio of about 25dB. So far so good. After some tidying, we headed back down the hill to take a closer look at what was needed for Eigg to make use of this new link.

The network on Eigg is flat. That is, it doesn’t employ routing, all of the radios are set up as one big bridge. Though there are disadvantages to such a setup, it does have a couple of advantages. In this case it meant that we could use the new link at Simon’s house without changing any of the other end-sites simply by changing the default gateway on his local router. When we did so, we immediately found that something was wrong.

The physical layer data rate at which the link synchronised appeared, according to the diagnostics on the radio on the Sgurr, to be some 180Mbps in the Eigg – Loch Hourn direction and 6Mbps in the reverse. Even at that, dropped packets made the link almost unusable. Checking, we found that the distance setting on the Sgurr radio wouldn’t go beyond about 25km. This setting really governs the 802.11 acknowledgement frame timeout – there’s a paper here that discusses this in some depth, but the brief version is that the farther apart the radios are, the longer they must wait for acknowledgement of their transmissions, due simply to the fact that electromagnetic radiation travels at a certain speed (the speed of light, actually). The Loch Hourn end, running OpenWRT, happily accepted a distance setting of 50km (the usual practice is to overestimate the distance by about 10% for best results).
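
To put some numbers on this, here is a rough sketch in Python of the propagation delay the acknowledgement timeout has to cover. It only accounts for the time the signal spends in the air and ignores whatever fixed overheads the radios add; the function names are made up for illustration.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def round_trip_us(distance_km):
    """Time for a frame to go out and its acknowledgement to return, in microseconds."""
    return 2 * distance_km * 1000 / SPEED_OF_LIGHT_M_PER_S * 1e6


def padded_distance_km(distance_km, margin=0.10):
    """Distance setting with the usual overestimate of about 10%."""
    return distance_km * (1 + margin)


if __name__ == "__main__":
    for km in (25, 44, 50):
        print(f"{km} km: {round_trip_us(km):.0f} us round trip")
    print(f"setting for a 44 km link: about {padded_distance_km(44):.1f} km")
```

A radio capped at 25km gives up waiting long before an acknowledgement from 44km away can arrive, which is consistent with the dropped packets we saw. Padding 44km by 10% gives a little over 48km, hence the 50km setting at the Loch Hourn end.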

Some pulling out of hair, and a lot of trial and error later, we found that this was related to the channel bandwidth. With 802.11n there is a choice of channel bandwidth. Channels can be either 5, 10, 20, 30 or 40MHz wide. In general, and assuming good conditions, the wider the channel, the faster the link since more information can be sent per unit time. The usual tradeoff is that in areas where the radio spectrum is congested, it might not be possible to find a contiguous block of 40MHz that is free from interference. This is often the case in cities, but the West Highlands and Islands is a pretty RF-quiet area.

However, there is another circumstance in which it is not possible to use wide channels. Because 802.11 works by transmitting a frame and then waiting for an acknowledgement before transmitting the next one (and possibly re-transmitting if the acknowledgement is not received in time), if you increase the speed at which frames are sent, you also increase the speed at which acknowledgements are sent. But this speed is also limited by the physics. This is simple to see: assuming no time is spent processing between packets, a packet may be sent at most once every \(t\) seconds:

\[ t = \frac{2d}{c} \]

where \(d\) is the distance between the radios, and \(c\) is the speed of light. The maximum rate, in packets per second, is then the reciprocal,

\[ r = \frac{1}{t} = \frac{c}{2d} \]

In our case, setting \(d\) at 44km, \(r\) works out to about 3400pps. If we take the average packet to be the best case, 1500 bytes or 12000 bits, and call this \(s\), we have,

\[ b = rs = 3400 \times 12000 = 40.8\,\text{Mbps} \]

which, ignoring other overhead like forward error correction and such, is the maximum possible transmission speed. But these speeds are possible with a 20MHz channel. Using a 40MHz channel is simply a waste of RF spectrum. This is the reason for the seemingly arbitrary limit in the Ubiquiti firmware.
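
The same arithmetic as a small Python sketch, for anyone who wants to try other distances or packet sizes. It makes the same simplifications as the text (one packet in flight at a time, no processing time, no protocol overhead) and takes the speed of light as a round \(3 \times 10^8\) m/s; the function names are made up for illustration.

```python
SPEED_OF_LIGHT_M_PER_S = 3e8   # rounded, as in the estimate above


def packets_per_second(distance_m):
    """r = c / 2d: one packet per round trip."""
    return SPEED_OF_LIGHT_M_PER_S / (2 * distance_m)


def max_bits_per_second(distance_m, packet_bits=12000):
    """b = r * s, with s defaulting to a 1500 byte packet."""
    return packets_per_second(distance_m) * packet_bits


if __name__ == "__main__":
    d = 44_000
    print(f"r = {packets_per_second(d):.0f} pps")
    print(f"b = {max_bits_per_second(d) / 1e6:.1f} Mbps")
```

Without rounding \(r\) down to 3400 the answer comes out a shade higher than 40.8Mbps, but the conclusion is the same.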

So, we set the channel bandwidth to 20MHz at both ends and, presto, the link came up and began to function perfectly.

By way of testing, we watched the webcam at the Coille Mhialairigh mast that Peter Buneman had pointed at Eigg. After a short time, we lost connectivity. The link remained up, but we couldn’t see beyond the radio on the other side. After a minute or so, it came back. We repeated this a couple of times, and the same thing happened. Very strange.

Back in Edinburgh, we conducted some more specific tests. It proved to be possible to crash the switch by simply running the iperf tool between the radios. This test didn’t involve the switch directly at all. It turns out that the answer has to do with the Coille Mhialairigh mast being self powered, or rather powered by solar panels and batteries about 50m down the hill. The Ubiquiti radios are designed to be fairly tolerant of different input voltages from their power supplies. They ship with a 24V Power over Ethernet supply but will happily accept anything from about 10V. The little Netgear GS108T-200 on the other hand wants something pretty close to 12V. The extra draw caused by continuous transmission must be enough to cause the supply to the switch to fall outwith its tolerances, causing it to reboot.

This is to be fixed by, in the immediate term, removing any extra cabling to improve the efficiency of power delivery to the mast at Coille Mhialairigh, and in the medium term arranging mains power for the mast. This shouldn’t be too hard as there are power lines only a couple of hundred metres away.

New Sabhal Mòr Ostaig Point of Presence

01 Aug 2012

Our new setup at Sabhal Mòr Ostaig, Ubiquiti radios on the wall (Peter Buneman hiding behind one of them), powered by a Netgear GS110TP switch that gives us PoE and vlans, and the Soekris router humming happily along with FreeBSD. Today we replaced the other Soekris router at Creagan Dearga, and so far all is well.

Soekris Crash Duplicated in the Lab

18 Jul 2012

Continuing the saga of the crashing Soekris Net5501 running OpenWRT… In the lab I now have a setup like the picture below. My laptop is connected to the OpenWRT router taken from the college on Skye, and on the far side of it is identical Soekris hardware running FreeBSD.

The two Soekris routers are connected with three of their ports in parallel, i.e. eth0 connected to vr0, eth1 connected to vr1, eth2 connected to vr2 (FreeBSD names its interfaces according to the kind of driver/chip they use).

The laptop generates traffic travelling via the OpenWRT router to the FreeBSD router. It does this using the iperf tool run like so:

iperf -c 10.0.0.2 -u -b 20000000 -t 3600
iperf -c 10.0.1.2 -u -b 20000000 -t 3600
iperf -c 10.0.2.2 -u -b 20000000 -t 3600

This fills each of the three links between the routers with approximately 20Mbps of UDP traffic for an hour.

In fact with only two of these running the OpenWRT host will reliably crash in a matter of minutes. Nothing whatsoever on the console, no stack traces or memory dumps, nothing. With only one running the OpenWRT router seems stable enough: it didn’t crash after a day or so with far more traffic being pumped through it.
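
To make the test easy to repeat, the three iperf clients can be started together from a small script. What follows is a minimal sketch in Python rather than what was actually used in the lab; the helper names are made up, and it assumes `iperf` is on the path and that servers are already listening on the FreeBSD side.

```python
import subprocess


def build_iperf_commands(targets, bandwidth_bps=20_000_000, duration_s=3600):
    """One UDP iperf client command per target address, same flags as above."""
    return [
        ["iperf", "-c", target, "-u", "-b", str(bandwidth_bps), "-t", str(duration_s)]
        for target in targets
    ]


def run_all(commands):
    """Start every client at once and wait for them all; returns the exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    commands = build_iperf_commands(["10.0.0.2", "10.0.1.2", "10.0.2.2"])
    for cmd in commands:
        print(" ".join(cmd))
    # Uncomment to actually generate the traffic:
    # run_all(commands)
```

Passing only two of the three addresses reproduces the two-stream case that is enough to bring the OpenWRT router down.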

The same setup with both routers running FreeBSD seems to be solid, so far so good.

The Good, The Bad and the Ugly

by William Waites
17 Jul 2012

Davie Newton of the Knoydart Foundation at the Creagan Dearga Mast

I’ve just returned from a week on the West coast of Scotland working at joining up two community wireless networks, Tegola and the one built by the Knoydart Foundation. The plan was to put in fast links from both to the Sabhal Mòr Ostaig college on Skye where we have access to a decent (DS3, I believe) Internet connection. Tegola is already connected to the college via a slower link whilst Knoydart uses a bunch of DSL lines at Mallaig.

The Good

The fast links, 5GHz wireless ethernet using Rocket M radios from Ubiquiti went in flawlessly. In Knoydart, Davie Newton and I climbed up to a knoll on top of the Creagan Dearga (little red crags), put the dish up and aligned it by eye, plugged it in and immediately saw a signal strength of about -60dBm against a noise floor of -91dBm. The wireless card reported a bit rate consistently between 270 and 300Mbps, pretty much the maximum you can get with 802.11n.
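
For anyone not used to working in decibels, the margin between signal and noise is just the difference of the two dBm figures, and can be turned into a plain power ratio if wanted. A small Python sketch, with function names made up for illustration:

```python
def snr_db(signal_dbm, noise_dbm):
    """Signal-to-noise ratio in dB is the difference of the two levels."""
    return signal_dbm - noise_dbm


def db_to_ratio(db):
    """Convert a figure in dB to a linear power ratio."""
    return 10 ** (db / 10)


if __name__ == "__main__":
    margin = snr_db(-60, -91)
    print(f"SNR: {margin} dB, a power ratio of about {db_to_ratio(margin):.0f}:1")
```

For this link that is 31dB, meaning the signal is more than a thousand times stronger than the noise.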

Similarly, the link from about a third of the way up Beinn Sgritheall to the college went in flawlessly. This time, instead of running the radio as a bridge with the stock Ubiquiti firmware and plugging it into a router, we ran OpenWRT on it so that it could be a proper router in its own right. This confirmed at the very least that the drivers for the radio chip on them work sufficiently well with a free software operating system (though I believe them to still contain some binary blobs). We also put up a webcam overlooking the village of Arnisdale and Loch Hourn, visible within the network and possibly to be made public at some point in the future.

The Bad

At the college, we run a router built on a Soekris Net5501. This is basically a PC in a small case with four built-in ethernet interfaces and one PCI slot for expansion. It also runs OpenWRT and had been running stably for some time. I brought an identical one with me to install on the hilltop at the Creagan Dearga in Knoydart primarily to provide some fail-over capability – use the fast connection at the college, but in case of a problem there, fall back to the DSL lines in Mallaig.

Anyhow, we came down the hill, went to the pub to admire our handiwork over a pint and try out what should be better Internet access than I get at my house in central Edinburgh. So far so good. Anecdotal reports came in of residents seeing 30Mbps in both directions and we kept an eye on the throughput graphs to confirm that this was so. The next morning I got on the boat to head to Arnisdale to put up the Tegola link.

That afternoon, mysteriously, the router at the college crashed. Hard. Completely locked up. Unreachable over the network, nothing on the serial ports, no error indicator lights blinking, nothing. I checked to see if Knoydart had failed over to the DSL lines and sure enough they had. At least that bit was working. The router at the college was power-cycled by the college’s friendly and helpful network admin, Martainn Domhnallach and everything was back to normal. This was Friday afternoon.

Sunday morning it happened again. Then, a couple of hours later the router in Knoydart also seized. Davie ran up the hill to power-cycle it (thankfully he didn’t have to go all of the way to the top as power can be cut nearish the bottom) but by the time he got back to Inverie it had wedged again.

This was unexpected. I have used Soekris hardware before with good results. Particularly in New Orleans with some kit from the Champaign-Urbana community wireless network and in Toronto I had some involvement with a metro-area ISP built mostly with them. But. The CU-Wireless kit was running NetBSD. And the stuff in Toronto was running FreeBSD. The combination of OpenWRT, which is a kind of Linux, and the Soekris boards is new to me.

The Ugly

So now there were two things to do. Get the network back up and running for the people on Knoydart, and figure out what is wrong with these Soekris routers. In that order.

The first was relatively easily accomplished by unplugging everything from the router in Knoydart and plugging it all into a hub, and putting what should have been the router’s local IP address at the far end of one of the backbone bridges. And patching it all together with a mishmash of static routes. Similarly at the college but because that is an upstream Internet gateway, using a very brittle configuration with an underpowered Linksys router. Not pretty, and full of little hacks and kludges that I can’t wait to see the back of. But it works as a stopgap. Chewing gum and twine.

Searching around, I came across this thread on the soekris-tech mailing list. The problem sounds identical. In it, the chief engineer of Soekris, Soren Kristensen, and one Attila argue about whether it is a hardware or a software problem. Attila argues a design fault involving power distribution around the board, and Soren argues a race condition in the Linux driver for the Via Rhine III ethernet chip that is usually masked by fast processors but shows up with the Soekris under load because the processor is relatively slow.

Soren’s argument rings true partly because I and people who I know have had good results using the Soekris hardware with BSD variants which, obviously, do not use Linux drivers (please let us not get into public discussions about the relative code quality of BSD UNIX versus Linux, there are enough of those, I’m happy to elaborate privately) and partly because these things are not new. If the boards were indeed faulty I’m sure this would be uncovered in the first few pages of Google results for Soekris.

And there is a smoking gun that some fixes have been put into very recent Linux kernels but apparently this problem is still there.

So the plan now is to,

Duplicate the problem in the lab rather than on the hilltop which, for all the stunningly beautiful views can get quite windy and cold.
Try a bigger power supply on the off chance that this improves things.
Try a recent development snapshot of OpenWRT to see if newer versions of the driver have been fixed.
Try running FreeBSD on them and benefit from a codebase that doesn’t have this bug.
When something seems to work, stress test it excessively in the lab before bringing it up North…

To follow with a report on these findings in the next few days…