As soon as the Paco release was out last Christmas, we started working on the next firmware release. The focus of this release will be on improving the 3G experience, which is the topic of this post. I’ll detail a few of the things we’ve been working on, to give you an idea of what’s coming up and of all the different components that are involved in getting a 3G connection up and running.
Changing the connection manager daemon
The current firmware uses “umtsd.lua”, which is a simple connection manager written in Lua. It was a first attempt at something back when the 2.0g and 2.0n were being released, and that shows. Its internals are a bit messy and it is hard to debug any 3G problems due to limited feedback.
To solve this, a new daemon was written, named “udiald” (originally it was called “umtsd2”). This daemon has been released by Fon under an open source license and its sources can be found on GitHub. This new daemon is written in C, has a cleaner structure and makes it a lot easier to change device-specific configuration and get debug output. We expect we can support more 3G devices in the future because of udiald, though in the short run some devices might stop working when switching to udiald (for example, because they just worked “by accident” in umtsd.lua and now need some specific tweaking applied).
At some point, this udiald (or probably something based off it) might also be integrated into the main OpenWRT repository, to improve 3G support in general OpenWRT as well.
Rewriting the 3G web interface
Managing the 3G connection is a bit of a challenge in the current firmware. When you plug in a device, it automatically connects, and having two 3G dongles connected at the same time is asking for trouble. If you’re running into problems, most of the time you’ll only see that the Fonera is stuck at “Dialing…”, with no error message to indicate what went wrong.
We’ve now revamped this web interface, making it easier to see which 3G dongle is connected and what its status is. We’re making automatic connection on plug-in optional (and disabling it by default), so you have more control over how your Fonera connects to the internet.
2.0n USB driver improvements
One of the long-standing issues with 3G devices on the Fonera 2.0n is that most of them would not operate properly when connected through a USB hub. Devices that work correctly when directly connected to the Fonera fail when connected through a USB hub, even when no other devices are connected.
We’ve been diving into this issue for the last month or so and have slowly started to unravel its causes. When communicating with the Fonera, the 3G devices offer virtual “serial ports”, which can send and receive data. When the Fonera tries to receive data from the 3G device, the device often has no data to send. In this case, the 3G device sends a USB “NAK” message back to the Fonera. The (USB controller in the) Fonera then knows there is no data yet and that it should just keep trying until there is. These NAK messages were the root cause of the problem, but the actual problem differed depending on the USB version of the 3G device:
For USB 1.1 3G devices, which use so-called “split transactions” to translate from USB 2.0 to 1.1, every NAK message received caused the Fonera to receive an interrupt, process the NAK message and resubmit the original message for a retry. Apparently, these NAK messages would occur so often that the Fonera’s CPU would be busy only with processing these interrupts, leaving no time for any other useful work and eventually causing a reboot.
We managed to fix this issue by only retrying these NAK’d messages once per USB frame (millisecond), a fix that was inspired by a similar fix found in the Raspberry Pi Linux version (which uses a very similar USB controller). This makes the USB 1.1 3G devices we tested work properly through a USB hub.
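To get a feeling for why retrying once per frame helps so much, here is a minimal simulation sketch. It is not the driver code: the function name, the 10 µs cost of one immediate retry and the 50 ms wait for data are all assumptions picked for illustration. Only the 1 ms frame length comes from the description above.

```python
FRAME_US = 1000  # one USB frame lasts 1 ms, as mentioned above


def count_nak_interrupts(data_ready_at_us, retry_cost_us, once_per_frame):
    """Count the NAK interrupts the CPU handles before the device has data.

    data_ready_at_us: moment (in microseconds) at which the device has data
    retry_cost_us:    assumed duration of one immediate retry round trip
    once_per_frame:   if True, a NAK'd transfer is retried in the next frame only
    """
    now_us = 0
    interrupts = 0
    while now_us < data_ready_at_us:
        interrupts += 1  # the device answered NAK, the CPU takes an interrupt
        if once_per_frame:
            # Postpone the retry until the start of the next 1 ms frame.
            now_us = (now_us // FRAME_US + 1) * FRAME_US
        else:
            # Resubmit right away; only the round trip itself costs time.
            now_us += retry_cost_us
    return interrupts


immediate = count_nak_interrupts(50_000, 10, once_per_frame=False)
throttled = count_nak_interrupts(50_000, 10, once_per_frame=True)
print(immediate, throttled)  # 5000 50
```

Under these assumed numbers the interrupt count drops by a factor of 100, at the price of up to one frame of extra latency when the data finally arrives.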
For USB 2.0 3G devices, the USB hardware takes care of all these retries, so the interrupt overload issue does not occur. However, here we were running into a hardware limitation: the USB hardware uses “Host Channels” (HC) to communicate with USB devices. The Fonera 2.0n has four of these HCs available, and a host channel can be used to send a message to one specific USB “endpoint”, meaning we can send messages to four different endpoints at the same time (but once a message is sent, a HC can be reconfigured to send a message to another endpoint if needed).
Now, for each “periodic endpoint” (isochronous and interrupt), a dedicated HC is needed. These periodic endpoints need to have messages sent at regular intervals, and once the time for such a periodic message comes, it must be certain that a HC is available, so the driver reserves one HC for each of these periodic endpoints. The driver allows up to three HCs to be reserved for periodic transfers, leaving at least one HC available for non-periodic transfers.
This is what happens when you connect a 3G device through a USB hub. The USB hub and the 3G device both have an interrupt endpoint, which takes up two of the four HCs. Now udiald connects to two of these “virtual serial ports”, one to transfer control messages and one to transfer data. Receiving data from these virtual serial ports takes up two more HCs. These latter two are expected to be used only briefly, but because the 3G device often does not have data to send for some time (a few to a few dozen seconds, typically), all of the host channels are taken up. Now, when udiald wants to send something to the device (either some network data or a control command), it needs one more HC (since input and output are different endpoints in the USB protocol). The outgoing data is queued, waiting for a host channel to become available, which doesn’t happen, so the connection times out before it has even started.
To fix this, we’re trying a workaround where these “blocking” HCs (which keep getting NAK replies from the device and keep getting retried by the hardware) are interrupted when the driver runs out of available host channels, so these host channels can be used for other transfers for a while and then resume retrying the blocking transfers. This is now working fairly well in our development environment, though there might be one or two corner cases left to tackle.
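The host channel shortage and the workaround can be sketched with a small model. This is a simplified illustration and not the actual driver logic: the class and method names are made up, and every non-periodic transfer holding a channel is assumed to be stuck on NAK replies, so any of them may be paused.

```python
from collections import deque


class HostChannelPool:
    """Toy model of the four host channels (HCs) of the Fonera 2.0n."""

    def __init__(self, total=4, preempt_blocked=False):
        self.total = total
        self.preempt_blocked = preempt_blocked
        self.periodic = []      # endpoints that own a reserved HC
        self.active = []        # non-periodic transfers currently holding a HC
        self.waiting = deque()  # transfers queued until a HC is available

    def free_channels(self):
        return self.total - len(self.periodic) - len(self.active)

    def reserve_periodic(self, endpoint):
        # At least one HC always stays available for non-periodic transfers.
        if len(self.periodic) >= self.total - 1:
            raise RuntimeError("no host channel left for a periodic endpoint")
        self.periodic.append(endpoint)

    def submit(self, transfer):
        """Return True if the transfer got a HC, False if it had to queue."""
        if self.free_channels() > 0:
            self.active.append(transfer)
            return True
        if self.preempt_blocked and self.active:
            # Workaround: pause the oldest blocking transfer and requeue it,
            # so that its HC can serve the new transfer for a while.
            self.waiting.append(self.active.pop(0))
            self.active.append(transfer)
            return True
        self.waiting.append(transfer)
        return False


def hub_scenario(preempt_blocked):
    """Replay the hub plus 3G device situation described above."""
    pool = HostChannelPool(total=4, preempt_blocked=preempt_blocked)
    pool.reserve_periodic("hub interrupt endpoint")
    pool.reserve_periodic("3G device interrupt endpoint")
    pool.submit("control port, receive")  # waits on NAKs
    pool.submit("data port, receive")     # waits on NAKs
    return pool.submit("control port, send")


print(hub_scenario(preempt_blocked=False))  # False: the send is stuck in the queue
print(hub_scenario(preempt_blocked=True))   # True: a blocking receive made room
```

A real implementation also has to resume the paused transfers later and decide which blocking transfer to pause; the model skips both.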
In summary: We’ve managed to improve this third-party USB driver in order to better support blocking transfers typically used by 3G devices. This helps to get the 3G devices we have here working, but of course it is no guarantee that all devices will start working with USB hubs: there might be other problems we simply did not observe yet. However, the problems we solved are fairly fundamental, so they will have affected most if not all 3G devices out there.
If you’re interested in the details about this USB driver problem, I can recommend the “USB Made Simple” and “USB in a Nutshell” article series, which properly explain the terms and logic of the USB protocol.
Remaining USB limitations
Even though the above should make 3G devices work with USB hubs, it is expected that the performance is somewhat reduced by this approach, because of the way a channel must now be time-shared between multiple transfers.
Furthermore, this approach does not make the limit on the number of host channels go away all of a sudden. Multiple (blocking) non-periodic transfers can now successfully share a single host channel, but there is still only room for three different periodic endpoints. Fortunately, USB storage devices typically do not use a periodic endpoint, so this limit should be sufficient in most cases (e.g., you can connect one hub, a 3G stick and a USB sound card, each of which typically has one periodic endpoint, plus a number of USB disks or flash drives without problems).
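The remaining limit comes down to simple counting, which the sketch below makes explicit. The function name is hypothetical, the endpoint counts per device are the typical values mentioned above, and the webcam is an extra assumed device with one periodic endpoint; real devices can differ.

```python
TOTAL_HOST_CHANNELS = 4
# One host channel is always kept for non-periodic transfers.
MAX_PERIODIC_ENDPOINTS = TOTAL_HOST_CHANNELS - 1


def fits_periodic_budget(devices):
    """devices maps a device name to its number of periodic endpoints."""
    return sum(devices.values()) <= MAX_PERIODIC_ENDPOINTS


setup = {
    "hub": 1,
    "3G stick": 1,
    "sound card": 1,
    "flash drive": 0,  # storage devices typically have no periodic endpoint
    "USB disk": 0,
}
print(fits_periodic_budget(setup))                    # True
print(fits_periodic_budget({**setup, "webcam": 1}))   # False
```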
Backporting parts of the serial drivers
When a 3G device is plugged in, its virtual serial ports are handled by a USB serial driver in the Linux kernel. This driver needs to know which devices it should handle (usually based on the USB “vidpid”, the vendor ID and product ID advertised by the device) and, for some devices, which workarounds or special quirks to apply. The driver most commonly used for 3G devices is called the “option” driver, which we’ve improved by taking the list of USB devices and some of the workarounds from a newer kernel version and backporting them to the Fonera kernel versions. This should allow more 3G devices to be supported by both Foneras, since udiald can now actually talk to these devices.
This change should remove the need to mess with the new_id files in /sys, which was sometimes needed to get a 3G device recognized by its driver.
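For reference, the manual step that should no longer be needed looks roughly like the sketch below. The sysfs path is where mainline kernels usually expose the new_id file of the option driver; that it is identical on the Fonera kernels is an assumption. The helper names and the IDs in the example are made up.

```python
from pathlib import Path

# Usual location on mainline kernels; assumed, not verified on the Fonera.
OPTION_NEW_ID = Path("/sys/bus/usb-serial/drivers/option1/new_id")


def new_id_line(vendor_id, product_id):
    """Format a vendor/product ID pair as two hexadecimal numbers."""
    return f"{vendor_id:04x} {product_id:04x}\n"


def register_with_option(vendor_id, product_id, new_id_path=OPTION_NEW_ID):
    """Ask the option driver to also handle this device (needs root)."""
    Path(new_id_path).write_text(new_id_line(vendor_id, product_id))


# Placeholder IDs, not a real device.
print(new_id_line(0x1234, 0xABCD), end="")  # 1234 abcd
```

With the backported device list in place, the driver already knows the IDs of the supported devices, so this write becomes unnecessary for them.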
usb-modeswitch is the software responsible for getting 3G devices into “modem mode”. A lot of 3G devices start out in “storage mode”, where they pretend to be a read-only CD drive or card reader, offering the Windows drivers for the device on a virtual disk. To get these devices to offer the virtual serial ports used for the 3G connection, they need to be sent a special message (which is normally done by the Windows driver installed from the virtual disk) to switch them to modem mode.
By updating usb-modeswitch and the associated database of 3G devices and the messages they need to be sent to make them switch, more 3G devices can be supported that were previously not accessible to udiald.
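Conceptually, the device database boils down to a lookup from the IDs a device shows in storage mode to the message that makes it switch. The sketch below only illustrates that idea: it is not the real usb-modeswitch data format, and the IDs and message bytes are placeholders rather than data for any real device.

```python
# Hypothetical database: keys are (vendor ID, product ID) in storage mode.
# The message bytes are placeholders, not a real switching message.
SWITCH_DB = {
    (0x1234, 0x0001): bytes.fromhex("deadbeef"),
}


def switch_message(vendor_id, product_id, db=SWITCH_DB):
    """Return the message to send to this device, or None if none is known."""
    return db.get((vendor_id, product_id))


print(switch_message(0x1234, 0x0001) is not None)  # True: device gets switched
print(switch_message(0x1234, 0x0002) is not None)  # False: unknown, left alone
```

A device that is missing from the database is simply left in storage mode, which is why a newer database directly translates into more usable devices.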
pppd is the daemon that is responsible for setting up the actual IP connection once all the dialing and configuration is complete. It handles negotiations about IP addresses and DNS servers, using the standardized PPP (Point-to-Point Protocol). Even though PPP is an old and standardized protocol, we were still running into some problems with 3G devices that would confuse pppd with particular (IIRC not entirely standards-compliant) negotiation sequences, causing the connection to fail. Newer pppd versions turn out to handle these devices more gracefully, making this particular problem (and possibly others) go away.
All of the work above is ongoing in our development trees. We’ll start pushing these changes into the fon-ng Subversion repository over the coming weeks, and once we’re satisfied that this produces a firmware image suitable for testing, we’ll release the first beta. The goal of this release will mostly be to gather information about the support for different 3G devices: which devices work right away, which need some configuration tweaks, and are there still more fundamental problems with some devices?