WiFi Routers - Some thoughts: 2013

Saturday, 12 October 2013

80211: Frame encapsulation

In this post I will cover a few points about frame encapsulation and look at how mac80211 (which sits on top of ath9k, ath5k, etc.) performs it. The upper layers hand the frame down in 802.3 format. It is the job of the driver to strip the 802.3 header and insert the 802.11 header; this process is called encapsulation.

In this particular post I will cover the ToDS, FromDS and address fields of the 802.11 header. The actual values that go into each field vary with the mode (station, AP, WDS, etc.). The following table enumerates each mode and the possible values for each field. In mac80211, please refer to the function ieee80211_subif_start_xmit. Please note that this path handles only 802.11 data frames.

Mode	Address 1	Address 2	Address 3	Address 4	ToDS	FromDS
AP	Destination Address. Generally one of the associated stations.	BSSID. In general this is the MAC address of the VAP.	Source Address. In general one of the local MAC addresses of the AP or MAC address of machine sitting in the backend network of the AP.	N/A	0	1
Station	BSSID. In general this is the MAC address of the AP.	Source Address (the station's MAC address).	Destination Address. In general one of the local addresses of the AP or MAC address of a machine sitting in the backend network of the AP.	N/A	1	0
WDS	Receiver address	Transmitter address	Destination Address. In general one of the local addresses of the Receiver or MAC address of a machine sitting in the backend network of the Receiver.	Source Address. In general one of the local addresses of the Source or MAC address of a machine sitting in the backend network of the Source.	1	1
Ad-hoc	Destination Address	Source address	BSSID	N/A	0	0
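
The table can be read as a simple switch on the mode. The following is a minimal userspace sketch of that mapping, not the mac80211 code: the enum, the struct and the function name fill_dot11_addrs are all hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical mode names for this sketch; mac80211 has its own interface types. */
enum wifi_mode { MODE_AP, MODE_STATION, MODE_WDS, MODE_ADHOC };

/* Only the header fields discussed in the table. */
struct dot11_addrs {
    uint8_t to_ds, from_ds;
    uint8_t addr1[MAC_LEN], addr2[MAC_LEN], addr3[MAC_LEN];
    uint8_t addr4[MAC_LEN];  /* meaningful only when to_ds == from_ds == 1 */
};

/*
 * da, sa : destination and source taken from the 802.3 header
 * bssid  : BSSID of the cell (unused in WDS mode)
 * ra, ta : receiver and transmitter of the WDS link (unused in other modes)
 */
static void fill_dot11_addrs(struct dot11_addrs *h, enum wifi_mode mode,
                             const uint8_t *da, const uint8_t *sa,
                             const uint8_t *bssid,
                             const uint8_t *ra, const uint8_t *ta)
{
    assert(h != NULL);
    memset(h, 0, sizeof(*h));

    switch (mode) {
    case MODE_AP:       /* FromDS: DA, BSSID, SA */
        h->from_ds = 1;
        memcpy(h->addr1, da, MAC_LEN);
        memcpy(h->addr2, bssid, MAC_LEN);
        memcpy(h->addr3, sa, MAC_LEN);
        break;
    case MODE_STATION:  /* ToDS: BSSID, SA, DA */
        h->to_ds = 1;
        memcpy(h->addr1, bssid, MAC_LEN);
        memcpy(h->addr2, sa, MAC_LEN);
        memcpy(h->addr3, da, MAC_LEN);
        break;
    case MODE_WDS:      /* ToDS + FromDS: RA, TA, DA, SA */
        h->to_ds = 1;
        h->from_ds = 1;
        memcpy(h->addr1, ra, MAC_LEN);
        memcpy(h->addr2, ta, MAC_LEN);
        memcpy(h->addr3, da, MAC_LEN);
        memcpy(h->addr4, sa, MAC_LEN);
        break;
    case MODE_ADHOC:    /* neither bit set: DA, SA, BSSID */
        memcpy(h->addr1, da, MAC_LEN);
        memcpy(h->addr2, sa, MAC_LEN);
        memcpy(h->addr3, bssid, MAC_LEN);
        break;
    }
}
```

For a bridged AP sending a frame from the PC to the STA, one would call fill_dot11_addrs with MODE_AP, da set to the STA's MAC, sa set to the PC's MAC and bssid set to the AP's MAC.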

Following are a few examples. You can verify them using any sniffer such as Wireshark or AiroPeek.

Setup1:
Assume we have a setup like STA<---->AP<---->PC.

In this setup AP is a bridge and STA is associated to the AP. PC is connected to AP with Ethernet interface. We will focus only on the communication between AP and Station.

A packet originating at the PC and destined to the STA will have address 1 filled with the STA's MAC address, address 2 filled with the BSSID (in general the AP's MAC address) and address 3 filled with the PC's MAC address.

Setup2:
Assume we have a setup like STA<---->AP<---->PC.

In this setup AP is a Router and STA is associated to the AP. PC is connected to AP with Ethernet interface. Assume AP is the default gateway router for PC to reach STA. We will focus only on the communication between AP and Station.

A packet originating at the PC and destined to the STA will have address 1 filled with the STA's MAC address, address 2 filled with the BSSID (in general the AP's MAC address) and address 3 filled with the AP's MAC address.

Setup3:
Assume we have a setup like STA<---->AP<---->PC.

In this setup AP is a bridge and STA is associated to the AP. PC is connected to AP with Ethernet interface. We will focus only on the communication between AP and Station.

A packet originating at the STA and destined to the PC will have address 1 filled with the BSSID (in general the AP's MAC address), address 2 filled with the STA's MAC address and address 3 filled with the PC's MAC address.

Wednesday, 2 October 2013

Network Driver: Rx Descriptors: Ath9k as reference

DMA is one of the most important things for any network driver developer to understand. In my last post I explored the descriptor details in the Tx path. In this post, we will explore a few details of descriptors in the Rx path. We will use ath9k as the reference driver. This should be similar for ath5k, madwifi or any other Atheros driver including Fusion and Aquila. Handling of Rx descriptors is slightly different from that of Tx descriptors.

Ath9K: EDMA and Non-EDMA

In ath9k there are two paths for receiving frames from the hardware: one for hardware that supports EDMA (Enhanced DMA) and one for hardware that does not. There are a couple of differences between the two. First, with EDMA the hardware maintains two queues, a high priority queue and a low priority queue. Second, memory for descriptors and buffers (skb) is allocated differently. In the non-EDMA case, allocation resembles the Tx path: Rx descriptors are allocated, followed by the buffers (skb). Both are DMA synced, and the physical address of each buffer is populated in the corresponding descriptor. The following figures show the allocation of descriptors and their linkage with the frame buffers.

Rx descriptor allocation:

Rx descriptors are generally allocated when the driver is initialized. In the ath9k driver, please refer to the function ath_rx_init. There are two separate paths, one for EDMA and the other for non-EDMA.

EDMA path allocations are handled in ath_rx_edma_init. In this function, you can see that the buffers are allocated using ath_rxbuf_alloc and DMA mapped using dma_map_single. Please observe that the size of the buffer is equal to rx_bufsize (the sum of the rx status length and the max buffer size).

Non-EDMA path allocations are handled in ath_rx_init itself. The allocation is similar to that of Tx descriptors.

Populate and pass descriptor to the hardware:

Once the descriptors are allocated, the next step is to pass them to the hardware. Generally this is not done as part of the initialization. Rather, it is delayed until reception of packets is enabled, which generally happens when at least one of the VAPs (created on top of the device) is brought up. In the ath9k/mac80211 combination, the invocation path is ieee80211_do_open --> drv_start (ath9k_start) --> ath_complete_reset --> ath_startrecv.

Also, observe that ath_startrecv leads to calls to ath_rx_edma_buf_link (EDMA path, ath_edma_start_recv --> ath_rx_addbuffer_edma --> ath_rx_edma_buf_link) and ath_rx_buf_link (non-EDMA path). In ath_rx_edma_buf_link and ath_rx_buf_link the descriptor's physical address is passed to the hardware.

Please observe that in the non-EDMA path (ath_rx_buf_link), the physical address of the frame buffer is populated into the corresponding pointer of the descriptor (ds->ds_data = bf->bf_buf_addr;). This is not needed in the EDMA path. Please refer to the figures shown above.

Processing of frames:

When a frame is received, the hardware populates the next available descriptor and raises an interrupt. The driver handles the interrupt, and the actual processing of the frames is done in a tasklet. The driver extracts the frame, DMA-unmaps it, processes it and sends it to the upper layers. Because the driver has taken one frame buffer out and passed it to the upper layers, it also has to allocate a new frame buffer, DMA-map it and pass it to the hardware. The upper layers will free the original frame once they are done with their processing.

In the ath9k driver, the corresponding tasklet function is ath_rx_tasklet. Frames are extracted in ath_get_next_rx_buf (non-EDMA path) and ath_edma_get_next_rx_buf (EDMA path). Please also observe the creation of a new frame buffer using ath_rxbuf_alloc; the new buffer is passed to the hardware using ath_rx_buf_link / ath_rx_edma_buf_link.
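
The take-one-out, put-one-back pattern described above can be modelled in plain userspace C. This is only a sketch under heavy assumptions: all names (rx_ring, rx_desc, rx_ring_poll) are hypothetical, malloc stands in for ath_rxbuf_alloc plus dma_map_single, and a simple flag stands in for the status the hardware writes back. It is not how ath9k lays out its structures.

```c
#include <assert.h>
#include <stdlib.h>

#define RX_RING_SIZE 4
#define RX_BUF_SIZE  2048

struct rx_desc {
    void *buf;   /* stands in for the DMA address of the frame buffer */
    int done;    /* set by the "hardware" once a frame has landed in buf */
};

struct rx_ring {
    struct rx_desc desc[RX_RING_SIZE];
    unsigned int head;   /* next descriptor the driver will look at */
};

/* Allocate one buffer per descriptor, as done once at driver init. */
static int rx_ring_init(struct rx_ring *r)
{
    unsigned int i;

    assert(r != NULL);
    r->head = 0;
    for (i = 0; i < RX_RING_SIZE; i++) {
        r->desc[i].buf = malloc(RX_BUF_SIZE);
        r->desc[i].done = 0;
        if (r->desc[i].buf == NULL) {
            while (i-- > 0)
                free(r->desc[i].buf);
            return -1;
        }
    }
    return 0;
}

/*
 * Tasklet-style poll: returns the received buffer (now owned by the caller,
 * i.e. the upper layer, which must free it) or NULL if nothing is ready.
 */
static void *rx_ring_poll(struct rx_ring *r)
{
    struct rx_desc *d = &r->desc[r->head];
    void *frame, *fresh;

    if (!d->done)
        return NULL;

    fresh = malloc(RX_BUF_SIZE);   /* replacement buffer for the hardware */
    if (fresh == NULL)
        return NULL;               /* keep the old buffer linked, retry later */

    frame = d->buf;                /* hand the filled buffer upwards */
    d->buf = fresh;                /* re-link the descriptor to the new buffer */
    d->done = 0;
    r->head = (r->head + 1) % RX_RING_SIZE;
    return frame;
}
```

The point of the sketch is the ordering: the replacement buffer is obtained before the filled one is detached, so the descriptor never ends up without a buffer.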

Friday, 27 September 2013

Network driver descriptors: ath9k as reference

DMA is one of the most important things for any network driver developer to understand. In this post, we will explore a few details of DMA in the Tx path. We will use ath9k as the reference driver. This should be similar for ath5k, madwifi or any other Atheros driver including Fusion and Aquila.

Before going into the details, let me introduce the components involved. Mainly we have the host (on which the network driver is running) and the MAC (implemented on the wireless card). The MAC consists of different sub-components including the Queue Control Unit (QCU) and the DCF Control Unit (DCU). In the Tx path, frame transmission begins at a QCU and the frame is later passed to the corresponding DCU for transmission over the air. There is one QCU for each category. The categories include Best Effort, Background, Video, Voice, Beacon, UAPSD and CAB.

The MAC is responsible for transferring frames between host memory and the card. All the transfers happen using a structure called a descriptor. The host creates, populates and provides the descriptors to the MAC for further processing. Please note that the MAC needs physical addresses and not virtual addresses. Hence the host needs to obtain the DMA (physical) address of each descriptor before passing it to the MAC.

Descriptor:

Descriptors are device specific. When descriptors are passed to the MAC, they are maintained as a linked list, so the descriptor structure should contain a pointer to the next descriptor. The MAC processes the list of descriptors in order and raises an interrupt (TXEOL) at the end of the list.

Also, the Tx descriptor should contain the physical address of the actual buffer to be transmitted.

Descriptors are useful not only for passing the frame to the MAC, but also for fetching the Tx and Rx status after transmission and reception respectively. However, this information is fetched through a call to the HAL layer. For example, please see the definition of ath_tx_edma_tasklet in the ath9k driver, where the status is fetched from the HAL layer using the function ath9k_hw_txprocdesc.

As an example, please see below the descriptor used for Atheros based cards. It contains 24 32-bit words, padded to 32 words (128 bytes). This definition is from the ath9k driver. From the definition we can see that multiple buffers can be specified in a single descriptor. However, as of now, the current implementation passes only one buffer.

/* Transmit Control Descriptor */
struct ar9003_txc {
        u32 info;   /* descriptor information */
        u32 link;   /* link pointer */
        u32 data0; /* data pointer to 1st buffer */
        u32 ctl3;   /* DMA control 3 */
        u32 data1; /* data pointer to 2nd buffer */
        u32 ctl5;   /* DMA control 5 */
        u32 data2; /* data pointer to 3rd buffer */
        u32 ctl7;   /* DMA control 7 */
        u32 data3; /* data pointer to 4th buffer */
        u32 ctl9;   /* DMA control 9 */
        u32 ctl10; /* DMA control 10 */
        u32 ctl11; /* DMA control 11 */
        u32 ctl12; /* DMA control 12 */
        u32 ctl13; /* DMA control 13 */
        u32 ctl14; /* DMA control 14 */
        u32 ctl15; /* DMA control 15 */
        u32 ctl16; /* DMA control 16 */
        u32 ctl17; /* DMA control 17 */
        u32 ctl18; /* DMA control 18 */
        u32 ctl19; /* DMA control 19 */
        u32 ctl20; /* DMA control 20 */
        u32 ctl21; /* DMA control 21 */
        u32 ctl22; /* DMA control 22 */
        u32 ctl23; /* DMA control 23 */
        u32 pad[8]; /* pad to cache line (128 bytes/32 dwords) */
} __packed __aligned(4);

In ath9k and other Atheros based drivers, between the actual frame (skb) and the descriptor there is an abstraction called "struct ath_buf". This is an intermediary structure holding the important data that should be accessed before and after transmission. It encapsulates data like the physical and virtual addresses of the descriptor, the physical address of the frame (skb->data) and details like a pointer to the station information.

Creation of Descriptors:

Descriptors are created once and must stay consistent between host and device accesses. Hence it is better to use consistent (coherent) DMA mapping. The corresponding calls are pci_alloc_consistent, dma_alloc_coherent and dmam_alloc_coherent.

As an example, please refer to the definition of the function ath_descdma_setup in any of the Atheros based drivers. In ath9k, memory is allocated using dmam_alloc_coherent. In some other drivers the memory is allocated using pci_alloc_consistent. Please observe that multiple descriptors are allocated in one go.

Populate the descriptor:

When the driver receives a frame for transmission, the frame's physical address (after DMA mapping) should be populated into a descriptor and the address of the descriptor should be passed to the hardware.

Please see the definition of the function ath_tx_setup_buffer in the ath9k driver. First an ath_buf is dequeued from the list of free buffers, and then the frame (skb) is DMA mapped using dma_map_single.

Please note that we are using dma_map_single for skbs and dmam_alloc_coherent for descriptors. dma_map_single is a streaming DMA routine. Generally, once the skb is mapped, the host does not modify any of its contents (except in some special cases like UAPSD). Hence streaming DMA is fine for mapping the skbs.

Please also observe that the physical address is saved in one of the fields (bf_buf_addr) of ath_buf, to be used later. The actual values of the descriptor are populated in HAL related functions. One such function is ar9003_set_txdesc. It is invoked through a function pointer from ath9k_hw_set_txdesc, which is invoked from ath_tx_fill_desc (in some drivers the corresponding function might be ath_hal_filltxdesc), which in turn is invoked from ath_tx_send_normal. You can see that the physical address of the frame (bf_buf_addr) is used here and populated into the corresponding field of the descriptor.

Pass the descriptor to the hardware:

Once the descriptor is populated, it is passed to the hardware. In ath9k and other Atheros drivers, the corresponding function is ath_tx_txqaddbuf. There are two different paths here.

In the case of enhanced DMA, the descriptor is given directly to the corresponding queue. The corresponding function call in ath9k is ath9k_hw_puttxbuf. Please observe that the queue number and the physical address of the descriptor (bf->bf_daddr) are passed to this function.

If enhanced DMA is not supported and the queue is empty, the descriptor is passed directly to the corresponding queue. If the queue is not empty, the descriptor is appended to the queue. In the ath9k driver, you can see the invocation of ath9k_hw_set_desc_link to append the frames to the Tx queue. The frames in the queue are processed in FIFO order.
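
The non-EDMA case can be sketched as a tiny userspace model. Everything here is an assumption made for illustration: tx_desc, tx_buf, tx_queue and txq_add are hypothetical names, hw_txdp is a plain variable standing in for the hardware's per-queue descriptor pointer, and the addresses are made-up numbers rather than real DMA addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct tx_desc {
    uint32_t link;   /* "physical" address of the next descriptor, 0 = end of list */
    uint32_t data;   /* "physical" address of the frame buffer */
};

struct tx_buf {             /* plays the role of struct ath_buf */
    struct tx_desc *desc;   /* virtual address of the descriptor */
    uint32_t daddr;         /* physical address of the descriptor */
};

struct tx_queue {
    struct tx_buf *last;    /* tail of the software list, NULL when empty */
    uint32_t hw_txdp;       /* stand-in for the hardware descriptor pointer */
    unsigned int depth;
};

/* Queue one buffer: hand it to the hardware directly or chain it behind the tail. */
static void txq_add(struct tx_queue *q, struct tx_buf *bf)
{
    assert(q != NULL && bf != NULL && bf->desc != NULL);

    bf->desc->link = 0;                    /* new descriptor ends the list */
    if (q->last == NULL)
        q->hw_txdp = bf->daddr;            /* empty queue: give it to the hardware */
    else
        q->last->desc->link = bf->daddr;   /* otherwise: link it after the tail */
    q->last = bf;
    q->depth++;
}
```

Note that the link field always carries the descriptor's physical address (daddr), never the host's virtual pointer, which matches the requirement that the MAC only understands physical addresses.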

Friday, 20 September 2013

WiFi: Layer-3 fragmentation vs 802.11 Fragmentation

We all know that fragmentation of packets happens at layer-3. The 802.11 standard also specifies fragmentation of frames. Generally 802.11 fragmentation happens at the network driver level. However, there are a few other details worth discussing.

Threshold:
While layer-3 decides to fragment packets based on the MTU (Maximum Transmission Unit), in wireless (802.11/WiFi) the decision is taken based on a value called the "Fragmentation Threshold". This fragmentation threshold is generally configurable.

Command to set Threshold
On Linux, the layer-3 MTU can be configured with ifconfig, and the 802.11 fragmentation threshold can be configured with iwconfig (the frag parameter) or iw. Some vendor drivers expose it through iwpriv instead.

Sequence Number
Please note that we refer to the sequence number from 802.11 header in the following.

When packets are fragmented at layer-3, each fragment is given as a separate packet to the wireless network driver and they are considered as different frames. Each frame gets a different sequence number.

In the case of 802.11 fragmentation, the frames are fragmented by the network driver and each fragment gets the same sequence number with different fragment number.
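
In the 802.11 header both numbers live in the 16-bit Sequence Control field: the low 4 bits hold the fragment number and the upper 12 bits hold the sequence number. A minimal sketch of that packing follows; the function names are hypothetical, and frag_count takes the usable payload per fragment as a parameter instead of deriving it from the threshold, since that derivation depends on header and FCS sizes.

```c
#include <assert.h>
#include <stdint.h>

#define SEQ_MASK  0x0FFFu   /* 12-bit sequence number */
#define FRAG_MASK 0x000Fu   /* 4-bit fragment number */

/* Pack sequence number and fragment number into the Sequence Control field. */
static uint16_t make_seq_ctrl(uint16_t seq, uint8_t frag)
{
    return (uint16_t)(((seq & SEQ_MASK) << 4) | (frag & FRAG_MASK));
}

/* Fragments needed when each fragment carries at most per_frag payload bytes. */
static unsigned int frag_count(unsigned int payload_len, unsigned int per_frag)
{
    assert(per_frag > 0);
    if (payload_len == 0)
        return 1;
    return (payload_len + per_frag - 1) / per_frag;
}
```

With 802.11 fragmentation, the fragments of one frame would carry make_seq_ctrl(100, 0), make_seq_ctrl(100, 1), make_seq_ctrl(100, 2) and so on. With layer-3 fragmentation, each IP fragment is a separate frame, so the sequence number itself advances and the fragment number stays 0.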

Thursday, 12 September 2013

Specifying the list of users to access GUI in OpenWRT

LuCI is one of the most used GUI packages on OpenWRT. When you access the device using LuCI based packages, you will be prompted for a username and password. By default only the user "root" is allowed. If you are not comfortable with a root based login, you can specify a new user by modifying the LuCI package.

To add more users or remove the root user, you need to edit the file modules/admin-mini/luasrc/controller/mini/index.lua in the luci package. Open that file and look for "page.sysauth =" (without quotes). By default you will see the line with root as the user, i.e. page.sysauth = "root". If you want to specify another username (say admin) instead of root, change this line to page.sysauth = "admin". Multiple usernames can also be specified, separated by commas and enclosed in { and }. For example, root and admin are specified as page.sysauth = {"root", "admin"}.

You can use the adduser command to add a new user.

Friday, 6 September 2013

Local source for LUCI on OpenWRT

OpenWRT is the most popular development platform for wireless routers. The first thing one finds for GUI based configuration is the LUCI package. Integrating LUCI into your OpenWRT workspace is a 3-step process.

Download
Install
Build

OpenWRT provides a convenient way to perform the above 3 steps easily.

To download and install, you need to specify the source and execute the download and install scripts. You can specify the source in feeds.conf.default. Open the file and check whether there is already a line containing "src-svn luci". If one exists, check whether the source repository is valid. As of this writing, a valid source repository is http://svn.luci.subsignal.org/luci/trunk. You can specify this repository in your file.

Now that the sources are specified, execute the commands "./scripts/feeds update luci; ./scripts/feeds install -a -p luci". This will download the LUCI sources and install them into your workspace.

To integrate the installed package into your binaries, you need to build again (issue make).

Maintaining a local copy

Once you have the LUCI source, chances are that you don't want to download it again and instead want to make changes to the existing repository and use it. You can use the modified repository as your IP too :)

When you download LUCI initially, it is downloaded into the directory called feeds. You can use it as a local repository. The following is one clean way of doing it.

Create a directory for the local repository, say nearhop, in your base OpenWRT directory (mkdir nearhop).
Copy feeds/luci into the directory nearhop (cp -r feeds/luci nearhop).
In feeds.conf.default, change the luci source to your local repository, i.e. comment out the line containing "src-svn luci" (just place a # at the beginning of this line).
In the same file, add the line "src-cpy luci nearhop/luci" and save.

Now you can customize your GUI by editing the files in the local repository (nearhop/luci in our case). Whenever you modify, add or delete any of the files, you need to install the modified package and build it using the following commands:

./scripts/feeds update luci
./scripts/feeds install -a -p luci
make

That's it. Enjoy your customized GUI.

Thursday, 29 August 2013

Locking tips while developing network driver

The wireless network driver is the most important part of any access point/router. Some of the well-known drivers are ath9k, ath5k and madwifi. One of the most important things to understand while developing or modifying drivers is the locking mechanism.

Locking must be used to avoid race conditions. However, if not used properly, it will lead to deadlocks or crashes. The type of lock to use depends mainly on the execution context that will try to acquire the lock. Linux offers several types of locks to handle different scenarios. In this post I will cover a few of them which are useful (or which I have seen) in the network driver context. The following discussion assumes an SMP architecture.

Locks in Linux

Locks offered by Linux can be broadly divided into two categories. They are -

Spinlock
Semaphore

Both locks ensure that "only one process enters the critical section at any time". However, with a spinlock, a process has to busy wait in a loop until the lock is available. With a semaphore, the process goes to sleep until the lock is available. With a spinlock, CPU cycles are wasted while the process spins, and with a semaphore the process may sleep if the lock is not available immediately. Hence it is a good approach to use spinlocks where the lock holding time is small or where sleeping is not allowed, and to use semaphores where sleeping is acceptable.
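
To make the busy-wait idea concrete, here is a toy userspace spinlock built on C11 atomics. It is only an illustration of the spinning behaviour and is not the kernel's spinlock implementation; the toy_spin_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_spinlock {
    atomic_flag locked;
};

static void toy_spin_init(struct toy_spinlock *l)
{
    assert(l != NULL);
    atomic_flag_clear(&l->locked);
}

/* Returns true if the lock was free and is now held by the caller. */
static bool toy_spin_trylock(struct toy_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Busy wait: keep retrying until the holder releases the lock. */
static void toy_spin_lock(struct toy_spinlock *l)
{
    while (!toy_spin_trylock(l))
        ;   /* burning CPU cycles here is the cost of a spinlock */
}

static void toy_spin_unlock(struct toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

The empty loop body in toy_spin_lock is exactly the wasted CPU time mentioned above. A semaphore would instead put the caller on a wait queue and let the scheduler run something else.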

Execution Contexts and Locks

In Linux kernel, at driver level, mainly there are three types of execution contexts. They are -

Interrupt context
Bottom half context (SoftIRQ, Tasklet)
Kernel/User Thread Context (workqueue, idle, IOCTL)

Interrupt handlers are executed in interrupt context. One basic principle is that the interrupt handler should complete quickly, and operations such as sleeping, interacting with user space for data transfer or invoking the scheduler should be avoided. Now coming back to our main topic, what kind of lock is best suited to interrupt context? Since interrupt handlers should not sleep, we should use spinlocks in interrupt context. In Linux, spin_lock, spin_lock_irq and spin_lock_irqsave can be used in interrupt context. The difference between spin_lock and spin_lock_irq is that the latter also disables interrupts on the local processor. spin_lock_irqsave additionally saves the current interrupt state before disabling interrupts, so that spin_unlock_irqrestore can restore it.

Bottom halves are like interrupts, so all the restrictions mentioned above for interrupts apply to bottom halves as well. Hence spinlocks should also be used in bottom-half context. In Linux, spin_lock and spin_lock_bh can be used in bottom-half context. The difference between spin_lock and spin_lock_bh is that the latter also disables bottom halves on the local processor.

For kernel/user thread context, it depends on the requirement whether to go for semaphores or spinlocks. In general, if the lock holding time is less than the context switch overhead, go with spinlocks. Otherwise go with semaphores.

Linux also offers more sophisticated locks like read-write semaphores and read-write spinlocks. For example, rwlock_t is a spinlock which can be acquired in read or write mode. Any number of processes holding the read lock can enter the critical section. However, write access is exclusive.

Mixed contexts

In the above section we discussed locking under a particular execution context. Now it is time to enumerate the locks to be used when the critical section can be entered from different paths, each with the same or a different execution context. I will try to put these in the following table. Please note that at any place you can use the read-write variant (for example rwlock_t) of the corresponding lock. It purely depends on the underlying scenario. However, the following choices should work fine.

Context x Context	Interrupt Context	Bottom Half Context	Thread Context
Interrupt Context	spin_lock_irqsave	spin_lock_irqsave. In this scenario you may use spin_lock in interrupt context code but not in bottom-half context code (unless interrupts are disabled in some other way).	spin_lock_irqsave. In this scenario you may use spin_lock in interrupt context code but not in thread context code (unless interrupts are disabled in some other way).
Bottom Half Context	spin_lock_irqsave. In this scenario you may use spin_lock in interrupt context code but not in bottom-half context code (unless interrupts are disabled in some other way).	spin_lock_bh	spin_lock_bh. In this scenario you may use spin_lock in bottom-half context code but not in thread context code.
Thread Context	spin_lock_irqsave. In this scenario you may use spin_lock in interrupt context code but not in thread context code (unless interrupts are disabled in some other way).	spin_lock_bh. In this scenario you may use spin_lock in bottom-half context code but not in thread context code.	spin_lock or semaphore. Purely depends on the scenario.
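
One way to remember the table: the lock variant is dictated by the most restrictive context that can enter the critical section. The sketch below encodes just that rule as a lookup. The enum and the function pick_lock are hypothetical names for this illustration, and the returned strings are simply the names of the primitives.

```c
#include <assert.h>
#include <string.h>

/* Ordered from least to most restrictive. */
enum exec_ctx { CTX_THREAD = 0, CTX_BOTTOM_HALF = 1, CTX_INTERRUPT = 2 };

/* Name of the lock primitive to use when contexts a and b share a critical section. */
static const char *pick_lock(enum exec_ctx a, enum exec_ctx b)
{
    enum exec_ctx worst = (a > b) ? a : b;

    switch (worst) {
    case CTX_INTERRUPT:
        return "spin_lock_irqsave";
    case CTX_BOTTOM_HALF:
        return "spin_lock_bh";
    case CTX_THREAD:
        return "spin_lock or semaphore";
    }
    assert(!"unknown execution context");
    return NULL;
}
```

Because the rule only looks at the more restrictive of the two contexts, the result is symmetric, just like the table.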

Some trivia

Keep the following points in mind while designing a locking solution.

1. When you acquire multiple locks, always acquire them in the same order in every code path, and unlock them in the reverse order of locking. Inconsistent lock ordering may lead to deadlock.

2. If the interrupt handler is re-entrant, always disable the interrupt corresponding to the handler before doing anything else, and enable it again only at the end.

3. Never try to acquire a semaphore while holding a spinlock. Acquiring a semaphore may sleep, and sleeping with a spinlock held can lead to deadlocks.

Tuesday, 20 August 2013

Send SMS using 3G USB dongle on Linux

Before doing anything else, please make sure that your device is detected on your Linux machine. You can check this by issuing the command "dmesg". After you plug in the 3G card, the last few lines of the "dmesg" output should show messages related to it. For example, on my machine I see the following messages.

[19447.425468] option 2-1.3:1.0: GSM modem (1-port) converter detected
[19447.426391] usb 2-1.3: GSM modem (1-port) converter now attached to ttyUSB0
[19447.426423] option 2-1.3:1.2: GSM modem (1-port) converter detected
[19447.426848] usb 2-1.3: GSM modem (1-port) converter now attached to ttyUSB1
[19447.426867] option 2-1.3:1.3: GSM modem (1-port) converter detected
[19447.427332] usb 2-1.3: GSM modem (1-port) converter now attached to ttyUSB2

Also, please make sure that usb_modeswitch is installed. usb_modeswitch is needed to make sure that your device is detected as a networking device instead of a storage device.

Script:

The following script will help you to send an SMS using a 3G dongle on a Linux box.

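#!/bin/sh
# Usage: ./sms.sh <mobile-number> "<message text>"
DEV=/dev/ttyUSB2   # modem control port; adjust to match your dmesg output

chat TIMEOUT 1 "" "AT+CMGF=1" "OK" > "$DEV" < "$DEV"       # switch to text mode
# The modem answers AT+CMGS with a ">" prompt, not OK, so this step just times out after 1 second.
chat TIMEOUT 1 "" "AT+CMGS=\"$1\"" "OK" > "$DEV" < "$DEV"  # set the recipient
chat TIMEOUT 1 "" "$2" "OK" > "$DEV" < "$DEV"              # message body
chat TIMEOUT 1 "" "^Z" "OK" > "$DEV" < "$DEV"              # Ctrl-Z submits the message
chat TIMEOUT 1 "" "AT+CMGF=0" "OK" > "$DEV" < "$DEV"       # switch back to PDU mode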

How to use:

Please note that you need to execute the script with root permissions.

1. Copy the above script into a file. Assume the name of the file is sms.sh.
2. Add execute permission to the script, i.e. "chmod +x sms.sh".
3. To send the SMS, execute the script with the mobile number as the first argument and the text message as the second argument. If your message consists of multiple words, use quotes.

For example, the following command sends the message "hello boss" to the mobile number +919876543210.

$ ./sms.sh +919876543210 "hello boss"

I have personally tested this on an OpenWRT based Ralink board and it worked well for me.