Who Touched My Wifi?

I usually don't shut down my work computer, but I need to reboot it every few days once things get slow. Last time I rebooted it, though, my Wifi disappeared - it wasn't disabled, the menu to enable it just wasn't there.
Who touched my Wifi?
First, I recalled that my kids had been typing on my keyboard while I was away getting coffee. So one of them must have accidentally disabled my wifi by hitting some key combination. That's all right, no big deal. I could re-enable it, and I vaguely remembered this happening some time ago.


But I could not re-enable it! There was no hardware switch or button, and Fn + (F1/F12) didn't work. I had stuff to get done and the wifi was not working! I was getting angry. "Did you touch my keyboard? My wifi is not working now!" I blamed the kid. He was silent and looked terrified.
After quite a while, I came to the conclusion that my PC model has no hardware switch and no key combination that can turn the wifi on or off, so it could not have been the kid who turned it off. He was sitting there, unhappy. I walked over to him, squatted down, and said gently, "Sorry, Thomas, it wasn't you who messed up my computer. It was Dad's fault and I can fix it. Don't worry." He was relieved, and almost about to cry. "I'm sorry. I can fix it. Don't worry," I reassured him. "Do you want to have a nap now?" He nodded and went off to his room.
So, who the fxxx touched my Wifi?
I didn't have a systematic way to find out back then. I tried a few commands I had Googled, hoping for a quick answer, but there wasn't one. How could my wifi suddenly stop working after a reboot?
Let's first see how I figured out it was a wifi firmware issue, and then see who updated my kernel and left me in the dark!

Wifi driver and firmware

Check the hardware and find out the wifi card model

$ sudo lshw -C network | grep -A 10 Wireless
       description: Wireless interface
       product: Wireless 7260
       vendor: Intel Corporation
       physical id: 0
       bus info: pci@0000:04:00.0
       logical name: wlan0
       version: 6b
       serial: 28:b2:bd:60:43:8e
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress bus_master cap_list ethernet physical wireless
       configuration: broadcast=yes driver=iwlwifi driverversion=4.4.0-112-generic firmware=25.30.13.0 ip=192.168.1.35 latency=0 link=yes multicast=yes wireless=IEEE 802.11bgn
For comparison, some information is missing when the driver is not working properly: the logical name, the serial, and the driver and firmware entries under configuration are all gone.
$ sudo lshw -C network | grep -A 10 Wireless
       product: Wireless 7260
       vendor: Intel Corporation
       physical id: 0
       bus info: pci@0000:04:00.0
       version: 6b
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress cap_list
       configuration: latency=0
       resources: memory:f2400000-f2401fff
From the above I know that the model is Intel 7260, and that when the driver is loaded properly it is iwlwifi. If no driver is shown, we have to fix that first.
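Eyeballing the configuration line gets old quickly, so here is a minimal sketch that pulls the driver name out of the lshw output. The function name `wifi_driver` is my own invention, and it assumes the `configuration: ... driver=<name> ...` layout shown above; if you have several network devices, it prints one driver per device.

```shell
#!/bin/sh
# Read `lshw -C network` output on stdin and print the driver named in each
# "configuration:" line. Print "none" when no line names a driver.
wifi_driver() {
    sed -n 's/.*configuration:.* driver=\([^ ]*\).*/\1/p' | grep . || echo none
}

# Usage (lshw needs root to show everything):
#   sudo lshw -C network | wifi_driver
```

With the working output above this should print iwlwifi, and with the broken one it should print none.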

Is the wifi driver OK?

There are a couple of possible reasons the driver fails to load:
  1. we don't have the required driver, be it built into the kernel or as a kernel module
  2. something the driver needs is either wrong or missing - e.g. firmware.
To learn more about the driver's status, check dmesg.
$ dmesg | grep iwlwifi
[   18.098140] iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-7260-17.ucode failed with error -2
[   18.098156] iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-7260-16.ucode failed with error -2
[   18.098165] iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-7260-15.ucode failed with error -2
[   18.098172] iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-7260-14.ucode failed with error -2
[   18.098179] iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-7260-13.ucode failed with error -2
[   18.098181] iwlwifi 0000:04:00.0: request for firmware file 'iwlwifi-7260-13.ucode' failed.
[   18.098183] iwlwifi 0000:04:00.0: no suitable firmware found!
Ha, we see the problem is related to the driver firmware - "no suitable firmware found!". Error -2 is ENOENT: the file is simply not there. modinfo has lots to tell, e.g. which firmware the driver wants:
$ modinfo iwlwifi | grep firmware | grep 7260
firmware:       iwlwifi-7260-13.ucode
So we just go ahead, find that firmware, install it to /lib/firmware, and reboot.
$ dmesg | grep iwlwifi
[   17.954019] iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-7260-17.ucode failed with error -2
[   17.954035] iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-7260-16.ucode failed with error -2
[   17.954044] iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-7260-15.ucode failed with error -2
[   17.954051] iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-7260-14.ucode failed with error -2
[   18.143465] iwlwifi 0000:04:00.0: loaded firmware version 25.30.13.0 op_mode iwlmvm
[   18.410571] iwlwifi 0000:04:00.0: Detected Intel(R) Wireless N 7260, REV=0x144
[   18.410680] iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
[   18.410971] iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
[   24.090851] iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
[   24.091116] iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
[   24.282744] iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
[   24.283007] iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
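Shit, the "no suitable firmware found" error disappeared, and my wifi was great again! (The driver still tries versions 17 down to 14 first; those misses are harmless once it finds 13.)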
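To tell at a glance which of the firmware files named in dmesg are actually on disk, something like the following sketch could help. The function name `check_firmware` is made up for this post, and it assumes the "Direct firmware load for <file> failed" wording shown in the logs above; the directory defaults to /lib/firmware.

```shell
#!/bin/sh
# Read dmesg output on stdin, collect every firmware file the kernel failed
# to load, and report whether each one exists in the firmware directory.
# $1: firmware directory (optional, defaults to /lib/firmware).
check_firmware() {
    fw_dir="${1:-/lib/firmware}"
    sed -n 's/.*Direct firmware load for \([^ ]*\) failed.*/\1/p' | sort -u |
    while read -r fw; do
        if [ -e "$fw_dir/$fw" ]; then
            echo "present $fw"
        else
            echo "missing $fw"
        fi
    done
}

# Usage:
#   dmesg | check_firmware
```

Since the driver probes from the newest version down, a few "missing" lines are normal. It only becomes a problem when every candidate, including the one modinfo lists, is missing.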

Who touched my kernel?

So, who touched my wifi firmware?
Well, actually who touched my kernel, as I found out later.
$ uname -r
4.4.0-112-generic
Where is my good old 3.13 kernel?!
Yeah, the 3.13 kernel is old, has security holes, and lacks lots of cool features. But I was just not confident enough to update to the latest version when I had installed lots and lots of stuff for my daily work and it had all worked perfectly so far.
But all of a sudden, I got a new kernel! It was a somewhat mixed feeling - somebody had made a decision for me that I was a little hesitant to make. For a second I thought, hmm... maybe I can take this opportunity to update to 16.04 and maybe all will be good? But I hesitated, again. As a serious technical person, I was more inclined to root-cause a problem than to "try" something and hope it just works - I'm not sure if that is something I should be proud of, though.
OK, enough about why I was using the ancient 3.13 kernel. The question I had was who updated my kernel without me knowing it.
I didn't compile and install the kernel manually, so it must have been installed as a package.
List the installed packages with "4.4.0-112-generic" in the name.
$ aptitude search 4.4.0-112-generic | grep "^i"
i A linux-headers-4.4.0-112-generic - Linux kernel headers for version 4.4.0 on 
i A linux-image-4.4.0-112-generic   - Linux kernel image for version 4.4.0 on 64
i A linux-image-extra-4.4.0-112-gen - Linux kernel extra modules for version 4.4
The A flag indicates that those packages were automatically installed, i.e. the new kernel was not installed by me explicitly on the command line.
Checking the apt history made it all clear: the new 4.4 kernel was installed automatically when I installed snapd.
$ cat /var/log/apt/history.log | grep -B 2 4.4.0-112-generic
Start-Date: 2018-02-22  18:07:23
Commandline: apt-get install snapd
Install: linux-image-extra-4.4.0-112-generic:amd64 (4.4.0-112.135~14.04.1, automatic), linux-image-4.4.0-112-generic:amd64 (4.4.0-112.135~14.04.1, automatic), squashfs-tools:amd64 (4.2+20130409-2ubuntu0.14.04.2, automatic), linux-headers-4.4.0-112-generic:amd64 (4.4.0-112.135~14.04.1, automatic), linux-generic-lts-xenial:amd64 (4.4.0.112.96, automatic), thermald:amd64 (1.4.3-5~14.04.4, automatic), snapd:amd64 (2.29.4.2~14.04), linux-headers-generic-lts-xenial:amd64 (4.4.0.112.96, automatic), linux-headers-4.4.0-112:amd64 (4.4.0-112.135~14.04.1, automatic), systemd:amd64 (204-5ubuntu20.26, automatic), linux-image-generic-lts-xenial:amd64 (4.4.0.112.96, automatic)
The -B 2 option for grep shows the 2 lines before the match, which are the date and the apt-get command that caused the install.
All clear now.
A new kernel was pulled in when I installed a package with apt-get install, but the new kernel requires a newer firmware that wasn't installed on my system. So after the reboot, the new kernel booted but the wifi driver failed to probe, and hence my wifi disappeared.
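The grep -B 2 trick works only when the date and command line sit exactly two lines above the Install line. A slightly more forgiving sketch is below. It assumes that entries in the apt history log are separated by blank lines (true for the logs I have seen), and `who_installed` is just a name I picked.

```shell
#!/bin/sh
# Read an apt history log on stdin and, for every transaction that mentions
# the given string, print its Start-Date and Commandline lines.
# awk's paragraph mode (RS set to empty) treats each blank-line-separated
# transaction as one record; FS='\n' makes every line a field.
# $1: package name or version string to look for.
who_installed() {
    awk -v RS= -v FS='\n' -v pkg="$1" '
        index($0, pkg) {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^(Start-Date|Commandline):/) print $i
        }'
}

# Usage:
#   who_installed 4.4.0-112-generic < /var/log/apt/history.log
```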

Summary

Here are a few things I have learned.
  • Don't blame your kids too quickly.
  • Don't trust your memory, especially when getting old (and I hate to say that).
  • Pay a little more attention when doing apt-get install. Know what will be installed. And that is where containers shine.
  • Finally, a few tools can be used to diagnose hw/driver problems: lshw, dmesg, modinfo, aptitude, and the apt history log.
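As for knowing what will be installed: apt-get can simulate an install without changing anything, and the simulated output lists each package it would install on a line starting with "Inst". Here is a minimal sketch that flags kernel images hiding in that list; `kernel_surprises` is a made-up name, and it assumes kernel image packages start with linux-image-, as they do in the history log above.

```shell
#!/bin/sh
# Read `apt-get install --simulate <pkg>` output on stdin and print any
# kernel image packages that the install would pull in.
kernel_surprises() {
    awk '$1 == "Inst" && $2 ~ /^linux-image-/ { print $2 }'
}

# Usage (a simulation changes nothing on the system):
#   apt-get install --simulate snapd | kernel_surprises
```

If it prints anything, a new kernel is coming along for the ride, and it is worth checking the firmware before the next reboot.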