Linux Network Troubleshooting
Kyle Rankin
Systems Architect
QuinStreet Inc.
Agenda
- Introduction
- Network Troubleshooting
- Client or Server Problem?
- Is it Plugged In?
- Test Local Network
- DNS Troubleshooting
- Test Your Route
- Test Remote Ports
- Test For Listening Ports
- Questions
Introduction
- Troubleshooting is a skill
- Not everyone naturally has this skill
- Everyone can be a better troubleshooter
- You also want to be a faster troubleshooter
General Troubleshooting Philosophies
- Divide the problem space
- Favor quick, simple tests over slow, complex tests
- Favor past solutions
- Good communication is critical when collaborating
- Understand how systems work
- Document your problems and solutions
- What changed?
- Use the Internet, but carefully
- Resist rebooting
Network Troubleshooting
- Network and local problems often have similar symptoms
- Most common network issue: "The Server/Internet is Down"
- Really: "I can't communicate with the Server/Internet"
- Our problem: Client A (10.1.1.7) can't access the Web service (port 80) on Server B (10.1.2.5)
- Many problems have similar symptoms
- Normally, I'd skip steps...
Client or Server Problem?
- Quick test: test from another machine
- Ideally on same network as client
- Same problem? Network or Server issue
- No problem? Client issue
Is it Plugged In?
- Most embarrassing error
- More common than you'd think
- Rule it out, just in case
- Either physically inspect the cable, or run ethtool:
Ethtool
$ sudo ethtool eth0
Settings for eth0:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Half 1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Half 1000baseT/Full
Advertised auto-negotiation: Yes
Speed: 100Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 0
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: pg
Wake-on: d
Current message level: 0x000000ff (255)
Link detected: yes
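The fields to read first in this output are Link detected, Speed, and Duplex. A minimal sketch of pulling them out, assuming ethtool output arrives on stdin; the function name link_summary is illustrative and not part of ethtool:

```shell
#!/bin/sh
# link_summary: read `ethtool <iface>` output on stdin and print the three
# fields that matter for an "is it plugged in?" check.
# The function name is hypothetical, chosen for this example.
link_summary() {
    awk -F': *' '
        /Speed:/         { speed = $2 }
        /Duplex:/        { duplex = $2 }
        /Link detected:/ { link = $2 }
        END { printf "link=%s speed=%s duplex=%s\n", link, speed, duplex }
    '
}

# Typical use on a live system:
#   sudo ethtool eth0 | link_summary
```

For the output above this prints link=yes speed=100Mb/s duplex=Full. A link that reports "no", or a speed or duplex lower than the card advertises, points at the cable, the switch port, or a negotiation problem.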
Possible Solutions
- Plug it in!
- Get new cables
- Are you sure eth0 is port 0?
- Fix duplex issues:
$ sudo ethtool -s eth0 autoneg off duplex full
Is the Interface Up?
$ sudo ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:17:42:1f:18:be
inet addr:10.1.1.7 Bcast:10.1.1.255 Mask:255.255.255.0
inet6 addr: fe80::217:42ff:fe1f:18be/64 Scope:Link
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:1 errors:0 dropped:0 overruns:0 frame:0
TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:229 (229.0 B) TX bytes:2178 (2.1 KB)
Interrupt:10
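Two things matter in this output: the UP flag and an inet addr line. A rough sketch of checking both, assuming the older net-tools ifconfig layout shown above (newer releases print a different format that this does not parse); iface_state is a made-up name:

```shell
#!/bin/sh
# iface_state: read `ifconfig <iface>` output (older net-tools layout) on
# stdin and report whether the interface is flagged UP and has an IPv4 address.
# The function name is hypothetical, chosen for this example.
iface_state() {
    awk '
        # "inet addr:10.1.1.7 ..." -> second field is "addr:10.1.1.7"
        /inet addr:/ { split($2, a, ":"); ip = a[2] }
        # The flags line lists UP as a standalone word when the interface is up
        { for (i = 1; i <= NF; i++) if ($i == "UP") up = 1 }
        END {
            printf "up=%s ip=%s\n", (up ? "yes" : "no"), (ip == "" ? "none" : ip)
        }
    '
}

# Typical use on a live system:
#   ifconfig eth0 | iface_state
```

An interface that is up but has no address usually means a static configuration error or a DHCP failure, which matches the solutions on the next slide.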
Possible Solutions
- Start network service
- service networking start
- ifup eth0
- Fix network settings
- Troubleshoot DHCP server
Test Local Network
- Can you ping other hosts on the local network?
- When in doubt, try your gateway:
$ sudo route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
0.0.0.0         10.1.1.1        0.0.0.0         UG    100    0        0 eth0
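The gateway is the Gateway column of the default route. A small sketch of extracting it, assuming route output on stdin; the destination reads 0.0.0.0 with -n and "default" without it, so both are accepted, and default_gateway is an illustrative name:

```shell
#!/bin/sh
# default_gateway: read `route -n` (or plain `route`) output on stdin and
# print the gateway address of the first default route.
# The function name is hypothetical, chosen for this example.
default_gateway() {
    # Column 1 is Destination, column 2 is Gateway, column 4 is Flags;
    # the G flag marks a route that goes through a gateway.
    awk '($1 == "0.0.0.0" || $1 == "default") && $4 ~ /G/ { print $2; exit }'
}

# Typical use on a live system:
#   ping -c 5 "$(route -n | default_gateway)"
```

For the routing table above this prints 10.1.1.1.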
Use ping to test the gateway:
$ ping -c 5 10.1.1.1
PING 10.1.1.1 (10.1.1.1) 56(84) bytes of data.
64 bytes from 10.1.1.1: icmp_seq=1 ttl=64 time=3.13 ms
64 bytes from 10.1.1.1: icmp_seq=2 ttl=64 time=1.43 ms
64 bytes from 10.1.1.1: icmp_seq=3 ttl=64 time=1.79 ms
64 bytes from 10.1.1.1: icmp_seq=5 ttl=64 time=1.50 ms
--- 10.1.1.1 ping statistics ---
5 packets transmitted, 4 received, 20% packet loss, time 4020ms
rtt min/avg/max/mdev = 1.436/1.966/3.132/0.686 ms
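The summary line is the part worth scripting against. A minimal sketch that turns the packet-loss figure into a one-line verdict, assuming Linux ping output on stdin; ping_loss and ping_verdict are hypothetical helper names:

```shell
#!/bin/sh
# ping_loss: print the packet-loss percentage from ping's summary line,
# for example "20" for "20% packet loss". Prints nothing if no summary is found.
ping_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# ping_verdict: classify the result based on the loss percentage.
# Both function names are hypothetical, chosen for this example.
ping_verdict() {
    loss=$(ping_loss)
    case "$loss" in
        "")  echo "no summary line found" ;;
        0)   echo "reachable" ;;
        100) echo "unreachable (or ICMP blocked)" ;;
        *)   echo "reachable with ${loss}% loss" ;;
    esac
}

# Typical use on a live system:
#   ping -c 5 10.1.1.1 | ping_verdict
```

For the run above this prints "reachable with 20% loss". Total loss is ambiguous on its own, which is why the next slide separates a blocked ICMP from a real connectivity problem.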
Possible Solutions
- Network Administrator is blocking ICMP on gateway (annoying)
- If so, ping other hosts
- or convince your admin ICMP is useful
- If ICMP not blocked, possible VLAN problem
DNS Troubleshooting
- It helps to know how DNS works first...
- nslookup main troubleshooting tool
- Successful query:
$ nslookup web1
Server: 10.1.1.3
Address: 10.1.1.3#53
Name: web1.example.net
Address: 10.1.2.5
Don't confuse DNS server IP with host IP
Bad DNS Queries
server can't find host: NXDOMAIN
$ nslookup web1
Server: 10.1.1.3
Address: 10.1.1.3#53
** server can't find web1: NXDOMAIN
DNS server works, but can't find web1
Could be a DNS search path issue. Check /etc/resolv.conf:
search example.net
nameserver 10.1.1.3
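- With this search path, web1 resolves as web1.example.net; a host that is really web1.dev.example.net is not found by its short name
- Solution: use the FQDN or add the missing domain to the search path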
If FQDN doesn't resolve, likely DNS server config problem
- If authoritative for domain, check zone config
- If recursive DNS server, confirm recursion enabled
- Test other domains
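The search path issue described above is easier to see when the candidate names are written out. A simplified sketch, assuming a resolv.conf-style file; it ignores the ndots option and other resolver details, and search_candidates is an invented name:

```shell
#!/bin/sh
# search_candidates NAME [RESOLV_CONF]
# Print the fully qualified names that would be tried for a short host name,
# based on the "search" line of a resolv.conf-style file.
# Simplification: ignores the ndots option. If several search lines exist,
# only the last one is used, as the resolver does.
# The function name is hypothetical, chosen for this example.
search_candidates() {
    name=$1
    conf=${2:-/etc/resolv.conf}
    awk -v name="$name" '
        $1 == "search" { n = 0; for (i = 2; i <= NF; i++) dom[++n] = $i }
        END { for (i = 1; i <= n; i++) print name "." dom[i] }
    ' "$conf"
}

# Typical use:
#   search_candidates web1
```

With "search example.net" the only candidate for web1 is web1.example.net. If the name you expect is missing from the list, either query the FQDN or extend the search line.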
Bad DNS Queries
No servers could be reached
$ nslookup web1
;; connection timed out; no servers could be reached
Possible causes:
- No name servers configured for your host
- Name servers configured, but inaccessible
Name servers defined in resolv.conf:
search example.net
nameserver 10.1.1.3
If no name servers configured, add one!
If name server listed by its hostname, not IP... think about that for a second
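Both causes on this slide can be checked from the file itself. A rough sketch, assuming a resolv.conf-style file; the IPv6 test is deliberately crude (any entry containing a colon) and check_nameservers is a hypothetical name:

```shell
#!/bin/sh
# check_nameservers [RESOLV_CONF]
# List each "nameserver" entry and flag any that is not an IP address.
# A hostname here cannot work, because resolving it would itself need DNS.
# The function name is hypothetical, chosen for this example.
check_nameservers() {
    conf=${1:-/etc/resolv.conf}
    awk '
        $1 == "nameserver" {
            found = 1
            # Dotted-quad IPv4, or anything with a colon as a rough IPv6 test
            if ($2 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ || $2 ~ /:/)
                print "ok " $2
            else
                print "not-an-ip " $2
        }
        END { if (!found) print "no nameserver configured" }
    ' "$conf"
}

# Typical use:
#   check_nameservers
```

For the resolv.conf shown above this prints "ok 10.1.1.3".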
Bad DNS Queries
No servers could be reached
- If name servers configured, but can't be reached, test connection to them
- Start with ping
- No ping, same subnet: DNS server could be down
- No ping, different subnet: test route to DNS server
- Ping works, DNS server not responding, test remote DNS port
Test Your Route
- Can packets from your host reach the remote host?
- Ping remote host:
- Ping works, move to next step
- Ping doesn't work, ping another host on same network
- Ping still doesn't work, use traceroute.
- Successful output:
$ traceroute 10.1.2.5
traceroute to 10.1.2.5 (10.1.2.5), 30 hops max, 40 byte packets
1 10.1.1.1 (10.1.1.1) 5.432 ms 5.206 ms 5.472 ms
2 web1 (10.1.2.5) 8.039 ms 8.348 ms 8.643 ms
Good Routes Gone Bad
Traceroute with asterisks
$ traceroute 10.1.2.5
traceroute to 10.1.2.5 (10.1.2.5), 30 hops max, 40 byte packets
1 10.1.1.1 (10.1.1.1) 5.432 ms 5.206 ms 5.472 ms
2 * * *
3 * * *
Last IP to respond, first IP to check
In this case, check 10.1.1.1
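"Last IP to respond" can be picked out mechanically. A small sketch, assuming traceroute output on stdin in the layout shown above; last_responding_hop is an illustrative name:

```shell
#!/bin/sh
# last_responding_hop: read traceroute output on stdin and print the hop
# number and address (or name) of the last hop that answered at least one probe.
# The function name is hypothetical, chosen for this example.
last_responding_hop() {
    awk '
        # Hop lines start with a number; the header line does not.
        $1 ~ /^[0-9]+$/ {
            for (i = 2; i <= NF; i++)
                if ($i != "*") { hop = $1; addr = $i; break }
        }
        END { if (hop != "") print hop, addr; else print "no hop responded" }
    '
}

# Typical use on a live system:
#   traceroute 10.1.2.5 | last_responding_hop
```

For the trace above this prints "1 10.1.1.1", the first device to check.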
Good Routes Gone Bad
Traceroute timeouts
$ traceroute 10.1.2.5
traceroute to 10.1.2.5 (10.1.2.5), 30 hops max, 40 byte packets
1 10.1.1.1 (10.1.1.1) 5.432 ms 5.206 ms 5.472 ms
2 10.1.1.1 (10.1.1.1) 3006.477 ms !H 3006.779 ms !H 3007.072 ms
The gateway (10.1.1.1) reports the host as unreachable (!H) after the probes time out
Host likely down
Possibly unreachable even from its own subnet
Or, Network admin blocked ICMP (grrr!)
If so, use tcptraceroute instead (yay)
Test Remote Ports
- Server responds to ping, DNS works
- Test whether remote port is open
- For DNS, port 53, for web1, port 80:
$ telnet 10.1.2.5 80
Trying 10.1.2.5...
telnet: Unable to connect to remote host: Connection refused
If able to connect, probably not a networking problem
If connection refused, either service down (fix the server) or firewalled off
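telnet is interactive, so a scripted version of the same test can be handy. A sketch only, assuming bash was built with /dev/tcp support (a bash feature, not a real device) and that the coreutils timeout command is installed; check_port is a made-up name:

```shell
#!/bin/sh
# check_port HOST PORT [TIMEOUT_SECONDS]
# Attempt a TCP connection through bash's /dev/tcp redirection.
# Assumptions: bash with network redirections enabled, coreutils `timeout`.
# The function name is hypothetical, chosen for this example.
check_port() {
    host=$1
    port=$2
    secs=${3:-3}
    # $0 and $1 inside the bash -c script are the host and port passed after it
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
    case $? in
        0)   echo "open" ;;
        124) echo "timed out (possibly filtered)" ;;   # timeout killed the attempt
        *)   echo "refused or unreachable" ;;
    esac
}

# Typical use:
#   check_port 10.1.2.5 80
```

A refusal comes back quickly, while a timeout suggests something is dropping the packets, which is the distinction the firewall test on the next slide makes more precisely.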
Test Remote Ports
Test for firewalls
- Many firewalls silently drop packets
- A genuinely closed port usually answers with a TCP reset (RST)
- Nmap can tell the difference:
$ nmap -p 80 10.1.2.5
Starting Nmap 4.62 ( http://nmap.org ) at 2009-02-05 18:49 PST
Interesting ports on web1 (10.1.2.5):
PORT STATE SERVICE
80/tcp filtered http
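The STATE column carries the answer. A minimal sketch that reads it for one port and attaches a short interpretation, assuming nmap's normal text output on stdin; port_state and the wording of the notes are my own:

```shell
#!/bin/sh
# port_state PORT: read nmap output on stdin and print the STATE column for
# the given TCP port, with a short interpretation.
# The function name and the notes are hypothetical, chosen for this example.
port_state() {
    awk -v want="$1/tcp" '
        $1 == want {
            state = $2
            if (state == "open")          note = "service is listening"
            else if (state == "closed")   note = "host answered, nothing listening"
            else if (state == "filtered") note = "no answer, likely a firewall"
            else                          note = "see the nmap documentation"
            print state ": " note
            found = 1
        }
        END { if (!found) print "port not in output" }
    '
}

# Typical use on a live system:
#   nmap -p 80 10.1.2.5 | port_state 80
```

For the scan above this prints "filtered: no answer, likely a firewall".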
Test For Listening Ports
- If no firewall in place, test for listening port on remote server
- Many ways to do it (telnet, nmap, nc) but I like netstat:
$ sudo netstat -lnp | grep :80
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 919/apache
No output means no service is listening on that port, so start the service
Make sure it is listening on the correct interface (0.0.0.0 means all interfaces)
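grep :80 also matches ports such as 8080, and it does not make the bound address stand out. A sketch of a stricter filter, assuming netstat -lnp output on stdin; listening_on is a hypothetical name:

```shell
#!/bin/sh
# listening_on PORT: read `netstat -lnp` output on stdin and print the local
# address and owning process for each TCP listener on exactly that port.
# The function name is hypothetical, chosen for this example.
listening_on() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            # The port is whatever follows the last colon of the local address
            n = split($4, parts, ":")
            if (parts[n] == port) { print $4, $7; found = 1 }
        }
        END { if (!found) print "nothing listening on port " port }
    '
}

# Typical use on the server:
#   sudo netstat -lnp | listening_on 80
```

For the output above this prints "0.0.0.0:80 919/apache". A local address of 127.0.0.1 would mean the service only accepts connections from the server itself.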