[Coral-dev] crl_to_pcap stats?
Peter Van Epp
vanepp at sfu.ca
Mon Jun 13 12:35:55 PDT 2005
On Mon, Jun 13, 2005 at 11:36:58AM -0700, Ken Keys wrote:
> If there is packet loss in an interval, crl apps print a message like this
> to the coral error file (stderr by default) at the end of the interval:
>
> warning: interval 1118687740.000000: iface 0: dropped 13 packets
>
> If there is no loss, no message is printed. If no intervals are set, the
> entire run is considered one interval for this statistics reporting.
>
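For reference, here is a minimal Python sketch that totals these warnings per interface from a saved coral error file. It assumes every warning follows exactly the format of the single example line above. The function name total_drops and the tolerance for a singular "packet" are my own guesses and are not part of CoralReef.

```python
import re
from collections import defaultdict

# Matches lines like the example above:
#   warning: interval 1118687740.000000: iface 0: dropped 13 packets
# The optional "s" on "packets" is an assumption (singular form not confirmed).
WARNING_RE = re.compile(
    r"warning: interval (?P<ts>[\d.]+): iface (?P<iface>\d+): "
    r"dropped (?P<count>\d+) packets?"
)

def total_drops(lines):
    """Sum reported drops per interface.

    Intervals with no loss print nothing, so an interface that never
    appears in the output simply had no drops reported by the crl app.
    """
    totals = defaultdict(int)
    for line in lines:
        m = WARNING_RE.search(line)
        if m:
            totals[int(m.group("iface"))] += int(m.group("count"))
    return dict(totals)
```

Usage would be something like `total_drops(open("coral_errors.log"))`, where the file name is hypothetical.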
It is perhaps worth pointing out that this is probably only one of the
(at least) three possible sources of packet loss. I expect this number is
the packet drop counter from libpcap, which covers the copy from mbufs in
kernel space into the libpcap buffer in user space. In addition, you can be
losing packets (invisible to this counter) in the interface hardware
(between the physical wire and the interface hardware / device driver,
before the kernel buffers) or in the kernel, by exhausting buffers at the
mbuf level, running out of CPU, or running out of memory bandwidth. Both of
those sources need to be detected at the kernel level (and I think some of
the kernel buffer level drops may not get counted or displayed at all,
although I haven't yet had time to dig through the source code to see if
that's true), and they vary by operating system just to keep life
interesting.
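On Linux, one rough way to watch the kernel-level counters is to snapshot /proc/net/dev before and after a run and compare them. This is a sketch only. It assumes the usual Linux 2.4/2.6 /proc/net/dev layout (two header lines, then receive bytes, packets, errs, drop, fifo, frame per interface), and the helper names are hypothetical. What each driver actually puts into "drop" or "fifo" varies, which is exactly the problem described above. The libpcap-level counter is separate: it is the ps_drop field returned by pcap_stats().

```python
# Receive-side column names, in /proc/net/dev order (assumed Linux layout).
RX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "frame"]

def parse_proc_net_dev(text):
    """Return {iface: {field: count}} for the receive side of a snapshot."""
    stats = {}
    for line in text.splitlines()[2:]:  # first two lines are column headers
        if ":" not in line:
            continue
        # Older kernels print "eth0:123" with no space, so split on the colon.
        name, counters = line.split(":", 1)
        values = counters.split()
        stats[name.strip()] = {k: int(v) for k, v in zip(RX_FIELDS, values)}
    return stats

def read_rx_stats(path="/proc/net/dev"):
    """Take one snapshot from the live system (Linux only)."""
    with open(path) as f:
        return parse_proc_net_dev(f.read())

def rx_loss_delta(before, after, iface):
    """How much each receive error/drop counter grew between two snapshots."""
    return {k: after[iface][k] - before[iface][k]
            for k in ("errs", "drop", "fifo", "frame")}
```

In practice you would call read_rx_stats() just before starting the capture and again after it ends, then pass both snapshots to rx_loss_delta(). A packet dropped somewhere these counters don't cover will still be invisible.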
For example, this is the interface error printout from a SysKonnect fibre
Gig card in a SuSE 9.1 / Linux 2.6 kernel machine (this one without the ntop
libpcap ring buffer code that I usually use) that is being abused at wire
speed with 9K UDP packet bursts. As you can see, it isn't always happy about
this (and Intel copper Gig cards are even more unhappy; I haven't yet tried
my brand new Intel fibre GigE cards :-)). I believe (without, as I say,
having looked at the Linux device driver code, although I have looked at the
FreeBSD equivalent) that this complaint means the 64K on-card ring buffer
was overwritten before the device driver serviced it, probably because of
insufficient CPU (the machine is a dual 1.2 GHz Athlon). Since this is a
capture-only machine, that could be fixed by changing the device driver to
allocate more memory to the receive side, or by using a faster, more modern
card (the SysKonnect is also 4 years old) that does interrupt merging (which
has its own problems):
sniffer:~ # cat /proc/net/sk98lin/eth0
Detailed statistic for device eth0
=======================================
Board statistics
Active Port A
Preferred Port A
Bus speed (MHz) 66
Bus width (Bit) 64
Driver version 6.23
Hardware revision v1.2
Temperature (C) 27.05
Temperature (F) 81.00
Voltage PCI (V) 5.104
Voltage PCI-IO (V) 3.344
Voltage ASIC (V) 3.344
Voltage PMA (V) 3.278
Receive statistics
Received bytes 11761348565
Received packets 1710606
Receive errors 10
Receive dropped 0
Received multicast 431
Receive error types
length 0
buffer overflow 10
bad crc 0
framing 0
missed frames 0
too long 0
carrier extension 0
too short 0
symbol 0
LLC MAC size 0
carrier event 0
jabber 0
Transmit statistics
Transmited bytes 2047881323
Transmited packets 1083310
Transmit errors 0
Transmit dropped 0
Transmit collisions 0
Transmit error types
excessive collision 0
carrier 0
fifo underrun 0
heartbeat 0
window 0
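As a sketch, the integer counters can be pulled out of an sk98lin dump like the one above and used to compute the receive error rate. It assumes the "label value" layout shown here, with unique labels and counters as trailing integers. parse_sk98lin and rx_error_rate are hypothetical helper names. For this dump the rate is 10 errors in 1,710,606 packets, about 6 per million.

```python
import re

# A counter line ends in a whitespace-separated integer, e.g.
# "buffer overflow 10" or "Received packets 1710606". Lines without one
# (section titles, "Active Port A", "Temperature (C) 27.05") are skipped.
COUNTER_RE = re.compile(r"^\s*(?P<label>.*\S)\s+(?P<value>\d+)\s*$")

def parse_sk98lin(text):
    """Map each counter label to its integer value."""
    counters = {}
    for line in text.splitlines():
        m = COUNTER_RE.match(line)
        if m:
            counters[m.group("label")] = int(m.group("value"))
    return counters

def rx_error_rate(counters):
    """Fraction of received packets the driver reported as errored."""
    packets = counters["Received packets"]
    return counters["Receive errors"] / packets if packets else 0.0
```

Comparing this rate against the loss the application sees is the comparison made below.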
As noted, we believe that we are seeing additional loss in kernel space
that we haven't found error counters for (if they exist), because this error
rate doesn't explain the amount of loss we are seeing at the application
level (although the application in our case isn't CoralReef, the principles
are unfortunately universal).
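One way to make the "doesn't explain the loss" argument concrete is a simple loss budget. When the number of packets sent is known (as in a test with a traffic generator), subtract what the application received and every drop counter you can find. Whatever remains went missing without being counted anywhere. This is a hypothetical sketch (LossBudget is my own name), and it assumes each lost packet is counted by at most one layer, which is not guaranteed on every OS.

```python
from dataclasses import dataclass

@dataclass
class LossBudget:
    sent: int          # packets the traffic source put on the wire (known in a test)
    delivered: int     # packets the capture application actually saw
    nic_errors: int    # e.g. growth in the sk98lin "Receive errors" counter
    kernel_drops: int  # e.g. growth in /proc/net/dev "drop", where available
    pcap_drops: int    # libpcap's own drop counter (ps_drop from pcap_stats())

    def counted(self) -> int:
        """Losses that some counter admits to."""
        return self.nic_errors + self.kernel_drops + self.pcap_drops

    def unexplained(self) -> int:
        """Packets that vanished without any counter recording them."""
        return self.sent - self.delivered - self.counted()
```

A large unexplained() value relative to counted() is the situation described above: loss is happening somewhere that no counter reports.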
Peter Van Epp / Operations and Technical Support
Simon Fraser University, Burnaby, B.C. Canada