[Scamper-dev] scamper notes draft 2

Bradley Huffaker bhuffake at caida.org
Wed May 19 18:32:08 PDT 2004


I will try and get the data time slot next week to have another meeting.
The topics as I currently see them are:

    # Review Data Object fields
    # Design List Manager Specs

##############################################################
Wed May 19 18:32:46 PDT 2004
written by Bradley Huffaker
##############################################################

--------------------------------------------------------------------
    Goals 
--------------------------------------------------------------------
    
    - Collect IP topology and RTT values for both IPv4 and IPv6	
      address space.

    - Probing of the network will be as unintrusive as possible.
      This will be understood to mean sending as few packets on
      both prefix and individual IP address level and both in aggregate 
      and with in a given time period.
    
    - To allow different lists of destinations to be probed from
      all monitors simultaniously

    - Lists will be understood to run in cycles.  Where a cycle
      is the smallest logical unit (above the individual IP address)
      of a list.  In most normal cases this will be understood to mean
      a complete run through all the IP addreses in the list.

    - Data to be stored with each file being a unique cycle and list.
      Files will be stored in a directory structor of:
	    <list>/<monitor>/<year>/<mon>/
      Files will be named as:
	    <list>.<monitor>.<year><mon><day>.<cycleId>.warts
      Where <year>, <mon>, and <day> are from the cycleId
      which is assumed to be the timestamp of when the List 
      Manager first started the cycle.

--------------------------------------------------------------------
    Codeing requirements
--------------------------------------------------------------------
    
    - scamper must be able to interleave and prob IP address from
      different lists.  Lists can be both overlaping and disjoint 
      sets of IPs.  Lists have different requirements on how to
      feedback from the network.
    
    - To avoid excessive probbing of a single address scamper must
      not have more then one instance of a given IP address in
      it's actively proving set of IPs.

--------------------------------------------------------------------
    Scamper Probing and List Management
--------------------------------------------------------------------

The role of list managment and list probing will be divided 
into to two processes.  Scamper will be the probing process and there
will also be a List Manager to handle the list creation and
interleaving of different lists.

Two processes is the most likely way to do it, but other possibilities
include two threads, or just two sets of functions/classes.  In any case,
the communication between them is limited to a one-way (List Manager to
Prober) stream of commands.  In the two-process implementation, the best
communication channel is probably a named pipe.

--------------------------------------------------------------------
    Scamper Data Objects
--------------------------------------------------------------------

Start Cycle Object -    This marks the beginning of the cycle and lists
                        the set of options which will be used for the
                        comming cycle.  These options can include: collect
                        hop RTT, number of tries for a non responsive
                        hope,or to use source routing.

End Cycle Object -      This will mark the end of a cycle and will be used 
                        by archive/sorter to close off the filehandler 
			for this cycle.


Ip Path Object -	This holds the collection of information which
			is stored when IP path discovery is preformed for a 
			destination with a fixed set of Source Routed IPs.

MTU Path Object -	This will hold the collection of information which
			is stored when MTU discovery is preformed for a 
			destination with a fixed set of Source Routed IPs.

--------------------------------------------------------------------
    List Management Commands
--------------------------------------------------------------------

The List manager may send an IP Command, Begin List Command, or End List
Command:  

    Exit Command - scamper will finish the current tasks it is processing
	and then terminate.  No futher commands will be read.

    Ip Command - contains destinations to prob 
	 contains:IP, ListId, and CycleId.  
    Begin List Cycle Command  - marks the beginning of the current cycle and 
	sets up the options that will be used to prob the IP address
	contains:ListId, CycleId, list of Options for use this cycle, and 
	    human readable text field.  
    End List Cycle Command - marks the end of a list and causes scamper to
	free up resources assigned to it once it has finished the
	current cycle.
	contains: ListId

--------------------------------------------------------------------
    Scamper General Structor
--------------------------------------------------------------------

Scamper will have the following logical parts:

    Active Window - list of IP addresses which are currently being probed

    Command Queue - queue of commands which were passed by the List Manager
	but have yet to be processed

    Hold List - list of IP Commands which are on hold because their 
	IP address is currently in in the Active Window or the Hold List.

    IP Prob State - collection of information which keeps track 
	of the current state of probing and the data which will 
	finally be stored to disk.

--------------------------------------------------------------------
    Scamper Gaol
--------------------------------------------------------------------

    * Only one occurance of the same IP addresses will be found in
      the active list.

    * IP addresses which were on hold should be processed before
      other IP addreses in the Command Queue and in the order they
      were orginally found in the Command Queue as much as possible.

    * IP addreses should remain in the Active Window for a reasonable
      period of time after they have finished processing to allow 
      for unusual slow return paths in the network.

--------------------------------------------------------------------
    Scamper General Flow of Control
--------------------------------------------------------------------

Scamper reads commands from the List Manager and stores them into it's
Command Queue.  It reads commands off the Command Queue in the order that
they were passed from the List Manager as long as there exists commands in
the Command Queue or there are free spaces in the Active Window.  If they
are List Cycle Commands it will update the List Cycle's state.  If they are
IP Commands they either create a IP Prob State in the Active Window or are
moved to the Hold List.

Processing List Cycle commands will effect the set of active List Cycle
pairs.  Scamper will maintain state for all the currently active ListId and
CycleId pairs.  A pair is actived after a Begin List Cycle Command has been
processed.  It is considered inactive after processing an End List Cycle
Command and there exist no IP Prob States for this ListId and CycleId in
the Active Window.  Once a ListId and CycleId become inactive scamper will
remove there state from it's memory.

Processing a IP Command will result in the Command being put on hold or the
creation of a IP Probe State.  When a IP Command is processed a search is
made of the Active Window or the Hold list for an IP addresses matching the
IP address in the IP Command.  If a match no match is found the an IP Prob
State is added to the Active Window.  If a match is found then IP Command
is copied to the Hold List with a Hold Timestamp which represents the
earliest period at which this IP Command can be reprocessed.

The Hold Timestamp will be the current time, or the Hold Timestamp of the
largest Hold Timetamp in the Hold List for the same IP, plus the Hold
Period, were the Hold Period will be some some period judged to be much
longer then the time to process the current IP plus a rest period.  IP
Commands are sorted onto the Hold List in order of thier Hold Timestamp
with the smallest value first.  This sort method used should have the
smallest possible overhead since the Hold List is assumed to be small.
Insertion sort would be a good choice.

Once an IP address has an IP Prob State inside of the Active Window it is
considered active.  Only active IP addresse may recieve network packets.
All other network packets are ignored.  Scamper will handle the in coming
networks packets for a given IP Prob State based on the options and state
stored in the IP Prob State and the options in it's corrisponding ListId
and CycleId pair.

An IP Prob State stays in the Active Window until it recieves all expected
responses, OR it times out.  (Perhaps we should wait for some (shorter)
timeout after all expected responses, in case there are unexpected
responses?)  When it is removed, its state record is written out, and it is
forgotten completely; any response that arrives after that are ignored.
Data can be written as either, or both, a IP Path Object or a MTU Object.
Timing out can easily be handled by a queue, ordered by time; periodically
remove timed-out items from the head of this queue.

When an IP Prob State is removed from the Active Window scamper will first
check the first command in the Hold List to see if it's Hold Timestamp has
expired and process the first command in the Hold List.  If the Hold List
is empty or the first entry's Hold Timestamp has not yet expired it will
process the next command in the Command Queue.  If the Command Queue is
empty and the Hold List is not scamper will sleep up to the Hold Timestamp.
If the Command Queue and the Hold List are empty and there are no futher
commands from the List Manager scamper will sleep for a short period of
time before rechecking the Command Queue, say 5 minutes.  It will repeatly
sleep and check until it is killed or the Command Queue is refilled.

If Scamper processes an Exit Command it will finish all IP Prob States in
the Active Window and process all IP commands in the Hold List and then
terminate.  It will not process any remaining commands in the Command Queue
and it will recieve no more commands from the List Manager.  We might want
it to save it's current active lists so that it might pick up were it left
off.

--------------------------------------------------------------------
    skDataD and Scamper
--------------------------------------------------------------------

The process of copying disk to a remove archive and the process for 
probing are seperate.  Scamper will be the probing process and skDataD
will be the data transfer service on the monitor.

Scamper will now continuesly write data to disk.  It will therefor not
periodically release the lock on it's output file.  Instead when skdatad
wishs to access scamper's output it will send scamper an interupt and start
trying to get it's own lock on the file.  When scamper recieves the
interupt it will finish what ever record it was writting, get a new lock on
a new file, release the lock on the old file, and start writting data to
the new file.  When scamper releases the lock on the old file skdatad will
be able to process the file at it's leisure.

--------------------------------------------------------------------
    Data Sorter
--------------------------------------------------------------------

The Data Sorter will use the Start Cycle Object to know when to open a 
new File Handler.  It will use the End Cycle Object to close the file 
Handler.



More information about the Scamper-dev mailing list