[Dshield] awk scripts for DSHIELD format verification, translation from 3Com OfficeConnect firewall formats to DSHIELD format

Bruce Lilly blilly at erols.com
Sun Jul 1 01:54:25 GMT 2001


The first awk script will complain about lines in the input
stream which don't correspond to DSHIELD format. It is useful
in developing format converters.  Email headers, etc. are
ignored. If all lines are valid, there is no output ("no news
is good news"). Otherwise, the offending line is output with
a tag explaining the problem.

"The one true awk" is available in source code by following the
link on Brian Kernighan's web page,
http://www.cs.bell-labs.com/who/bwk/index.html.

FSF's gawk should also work.

If you like looking at line noise [1/2 :-)], you can try a2p
(but don't blame me if the resulting perl script doesn't work).

======== dshield_vrfy.awk =====================================
# awk script to verify DSHIELD format

BEGIN  { FS = "[ ]*\t[ ]*";
}

# ignore email headers, blank lines, lines beginning with whitespace (e.g. header continuation lines)
/^[-A-Za-z]+:/ { next; }
/^$/ { next; }
/^[ \t]/ { next; }

NF < 8  { printf "NG (too few (%d) fields): %s\n", NF, $0; next; }

NF > 9  { printf "NG (too many (%d) fields): %s\n", NF, $0; next; }

$1 !~ /^[1-2][0-9][0-9][0-9]-[0-1][0-9]-[0-3][0-9][ ]+[0-2][0-9]:[0-5][0-9]:[0-6][0-9][ ]+[-+][0-1][0-9]:[0-5][0-9]$/ {
  printf "NG (bad date/time/zone): %s\n", $0; next;
}

$2 !~ /^[0-9]+$/ { printf "NG (bad author field): %s\n", $0; next; }

$3 !~ /^[0-9]+$/ { printf "NG (bad count field): %s\n", $0; next; }

$4 !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { printf "NG (bad source IP): %s\n", $0; next; }

$5 !~ /^[0-9]+$/ { printf "NG (bad source port): %s\n", $0; next; }

$6 !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { printf "NG (bad target IP): %s\n", $0; next; }

$7 !~ /^[0-9]+$/ { printf "NG (bad target port): %s\n", $0; next; }

NF > 7 {
  dtz = $1;
  split(dtz, f, " ");
  split(f[1], d, "-");
# don't forget to change the next line ca. 2099 ...
  if ((d[1] < 2000) || (d[1] > 2100)) {
    printf "NG (bad year (%d)): %s\n", d[1], $0;
    next;
  }
  if ((d[2] < 1) || (d[2] > 12)) {
    printf "NG (bad month (%d)): %s\n", d[2], $0;
    next;
  }
  if ((d[3] < 1) || (d[3] > 31)) {
    printf "NG (bad day (%d)): %s\n", d[3], $0;
    next;
  }
# could get fancy and check days per month (taking into account leap years)...
  split(f[2], d, ":");
  if (d[1] > 23) {
    printf "NG (bad hour (%d)): %s\n", d[1], $0;
    next;
  }
  if (d[2] > 59) {
    printf "NG (bad minutes (%d)): %s\n", d[2], $0;
    next;
  }
  if (d[3] > 60) {  # accommodate leap seconds
    printf "NG (bad seconds (%d)): %s\n", d[3], $0;
    next;
  }
  split(f[3], d, "[-+:]");
  if ((d[2] > 14) || (d[3] > 59)) {
    printf "NG (bad zone (%s)): %s\n", f[3], $0;
    next;
  }
  sip = $4;
  sp = $5;
  dip = $6;
  dp = $7;
  proto = $8;
  split(sip, d, ".");
  for (i=1; i<=4; i++) {  # check all four octets
    if (d[i] > 255) {
      printf "NG (bad source IP address): %s\n", $0;
      next;
    }
  }
  if (sp > 65535) {
    printf "NG (bad source IP port): %s\n", $0;
    next;
  }
  split(dip, d, ".");
  for (i=1; i<=4; i++) {  # check all four octets
    if (d[i] > 255) {
      printf "NG (bad target IP address): %s\n", $0;
      next;
    }
  }
  if (dp > 65535) {
    printf "NG (bad target IP port): %s\n", $0;
    next;
  }
  if (proto ~ /^[0-9]+$/) {
    if ((proto < 1) || (proto > 255)) {
      printf "NG (bad protocol number): %s\n", $0;
      next;
    }
  } else if (proto !~ /^[A-Za-z]+[-0-9A-Za-z.+]*[0-9A-Za-z+]$/) {
    printf "NG (bad protocol name): %s\n", $0;
    next;
  }
  if (NF == 9) {
    if ($9 !~ /^[,A-Za-z]+$/) {  # one or more flag letters and/or commas
      printf "NG (bad flags field): %s\n", $0;
      next;
    }
  }
}
===============================================================
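
The days-per-month check that the script's comment leaves open could
look something like the sketch below. The names days_in_month and dim
are hypothetical and are not part of dshield_vrfy.awk; the leap-year
rule is the usual Gregorian one.

```shell
# Sketch only: days_in_month and dim() are hypothetical names,
# not part of dshield_vrfy.awk.
# Reads "year month" pairs on standard input and prints the day count.
days_in_month() {
  awk '
    function dim(y, m) {
      if (m == 2)   # February: 29 days in Gregorian leap years
        return ((y % 4 == 0 && y % 100 != 0) || y % 400 == 0) ? 29 : 28;
      if (m == 4 || m == 6 || m == 9 || m == 11)
        return 30;
      return 31;
    }
    { print dim($1 + 0, $2 + 0); }
  '
}

echo "2001 02" | days_in_month
```

If dim() were copied into the verifier, the day test could presumably
become (d[3] < 1) || (d[3] > dim(d[1], d[2])), since d[1] and d[2]
hold the year and month at that point.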

3Com makes an OfficeConnect series of firewall boxes (e.g. http://www.3com.com/products/en_US/detail.jsp?tab=features&pathtype=purchase&sku=3C16770-US).
There are at least 3 ways to get logs from the box:
1. cut and paste from the web browser interface screen display
2. via syslog
3. via email

The three formats differ slightly. Here is a sample of each type
corresponding to the same event (each is a single line; if it
looks like it has been wrapped, blame your email software):

web browser cut & paste:
UTC 06/28/2001 05:38:15.160 TCP connection dropped 24.181.56.119, 21077, WAN 192.168.99.254, 6346, LAN   10

syslog (including 4 fields prepended by the syslog daemon):
06-28-2001      01:38:16        Local0.Notice   wall.blilly.com id=firewall sn=00D096BF23C5 time="2001-06-28 05:38:15 UTC" fw=192.168.99.254 pri=5 c=64 m=36 msg="TCP connection dropped" src=24.181.56.119:21077:WAN dst=192.168.99.254:6346:LAN rule=10

email:
UTC 06/28/2001 05:38:15.160 -   TCP connection dropped -        Source:24.181.56.119, 21077, WAN -      Destination:192.168.99.254, 6346, LAN -          -      Rule 10

Note that the firewall's IP address for my network configuration
is in a private use range; that is replaced with the ISP-assigned
public IP address for submission to reports at dshield.org.

Here are three scripts to convert to DSHIELD format.

1. for the browser screen format:
============ 3cscreen.awk =====================================
# awk script to convert 3Com OfficeConnect firewall browser screen logs to DSHIELD format
# use author= on command line to set user id

($1 == "UTC")  {  # timestamp line; filters out email headers, message heading
  src = $7;  # normal location of source field for dropped packet reports
  if ((src ~ /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+,/) && ($9 == "WAN")) {  # valid source field format, incoming probe
    sport = $8;
    if (sport ~ /[0-9]+,/) {  # valid port number field
      dst = $10;
      if ((dst ~ /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+,/) && ($12 == "LAN")) {  # valid destination field, target inside
        dport = $11;
        if (dport ~ /[0-9]+,/) {  # valid port number field
          proto = $4;
          if ((proto ~ /^[A-Z]+$/) && ($5 != "spoof")) {  # TCP, UDP, ICMP; filters out "Possible port scan", "IP spoof detected", etc.
            gsub(/,$/, "", src);    # remove trailing comma
            gsub(/,$/, "", sport);    # remove trailing comma
            src = src "\t" sport;    # combine source and port (tab separated)
            gsub(/,$/, "", dst);    # remove trailing comma
            gsub(/,$/, "", dport);    # remove trailing comma
            dst = dst "\t" dport;    # combine destination and port (tab separated)
            date = $2;
            split(date, d, "/");    # separate into m d y
            date = d[3] "-" d[1] "-" d[2];  # reassemble as y-m-d
            time = $3;
            gsub(/\.[0-9]+$/, "", time);  # elide millisecond resolution
            printf "%s %s +00:00\t%d\t1\t%s\t%s\t%s\n", date, time, author, src, dst, proto;
          }
        }
      }
    }
  }
}
===============================================================
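
The date and time handling in the script above can be tried in
isolation. The sketch below applies the same split and gsub steps to
the timestamp from the sample line; the function name to_dshield_time
is hypothetical and exists only for this illustration.

```shell
# Sketch only: to_dshield_time is a hypothetical name for illustration.
# Reads "UTC mm/dd/yyyy hh:mm:ss.mmm ..." lines and prints the
# DSHIELD-style timestamp (date, time, zone).
to_dshield_time() {
  awk '$1 == "UTC" {
    split($2, d, "/");                # separate into m d y
    date = d[3] "-" d[1] "-" d[2];    # reassemble as y-m-d
    time = $3;
    gsub(/\.[0-9]+$/, "", time);      # elide millisecond resolution
    printf "%s %s +00:00\n", date, time;
  }'
}

echo "UTC 06/28/2001 05:38:15.160 TCP connection dropped" | to_dshield_time
# prints: 2001-06-28 05:38:15 +00:00
```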

2. for the syslog format:
============ 3csyslog.awk =====================================
# awk script to convert 3Com OfficeConnect firewall syslog lines to DSHIELD format
# use author= on command line to set user id

BEGIN  {
  offset = 4;  # adjust for fields added by syslogd
}

$(offset+1) == "id=firewall"  {
  src = $(offset+13);
  if (src ~ /src=[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+:[0-9]+:WAN/) {  # valid source + port field, incoming probe
    dst = $(offset+14);
    if (dst ~ /dst=[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+:[0-9]+:LAN/) {  # valid destination + port field, target inside
      proto = $(offset+10);
      gsub(/^msg="/, "", proto);      # strip tag
      if ((proto ~ /^[A-Z]+$/) && ($(offset+11) != "spoof")) {  # TCP, UDP, ICMP; filters out "Web site blocked", "IP spoof detected", etc.
        gsub(/^src=/, "", src);      # strip tag
        gsub(/:WAN$/, "", src);      # remove trailing interface field
        gsub(/:/, "\t", src);      # separate IP and port by tab
        gsub(/^dst=/, "", dst);      # strip tag
        gsub(/:LAN$/, "", dst);      # remove trailing interface field
        gsub(/:/, "\t", dst);      # separate IP and port by tab
        date = $(offset+3);
        gsub(/^time="/, "", date);    # strip tag; date is already in yyyy-mm-dd format
        time = $(offset+4);
        printf "%s %s +00:00\t%d\t1\t%s\t%s\t%s\n", date, time, author, src, dst, proto;
      }
    }
  }
}
===============================================================

3. for the email format:
============ 3cemail.awk ======================================
# awk script to convert 3Com OfficeConnect firewall email logs to DSHIELD format
# use author= on command line to set user id

($1 == "UTC")  {  # timestamp line; filters out email headers, message heading
  src = $9;  # normal location of source field for dropped packet reports
  if ((src ~ /Source:[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+,/) && ($11 == "WAN")) {  # valid source field format, incoming probe
    sport = $10;
    if (sport ~ /[0-9]+,/) {  # valid port number field
      dst = $13;
      if ((dst ~ /Destination:[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+,/) && ($15 == "LAN")) {  # valid destination field, target inside
        dport = $14;
        if (dport ~ /[0-9]+,/) {  # valid port number field
          proto = $5;
          if ((proto ~ /^[A-Z]+$/) && ($6 != "spoof")) {  # TCP, UDP, ICMP; filters out "Possible port scan", "IP spoof detected", etc.
            gsub(/^Source:/, "", src);  # strip leading tag
            gsub(/,$/, "", src);    # remove trailing comma
            gsub(/,$/, "", sport);    # remove trailing comma
            src = src "\t" sport;    # combine source and port (tab separated)
            gsub(/^Destination:/, "", dst);  # strip tag
            gsub(/,$/, "", dst);    # remove trailing comma
            gsub(/,$/, "", dport);    # remove trailing comma
            dst = dst "\t" dport;    # combine destination and port (tab separated)
            date = $2;
            split(date, d, "/");    # separate into m d y
            date = d[3] "-" d[1] "-" d[2];  # reassemble as y-m-d
            time = $3;
            gsub(/\.[0-9]+$/, "", time);  # elide millisecond resolution
            printf "%s %s +00:00\t%d\t1\t%s\t%s\t%s\n", date, time, author, src, dst, proto;
          }
        }
      }
    }
  }
}
===============================================================
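
As noted earlier, the firewall's private address has to be replaced
with the public one before submission. One way to do that is a small
filter run on the converted output. This is only a sketch: the
function name fix_target_ip is hypothetical, the author id 12345 is
made up, and 203.0.113.7 is a documentation-range placeholder standing
in for the ISP-assigned address.

```shell
# Sketch only: fix_target_ip is a hypothetical helper. The private and
# public addresses are supplied by the caller; 203.0.113.7 below is a
# placeholder, not a real assignment.
fix_target_ip() {
  awk -v priv="$1" -v pub="$2" '
    BEGIN { FS = OFS = "\t"; }
    $6 == priv { $6 = pub; }   # field 6 is the target IP in DSHIELD format
    { print; }
  '
}

printf '2001-06-28 05:38:15 +00:00\t12345\t1\t24.181.56.119\t21077\t192.168.99.254\t6346\tTCP\n' |
  fix_target_ip 192.168.99.254 203.0.113.7
```

Under the same assumptions, a full run on a saved email log (here
called log.txt, also hypothetical) might look like:

  awk -f 3cemail.awk author=12345 log.txt | fix_target_ip 192.168.99.254 203.0.113.7 | awk -f dshield_vrfy.awk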


Best regards,
  Bruce Lilly



