[Dshield] Duplication of Data

Johannes B. Ullrich euclidian at euclidian.com
Mon Jul 9 13:20:33 GMT 2001

The way it works at this point is that only DShield.org and MNW
(myNetWatchman) are accepting submissions from end users. Both submit data
to Incidents.org. The data sent to Incidents.org is consolidated (no
'target IP' or 'author' data is sent from DShield to Incidents.org).

We did a quick comparison between DShield and MNW data last week and found
a good overlap in attack sources. I don't think many people (if anyone)
report to both.

Duplicate submissions by individual authors to DShield.org itself have
been a problem in the past, and I am looking at better ways to filter them.

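One simple way to filter duplicate submissions is to drop any report that exactly repeats an earlier one. The Python sketch below assumes a hypothetical report layout (the `Report` fields are illustrative, not DShield's actual submission format) and treats two reports as duplicates only when every field matches, for example when an author uploads the same log twice.

```python
from typing import Iterable, Iterator, NamedTuple


class Report(NamedTuple):
    """One firewall log line. Hypothetical field layout, for illustration only."""
    author: str
    timestamp: str    # e.g. "2001-07-09 13:20:33"
    source_ip: str
    source_port: int
    target_ip: str
    target_port: int
    protocol: str


def drop_duplicates(reports: Iterable[Report]) -> Iterator[Report]:
    """Yield each report once, skipping exact repeats of an earlier report.

    Reports are compared field by field; the order of first appearance is kept.
    """
    seen: set[Report] = set()
    for report in reports:
        if report not in seen:
            seen.add(report)
            yield report
```

Exact matching will not catch near-duplicates, such as the same log re-sent with shifted timestamps; catching those would need a looser comparison key, which this sketch does not attempt.
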
The number of reporting agents/authors is not as meaningful as the number
of 'targets'. I have a few submitters submitting for entire class B
networks, each of which counts as only one 'author'. Overall, it is hard to
find a good metric at this point. Here is a bit of background on 'what we
call things':

- 'report': an individual line of a firewall log.
- 'author': a registered user of DShield. All anonymous submissions are
  registered as the same 'anonymous' author.
- 'target': the IP address targeted. Authors with dynamic IP addresses, or
  authors submitting for more than one system, have more than one target.
- 'source': the attacking IP address. From what I see, the average home
  user reports about 2-3 different attack sources per day, but there is
  considerable overlap, as some sources scan large netblocks.
  (A 'home user' here is a user submitting data for one system.)

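To make the difference between these counts concrete, here is a minimal Python sketch that tallies reports, distinct authors, distinct targets, and distinct sources for a batch of reports. The three-field `Report` layout and the `summarize` function are hypothetical simplifications, not DShield code; anonymous submissions share the single author name "anonymous", as described above.

```python
from typing import Iterable, NamedTuple


class Report(NamedTuple):
    """One firewall log line, reduced to the fields counted here (hypothetical layout)."""
    author: str   # registered user, or "anonymous" for all anonymous submissions
    target: str   # IP address that was targeted
    source: str   # attacking IP address


def summarize(reports: Iterable[Report]) -> dict[str, int]:
    """Count all reports, plus the distinct authors, targets and sources among them."""
    rows = list(reports)
    return {
        "reports": len(rows),
        "authors": len({r.author for r in rows}),
        "targets": len({r.target for r in rows}),
        "sources": len({r.source for r in rows}),
    }


if __name__ == "__main__":
    # One author covering three hosts of a class B network, plus one
    # anonymous report, all from the same attacking address.
    batch = [Report("netadmin", f"172.16.{i}.1", "10.0.0.1") for i in range(3)]
    batch.append(Report("anonymous", "192.168.1.5", "10.0.0.1"))
    print(summarize(batch))
    # {'reports': 4, 'authors': 2, 'targets': 4, 'sources': 1}
```

The example shows why the author count understates coverage: the single 'netadmin' author accounts for three of the four targets.
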
A related question back to everyone: right now, I don't collect any
'personal' information about users. Only for large submitters do I try to
make contact and find out some details about the network they submit for.
Would people be willing to share some information about their system?
Here are some sample questions:

(Please don't answer by replying to this public list; just send me a
quick vote on whether I should set up a web form collecting this kind of
data, to jullrich at dshield.org.)

Questions about the system submitting data to DShield:

- Is it a firewall, server, personal PC, or corporate PC?
- What operating system does it run?
- How many hours per day is the system online?
- How many systems do you submit data for?
- What firewall software are you using? (I already know this for people
  using formats other than the DShield format.)

Is there any other data that people think may be helpful?

--- Johannes Ullrich 
                                  Join http://www.dshield.org 
    jullrich at sans.org ---

On Mon, 9 Jul 2001, David Kennedy CISSP wrote:

> Do DShield, myNetWatchman and Incidents cooperate to reduce the amount
> of probe/intrusion reporting that's duplicated among them?  I see
> from DS' home page some 24M lines of data, but no clear indication of
> # of reporting agents.  MNW reports 600-odd agents.  It's unclear
> where Incidents gets data other than from DS and MNW.  With only 600
> agents, it would not take a great deal of duplication to skew the
> reports.
> For example, #5 on the DS top 10 list now is 80/TCP with ~5K
> reports/day.  But #4 is FTP which also has had days with only ~5K of
> reports.  If 1/3 of MNW's reports are also duplicated on DS, it could
> skew the results compiled by Incidents.  If the numbers are still
> small now, perhaps now is the best time to address this before the
> numbers get too large to scale a fix?
> Suggestion:  Just ask reporters not to duplicate their submissions;
> put a note on the DS registration and client download pages asking
> that the data only be reported to DS.  If you want to get
> sophisticated, have the clients look in default locations for each
> other.  If another client is found either return an error to the user
> or include the error in the log submission.
> -- 
> Regards,
> David Kennedy CISSP
> Director of Research Services, TruSecure Corp. http://www.trusecure.com
> Protect what you connect.
> Look both ways before crossing the Net.
