“You can have data without information, but you cannot have information without data.”
~ Daniel K. Moran
Good data is central to building useful models. Because our group focuses on building computational models of real-world systems, we depend on having data. Here we share datasets that we have found, collected, or generated.
Social Data
N-gram Lists
N-gram lists are available from both Google and Microsoft.
Social Network of Marketing Scholars
This source provides the collaboration network for marketing scholars, spanning the years 1973-2009.
Stanford Large Network Dataset Collection
The Stanford Large Network Dataset Collection is an excellent source of network data from a variety of domains, including friendship networks, blogs, and road systems.
Network of Notre Dame University Websites
Often referred to as the WWW dataset, this data has been used in a variety of publications since it was introduced in 1999 in the seminal paper “Diameter of the World Wide Web” by Réka Albert, Hawoong Jeong, and Albert-László Barabási.
Political Data
FEC Election Contributions
We have recently been working on the election contribution dataset published by the Federal Election Commission (FEC). The raw text files published by the FEC have been converted into a MySQL database. This dataset will be available soon.
Supreme Court Citation Network
The Supreme Court Citation Network dataset, assembled by James Fowler and Sangick Jeon, provides a comprehensive, quantitative picture of citation patterns among Supreme Court documents.
Biological Data
EGFR Network Connectivity
The EGFR network used in our paper entitled “The Signaling Petri Net-based Simulator” was carefully curated by Prof. Prahlad Ram’s group. The file is currently available in the .EWE format.