ACM Transactions on Internet Technology (TOIT), Volume 6 Issue 2, May 2006

A stochastic model for the evolution of the Web allowing link deletion
Trevor Fenner, Mark Levene, George Loizou
Pages: 117-130
DOI: 10.1145/1149121.1149122
Recently, several authors have proposed stochastic evolutionary models for the growth of the Web graph and other networks that give rise to power-law distributions. These models are based on the notion of preferential attachment, leading to the...

Core algorithms in the CLEVER system
Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins
Pages: 131-152
DOI: 10.1145/1149121.1149123
This article describes the CLEVER search system developed at the IBM Almaden Research Center. We present a detailed and unified exposition of the various algorithmic components that make up the system, and then present results from two user studies....

Stanford WebBase components and applications
Junghoo Cho, Hector Garcia-Molina, Taher Haveliwala, Wang Lam, Andreas Paepcke, Sriram Raghavan, Gary Wesley
Pages: 153-186
DOI: 10.1145/1149121.1149124
We describe the design and performance of WebBase, a tool for Web research. The system includes a highly customizable crawler, a repository for collected Web pages, an indexer for both text and link-related page features, and a high-speed content...

Behavior-based modeling and its application to Email analysis
Salvatore J. Stolfo, Shlomo Hershkop, Chia-Wei Hu, Wei-Jen Li, Olivier Nimeskern, Ke Wang
Pages: 187-221
DOI: 10.1145/1149121.1149125
The Email Mining Toolkit (EMT) is a data mining system that computes behavior profiles or models of user email accounts. These models may be used for a multitude of tasks including forensic analyses and detection tasks of value to law...