Wednesday, August 25, 2010
ClueWeb 09
ClueWeb is a wonderful Web dataset available to the research community. I
- 1,040,809,705 web pages, in 10 languages
- 5 TB, compressed. (25 TB, uncompressed.)
- Unique URLs: 4,780,950,903 (325 GB uncompressed, 105 GB compressed)
- Total Outlinks: 7,944,351,835 (71 GB uncompressed, 24 GB compressed)
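
A few back-of-the-envelope ratios fall out of the figures above. The sketch below is only an illustration: the `summarize` helper and the key names are hypothetical, and it assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes), which the list does not state.

```python
# Figures copied from the list above.
PAGES = 1_040_809_705
UNIQUE_URLS = 4_780_950_903
TOTAL_OUTLINKS = 7_944_351_835

# Assumption: decimal units (1 GB = 10**9 bytes, 1 TB = 10**12 bytes).
GB = 10**9
TB = 10**12

# (uncompressed bytes, compressed bytes) for each part of the dataset.
SIZES = {
    "pages": (25 * TB, 5 * TB),
    "unique_urls": (325 * GB, 105 * GB),
    "outlinks": (71 * GB, 24 * GB),
}


def summarize():
    """Return derived ratios as a dict (hypothetical helper, not part of ClueWeb)."""
    stats = {
        "outlinks_per_page": TOTAL_OUTLINKS / PAGES,
        "unique_urls_per_page": UNIQUE_URLS / PAGES,
        "avg_page_bytes": SIZES["pages"][0] / PAGES,
        "avg_url_record_bytes": SIZES["unique_urls"][0] / UNIQUE_URLS,
    }
    # Compression ratio = uncompressed size / compressed size.
    for name, (raw, packed) in SIZES.items():
        stats[f"compression_{name}"] = raw / packed
    return stats


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value:.2f}")
```

Under those assumptions this works out to roughly 7.6 outlinks and about 24 KB of uncompressed content per page, with the page data compressing about 5:1 and the URL and outlink files about 3:1.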