  ___  ______  _____  _   _  _____  _   _  _____    _____  _____   ___  ___  ___
 / _ \ | ___ \/  __ \| | | ||_   _|| | | ||  ___|  |_   _||  ___| / _ \ |  \/  |
/ /_\ \| |_/ /| /  \/| |_| |  | |  | | | || |__      | |  | |__  / /_\ \| .  . |
|  _  ||    / | |    |  _  |  | |  | | | ||  __|     | |  |  __| |  _  || |\/| |
| | | || |\ \ | \__/\| | | | _| |_ \ \_/ /| |___     | |  | |___ | | | || |  | |
\_| |_/\_| \_| \____/\_| |_/ \___/  \___/ \____/     \_/  \____/ \_| |_/\_|  |_/
we are going to rescue your shit
P R E S E N T S
THE ARCHIVE TEAM ANNIVERSARY GEOCITIES TORRENT VERSION 1.0
"Your webpage isn't classy without a MIDI soundtrack background"
"Seriously, what the shit, Yahoo!?"
HERE IS THE IMPORTANT MESSAGE WHICH YOU SHOULD READ BEFORE DOING TOO MUCH
This is a collection of Geocities data downloaded by a bunch of people who
call themselves ARCHIVE TEAM, who scraped the Yahoo! Geocities site over a
six-month period in 2009, before Yahoo! shut down geocities.com on October
26th, 2009. The collection is a UNIX directory tree, packed as GNU tape
archives (gtar) that are in turn compressed with 7zip. If you're a bit of a
data tourist and just want to waft in the scent of a web era gone by, please
go to one of the Geocities mirrors that were put up in the wake of the end
of Geocities. As of this writing, these mirrors include:
You'll get your fix and you won't go into internet rage when you find you
downloaded hundreds of gigabytes of THING YOU DO NOT WANT.
This collection was put together by nearly 100 folks who rallied at the news
of the death of Geocities, a website that allowed free hosting of web pages
from roughly 1994 (in beta) to 2009. In 1999, it was purchased by Yahoo!
for three billion dollars. We're not kidding here: billion with a b.
At the time of the purchase, Geocities was the THIRD most popular website on
the Internet. Even by the time of its shutdown, it was in the top 250. We
don't have complete rock-solid knowledge of why it was shut down, but all
signs point to Yahoo! trying to get back to basics (like, uh, having a huge
audience?) and Geocities magically didn't fall into this new "focus", and
lacked any internal cheerleader to make it last through meetings.
Yahoo! succeeded in destroying the largest amount of history in the shortest
amount of time, certainly on purpose, in known memory. Millions of files,
user accounts, all gone.
We are unsure how much of Geocities was rescued in this package you have,
but we do know we got enough for it to be a substantial chunk. Attempts to
contact Yahoo! to get any hard numbers were consistently rebuffed; we
suspect even Yahoo! didn't know exactly how many accounts and files they
had. As mentioned in the IMPORTANT MESSAGE, others were concurrently
downloading Geocities and used alternate methods of discovery, so our datasets
do not overlap 100%. The hope is that more will contribute datasets over time
and a good amount of Geocities will be available for study.
SO WHO IN THE GOOD GODDAMN WOULD WANT ALL OF THESE FILES
While we don't feel the need to act like a 1950s commercial inventing new ways
to use hula hoops and baking powder, the most likely candidates for this
Geocities Anniversary Collection are researchers, scientists, historians and
developers who wish to work with a large collection of information hand-made
by millions of people working for free. We foresee application tests,
sociology studies, academic articles and history texts putting this to good use.
Our job is not to find a use for it. Our job was to save it. Now we're giving
it to whoever wants it.
If you go "but what about..." when you think about the repercussions of having
this data set, please save us all a lot of trouble: just delete it off your
hard drive, go watch some TV, and don't talk of it again.
THE VERY BORING BUT PROBABLY RATHER IMPORTANT TECHNICAL NOTES FOR YOU
Inside this torrent collection are the following directories:
MEDIA is just a quick set of press releases from Yahoo! and an mp3 interview
about Archive Team and the importance of saving this digital history.
The rest are collections of .7z files. 7z is the archive format of the 7-Zip archiver.
To unpack these archives, use 7zip to create... well, a bunch of large files.
These large files are GNU Tar archives, which will then recreate a collection
of directories related to Geocities. And then it gets weird.
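If you'd rather script the two-step unpacking than do it by hand, here is a
minimal Python sketch. It assumes the `7z` command-line tool (from 7-Zip or
p7zip) is on your PATH and that the big tar files land in a scratch directory;
the helper names (`seven_zip_command`, `extract_tars`, `unpack`) and the
directory layout are made up for illustration, not something this torrent ships.

```python
import subprocess
import tarfile
from pathlib import Path


def seven_zip_command(archive: Path, out_dir: Path) -> list[str]:
    """Build a 7-Zip command that extracts `archive` into `out_dir`.

    Assumes the `7z` command-line tool (7-Zip or p7zip) is on PATH.
    `x` extracts with full paths; `-o` takes the directory with no space.
    """
    return ["7z", "x", str(archive), f"-o{out_dir}"]


def extract_tars(tar_dir: Path, dest: Path) -> list[Path]:
    """Unpack every tar archive found directly inside `tar_dir` into `dest`."""
    extracted = []
    for tar_path in sorted(tar_dir.iterdir()):
        # Skip subdirectories and anything that is not a tar archive.
        if not tar_path.is_file() or not tarfile.is_tarfile(tar_path):
            continue
        with tarfile.open(tar_path) as tf:
            # Newer Pythons offer the "data" filter, which refuses members
            # that would escape `dest` (absolute paths, "../", and so on).
            if hasattr(tarfile, "data_filter"):
                tf.extractall(dest, filter="data")
            else:
                tf.extractall(dest)
        extracted.append(tar_path)
    return extracted


def unpack(archive: Path, work_dir: Path, dest: Path) -> list[Path]:
    """Stage 1: .7z -> large tar files in work_dir. Stage 2: tar -> directories."""
    work_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(seven_zip_command(archive, work_dir), check=True)
    return extract_tars(work_dir, dest)
```

Splitting it into two stages means you can delete the big intermediate tar
files once `extract_tars` is done, which matters when you're juggling hundreds
of gigabytes.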
Because a scraper (wget) was used to get all these files, and the resulting
data set was huge, the archives were then sorted into a few rough headings.
UPPERCASE holds the Yahoo! IDs on Geocities (something like
http://www.geocities.com/DigitalHolocaust) that start with an uppercase
letter. LOWERCASE holds the lowercase ones, like
http://www.geocities.com/deletegeocities. NUMBERS holds the ones that begin
with a digit, like http://www.geocities.com/69convent.
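That sorting can be sketched in a few lines of Python. This is a guess at the
rule based only on the three example URLs: take the first path segment of the
URL as the Yahoo! ID and look at its first character. IDs that start with
anything else (a tilde, an underscore) return None, since this README doesn't
say where those ended up; the function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def yahoo_id(url: str) -> str:
    """Return the first path segment of a Geocities URL (the account name)."""
    path = urlparse(url).path
    return path.lstrip("/").split("/", 1)[0]


def bucket_for(url: str) -> Optional[str]:
    """Guess the top-level directory (UPPERCASE, LOWERCASE or NUMBERS).

    Hypothetical rule inferred from the examples in this README: look at the
    first character of the account name. Anything else returns None.
    """
    name = yahoo_id(url)
    if not name or not name[0].isascii():
        return None
    first = name[0]
    if first.isupper():
        return "UPPERCASE"
    if first.islower():
        return "LOWERCASE"
    if first.isdigit():
        return "NUMBERS"
    return None  # e.g. "~" or "_": the README does not say where these went
```

So `bucket_for("http://www.geocities.com/69convent")` returns "NUMBERS", and a
deep link like .../DigitalHolocaust/index.html still lands in UPPERCASE because
only the first path segment counts.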
WORKSHOP is our own junk bin of lists, scripts, and other tools used for
grabbing Geocities, plus the URL sets we assembled from lots of Google and
other searches to find seed URLs to start grabbing from. Almost nobody wants
this, trust us; we're just providing what we generated along the way.
When you run scrapers, they sometimes span hosts and come back with a bunch of
other sites. Those are what's in SUBSITES.
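For the curious: wget's `--span-hosts` (`-H`) option is what lets a recursive
crawl wander off to other hosts, and by default `wget -r` saves each host's
files under a directory named after that host (unless you pass `-nH`). A rough
Python sketch of splitting those per-host directories into Geocities versus
everything else might look like this. Treating geocities.com and all of its
subdomains as "Geocities" is an assumption, not necessarily how SUBSITES was
actually assembled, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def is_geocities_host(host: str) -> bool:
    """True for geocities.com itself and any subdomain of it (an assumption)."""
    name = host.lower().rstrip(".")
    return name == "geocities.com" or name.endswith(".geocities.com")


def split_hosts(host_dirs: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition per-host directory names into (geocities, subsites)."""
    geocities: List[str] = []
    subsites: List[str] = []
    for host in host_dirs:
        # Matching on the ".geocities.com" suffix, not a substring, keeps
        # look-alikes such as "geocities.com.example.net" out of the main set.
        (geocities if is_geocities_host(host) else subsites).append(host)
    return geocities, subsites
```

You'd feed it the names of the top-level directories of a crawl, something
like `split_hosts(p.name for p in Path("crawl").iterdir() if p.is_dir())`,
where "crawl" is a hypothetical download directory.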
Finally, GEOCITIES is the www.geocities.com site, with TONS of links over to
a /geocities/YAHOOIDS directory structure that UPPERCASE, LOWERCASE, and
NUMBERS fill in.
Make sense? Well, you'll figure it out.
WE ARE GOING TO RESCUE YOUR SHIT
Dropped on the world on October 29, 2010