Removing Your Website from the Internet Archive Wayback Machine

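The Internet Archive runs a service called the Wayback Machine, which archives snapshots of web pages from all over the world. Anyone can look up a site's past versions at https://archive.org/web/, even after the site has been deleted. If you want to clear your site from the Wayback Machine so that nobody can view its history, the process is fairly simple.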

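Step 1
Create a robots.txt file in your site's root directory and add the rule below. This file tells crawlers what they should not fetch; the rule here is addressed to ia_archiver, the user agent the Wayback Machine is asked to honor, and forbids it from crawling any path on the site.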

User-agent: ia_archiver
Disallow: /

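If you are not sure how to do this, search for "robots.txt generator" and you will find tools that build the file for you.

Step 1 only tells the Wayback Machine to stop crawling the site from now on. Content it has already captured stays in the archive and can still be looked up, which is why step 2 is needed.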
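If you want to double-check the rule before uploading it, here is a minimal sketch using Python's standard urllib.robotparser module. It only tests how a standards-following parser reads the two lines above; it says nothing about how the Wayback Machine itself behaves. The helper name is_blocked and the "Googlebot" comparison are my own additions for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rule from step 1, exactly as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # ia_archiver is covered by the rule; other crawlers are not.
    print(is_blocked(ROBOTS_TXT, "ia_archiver", "https://kakarot.net/"))
    print(is_blocked(ROBOTS_TXT, "Googlebot", "https://kakarot.net/"))
```

Because the rule names only ia_archiver, other crawlers such as search engines are unaffected by it.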

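Step 2
Write an email to [email protected]. I found two addresses through Google, so to be safe, send a copy to [email protected] as well. Note: I recommend sending from Gmail. When I tried a Chinese provider (QQ Mail), the message bounced. Use the template below and replace the key details with your own.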

To: [email protected], [email protected]
Subject: Remove site from Internet Archive
Body: Hi, my name is Eric (replace), owner of kakarot.net (replace). I’m officially requesting immediate removal of the kakarot.net (replace) site/domain from web.archive.org and the Internet Archive Wayback Machine.

We have placed the User-agent: ia_archiver Disallow: / code in our robots.txt file which is not being followed.
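If you would rather fill in the template programmatically, the following is a rough sketch built on Python's standard email.message.EmailMessage. It only constructs the message and does not send it. The function name build_removal_request and the example.com/example.org addresses are hypothetical placeholders; use your own sender address and the recipient addresses given in this step.

```python
from email.message import EmailMessage


def build_removal_request(
    name: str, domain: str, sender: str, recipients: list[str]
) -> EmailMessage:
    """Fill the removal-request template with your own name and domain."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = "Remove site from Internet Archive"
    msg.set_content(
        f"Hi, my name is {name}, owner of {domain}. I'm officially "
        f"requesting immediate removal of the {domain} site/domain from "
        "web.archive.org and the Internet Archive Wayback Machine.\n\n"
        "We have placed the User-agent: ia_archiver Disallow: / code in "
        "our robots.txt file which is not being followed.\n"
    )
    return msg


if __name__ == "__main__":
    # Placeholder addresses only; replace them before using the message.
    message = build_removal_request(
        name="Eric",
        domain="kakarot.net",
        sender="owner@example.com",
        recipients=["first-recipient@example.org", "second-recipient@example.org"],
    )
    print(message)
```

Printing the message lets you review the headers and body before pasting them into your mail client.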

After you send it, you will receive a reply from [email protected].

Hello,

The Internet Archive can exclude websites from the Wayback Machine (web.archive.org), but we first respectfully request that you help us verify that you are the site owner or content author of both domains/URLs by doing any one of the following for each:

(Note: Some of these options can be in reference to the content located in prior Wayback Machine captures, or documentation you may have related to that time period.)

  • post your request on the current version of the site (and send us a link).
  • send your request from the main email contact listed on the site and show us where it can be located (if one is present).
  • send a request from the registrant's email (if publicly viewable on a WHOIS lookup you can link us to) or webmaster’s email listed on the site.
  • point us to where your personal information (name, point of contact, image of self) appears on the site in a way that identifies you as owner of the site or author of the content you wish to have excluded - in this instance, we ask to verify your identity via a scan of a valid photo ID (sensitive information such as birth date, address, or phone number can be redacted).
  • forward to us communication from a hosting company or registrar addressed to you as owner of the domain.

(Note: The simple mention of someone's name/alias/handle/username, and/or a hyperlink/redirect between sites/pages/accounts in itself is typically not sufficient to have archives excluded.)

If none of these options are available to you, please let us know in a reply to this email.

We would be grateful if you would help us preserve as much of the archive as possible. Therefore, please let us know if there are only specific URLs or directories about which you are concerned so that we may leave the rest of the archives available.

As you may know, Internet Archive is a non-profit digital library, seeking to maintain via the Wayback Machine a freely accessible historical record of the Internet. The material in the archives are not exploited by Internet Archive for commercial profit.

---
The Internet Archive Team

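In short, they need to verify your identity and confirm that you own the site. The email lists several ways to do this, and I won't go through them one by one. The most convenient would be to send the request from the public contact email shown in the domain's WHOIS record, but as far as I recall, WHOIS records stopped showing email addresses sometime last year, so I don't recommend that method.

The method I used was to create a file named remove-site.html in the site's root directory.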

<!DOCTYPE html>
<html>
<head>
    <title>Remove site from Internet Archive</title>
</head>
<body>
<p>I want to remove this site from the Wayback Machine.</p>
</body>
</html>

Copy the source above, save it as an .html file, and put it in your site's root directory.
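If you manage the site from a script, a small sketch like this one can write the page for you. It assumes you have direct file access to the web root; the function name write_removal_page is made up for illustration, and the demo writes into a temporary directory rather than a real site.

```python
import tempfile
from pathlib import Path

# Page content for the ownership check; the sentence in the body is what
# the Internet Archive team will read when they open the link.
PAGE = """<!DOCTYPE html>
<html>
<head>
    <title>Remove site from Internet Archive</title>
</head>
<body>
<p>I want to remove this site from the Wayback Machine.</p>
</body>
</html>
"""


def write_removal_page(web_root: str, filename: str = "remove-site.html") -> Path:
    """Write the request page into web_root and return its path."""
    target = Path(web_root) / filename
    target.write_text(PAGE, encoding="utf-8")
    return target


if __name__ == "__main__":
    # Demo only: a temporary directory stands in for the real web root.
    with tempfile.TemporaryDirectory() as demo_root:
        page_path = write_removal_page(demo_root)
        print(page_path.name, page_path.stat().st_size, "bytes")
```

On a real server you would pass the actual web root path, then open the page in a browser to confirm it is publicly reachable before replying to the email.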

Then reply to the email with the following:

https://kakarot.net/remove-site.html
thank you..

Remember to replace the domain with your own.

If the request succeeds, you will receive an email like this:

Hello,

The sites/URLs referenced in your email below have now been submitted for exclusion from the Wayback Machine at http://www.archive.org:

kakarot.net

Please allow up to a day for the automated portions of the process to run their course and for the changes to take effect. If you have any other questions or concerns, please let us know.


The Internet Archive Team

Now look up the site's history again:

https://web.archive.org/web/*/kakarot.net

You will see this message:

Sorry.
This URL has been excluded from the Wayback Machine.

Once this notice appears, the site has been fully excluded from the Wayback Machine.
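For anyone checking several domains, here is a small sketch that builds the lookup URL in the form used above and tests a block of page text for the exclusion notice. Both helper names are hypothetical. I have not confirmed that the notice is present in the raw HTML returned to a script, so the check works on text you supply, for example text copied from the browser.

```python
EXCLUSION_NOTICE = "This URL has been excluded from the Wayback Machine."


def wayback_history_url(domain: str) -> str:
    """Build the Wayback Machine history URL for a domain."""
    return f"https://web.archive.org/web/*/{domain}"


def looks_excluded(page_text: str) -> bool:
    """Return True if page_text contains the exclusion notice.

    Whitespace is collapsed and case is ignored, so line breaks in
    copied text do not matter.
    """
    normalized = " ".join(page_text.split()).lower()
    return EXCLUSION_NOTICE.lower() in normalized


if __name__ == "__main__":
    print(wayback_history_url("kakarot.net"))
    copied = "Sorry.\nThis URL has been excluded from the Wayback Machine."
    print(looks_excluded(copied))
```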

References:
http://groups.ischool.berkeley.edu/archive/aps/removal-policy
https://blog.imincomelab.com/remove-site-wayback-machine-archive/


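Odd and Offbeat Niche Websites

The Internet connects the whole world so that people can communicate. Human creativity is limitless, and the Internet gives everyone a platform to show it off. Here I have collected some unusual, quirky niche websites for you to enjoy and learn from.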

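The Million Dollar Homepage
http://www.milliondollarhomepage.com/
A remarkable site that earned 1 million USD in five months by dividing a web page into one million squares and selling each one for 1 USD.

Fake Hacker
http://geektyper.com/
Turns your screen into a hacker-style desktop so you can type computer commands the way a hacker would.

Optical Illusions
https://michaelbach.de/ot/
134 examples that let you experience what an optical illusion is.

Cyberthreat Map
https://cybermap.kaspersky.com/cn/
A site from Kaspersky that shows cyberthreat data from around the world in real time.

Global Rich List
http://www.globalrichlist.com/
Enter your income to see where your wealth ranks worldwide (for entertainment only).

Around the World in 360 Degrees
http://www.airpano.com/
See the whole world in 360-degree views.

The World's Tallest Web Page
https://worlds-highest-website.com/

The World's Smallest Website
http://www.guimp.com/

Art Creation
http://weavesilk.com/
Draw patterns freely with your mouse; the results look pretty cool.

Old Windows Simulator
http://www.windows93.net/
Simulates an old version of Windows from boot-up through everyday use.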


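Google Mirror Site Usage Guide

Ever since Google pulled out of China, Baidu has done as it pleases, and search quality has dropped sharply. As a Google fan, and especially as a heavy Google Scholar user, I could not put up with that. So I set up Google mirror sites, one mirroring Google Search and one mirroring Google Scholar.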

Google: https://g.kakarot.net
Google Scholar: https://gs.kakarot.net

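The mirrors are completely free to use and do not alter search results in any way. The mirror server is located in the United States, and because Google varies the ranking of results by region, what you see may differ from results in other regions.

The mirrors are for search queries only. If you need to sign in to an account, use your own VPN and sign in on Google.com.

The site supports SSL-encrypted access. Always use https to keep your information secure.

Running a mirror requires a server. If you find this Google mirror helpful, please consider sponsoring the server to help keep the project going.

Any amount is welcome; even a single cent is appreciated.

To sponsor, use the tip button below the article.

P.S. You can hide the banner on the site by clicking the X on its right. Note that it will reappear if you clear your cookies. Thanks for your understanding.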