Archive for the ‘Web Crawlers’ Category
Basic web mapper
Sometimes it is useful to have an automated tool that builds the full map of a web site. Probably not your own site, since you have presumably already set up some kind of automatic sitemap generation and notification to Google (haven’t you yet?), but a client’s.
There are a few tools for mapping an external web site, and I tried some of them for my particular case. They were just adware or demos, or they obscured the links in the final report… Yes, sometimes a $30 license is worth it, but you might not want to buy a new piece of proprietary software every time you need a new feature, would you?
So I decided to write it myself in PHP, not for the money, but for the fun of it.
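As a rough illustration of what a basic web mapper does (this is not the author’s PHP code), here is a minimal sketch in Python: a breadth-first crawl from a start URL that follows only `<a href>` links on the same host, then renders the visited pages in the sitemaps.org XML format. All names (`map_site`, `fetch_html`, `LinkCollector`, `to_sitemap_xml`, `max_pages`) are hypothetical. It assumes static HTML, ignores robots.txt and JavaScript-generated links, and does no rate limiting.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen
from xml.sax.saxutils import escape


class LinkCollector(HTMLParser):
    """Collects the href values of <a> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Default fetcher. Assumption: non-HTML responses contribute no links."""
    with urlopen(url, timeout=10) as response:
        if response.headers.get_content_type() != "text/html":
            return ""
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def map_site(start_url, fetch=fetch_html, max_pages=500):
    """Breadth-first crawl that only follows links on start_url's host."""
    host = urlparse(start_url).netloc
    start_url = urldefrag(start_url).url
    seen = {start_url}
    queue = deque([start_url])
    pages = []
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # pages that fail to download are left out of the map
        pages.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop #fragments so each page is queued once.
            absolute = urldefrag(urljoin(url, href)).url
            parsed = urlparse(absolute)
            same_site = parsed.scheme in ("http", "https") and parsed.netloc == host
            if same_site and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


def to_sitemap_xml(urls):
    """Renders the URL list as a sitemaps.org <urlset> document."""
    entries = "\n".join(f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>"
    )
```

Against a live site you would call something like `print(to_sitemap_xml(map_site("http://example.com/")))`. The `fetch` parameter is there so the crawl can be exercised offline with a stub that returns canned HTML, and `max_pages` keeps a runaway crawl bounded.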
Working with Web Robots (Crawlers)
Some useful information to start with when working with web spiders. These links may help those who are just beginning with the subject to learn the basics:
- http://www.robotstxt.org/ > Basic information and links to known robots and open source projects
- http://www.robotstxt.org/wc/faq.html > The FAQ
- http://en.wikipedia.org/wiki/Web_crawler > Some additional information and links to open source bots
- http://www.ficstar.com/web_grabber/web_crawler.html > Commercial crawler software
- http://www.newprosoft.com/web-spider.htm > Another commercial crawler, competitively priced
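The robots.txt convention described on robotstxt.org is what a polite crawler checks before fetching a page. As a minimal sketch, Python’s standard `urllib.robotparser` module can evaluate those rules. The robots.txt text, the `MyMapper` and `BadBot` user-agent names, and the `build_robots_checker` helper are hypothetical; the rules are parsed from a string so the example runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network access.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""


def build_robots_checker(robots_txt):
    """Parses robots.txt text; a live crawler would use set_url() and read() instead."""
    checker = RobotFileParser()
    checker.parse(robots_txt.splitlines())
    # Record a load time explicitly: can_fetch() refuses every URL until one is set.
    checker.modified()
    return checker


if __name__ == "__main__":
    checker = build_robots_checker(SAMPLE_ROBOTS_TXT)
    print(checker.can_fetch("MyMapper", "http://example.com/index.html"))      # True
    print(checker.can_fetch("MyMapper", "http://example.com/private/a.html"))  # False
    print(checker.can_fetch("BadBot", "http://example.com/index.html"))        # False
```

Agents without a record of their own fall back to the `User-agent: *` rules, which is why `MyMapper` is only kept out of `/private/` while `BadBot` is refused everywhere.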



