Building the perfect SEO crawler
Written by Ian Lurie
Monday, 13 February 2012 00:00
A guy can dream, right?
I've fiddled with crawler technologies for years. A good web spider is an essential tool for any SEO. There's Xenu, and Screaming Frog, and Scrapy, and lots of others. They're all nice. But I have this wish list of features I'd like to see in a perfect SEO crawler.
I'd always told myself that I'd code something up with all these features when I had the spare time. Since I probably won't have any of that until I'm mummified, here's my specification for a perfect SEO crawler:
My crawler can't barf everywhere on sites larger than 100,000 URLs. To make that work, it should be:
Just racing through a site and saying "this page is good, this page is bad" isn't enough. I need to build a real index that stores pages, an inverted index of the site, and response codes/other page data found along the way. That will let me
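To make the "real index" idea concrete, here is a minimal in-memory sketch, not a finished design. It keeps each page's response code and text, plus an inverted index mapping each term to the URLs that contain it. The class name `CrawlIndex`, its methods, and the simple lowercase tokenizer are all assumptions made for illustration; a crawler meant for sites with 100,000+ URLs would need on-disk or distributed storage instead of Python dictionaries.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: lowercase runs of letters and digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


class CrawlIndex:
    """Illustrative store for crawled pages plus an inverted index."""

    def __init__(self):
        # url -> {"status": HTTP response code, "text": extracted page text}
        self.pages = {}
        # term -> set of URLs whose text contains that term
        self.postings = defaultdict(set)

    def add_page(self, url, status, text):
        """Record a crawled page; only 200 responses are indexed by term.

        Note: re-adding a URL does not remove its old postings in this sketch.
        """
        self.pages[url] = {"status": status, "text": text}
        if status == 200:
            for term in set(TOKEN_RE.findall(text.lower())):
                self.postings[term].add(url)

    def search(self, query):
        """Return the set of URLs containing every term in the query."""
        terms = TOKEN_RE.findall(query.lower())
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

    def urls_with_status(self, status):
        """Return a sorted list of URLs that answered with the given code."""
        return sorted(u for u, p in self.pages.items() if p["status"] == status)


# Example usage with made-up URLs and text.
index = CrawlIndex()
index.add_page("https://example.com/", 200, "Blue widgets for sale")
index.add_page("https://example.com/red", 200, "Red widgets")
index.add_page("https://example.com/old", 404, "")
print(index.search("blue widgets"))
print(index.urls_with_status(404))
```

Keeping response codes next to the term index means one store can answer both "which pages mention this phrase?" and "which pages are broken?".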
I also need a crawler that behaves the way the big kids do: Google renders pages. I need my crawler to enable the same behavior. (I know, all you purists will say "The crawler doesn't do that, the indexing mechanism does." This is my dream. I get to mess with semantics a little. Phhbbbt.)
On large sites, this would provide critical insight. Designers are so often obsessed with hiding all that nasty content. That's fine, if we can see where content needs to be revealed. With a rendering tool, we could do that.
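Real rendering takes a headless browser, which is well beyond a short example. As a rough stand-in, the sketch below uses only Python's standard `html.parser` to flag text sitting inside elements hidden by an inline `style` (`display:none`, `visibility:hidden`) or the `hidden` attribute. The class `HiddenTextFinder` and the function `find_hidden_text` are hypothetical names, and this static check will miss anything hidden through external stylesheets, CSS classes, or JavaScript, which is exactly the gap a rendering crawler would close.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HiddenTextFinder(HTMLParser):
    """Collects text nested inside elements hidden by inline markup."""

    def __init__(self):
        super().__init__()
        # One (tag, hidden) pair per open element; hidden is inherited.
        self.stack = []
        self.hidden_text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        style = (attrs.get("style") or "").replace(" ", "").lower()
        hidden_here = ("hidden" in attrs
                       or "display:none" in style
                       or "visibility:hidden" in style)
        inherited = self.stack[-1][1] if self.stack else False
        self.stack.append((tag, hidden_here or inherited))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack and self.stack[-1][1]:
            self.hidden_text.append(text)


def find_hidden_text(html):
    """Return the text fragments found inside inline-hidden elements."""
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden_text


# Example usage with a made-up page fragment.
sample = ('<div><p>Visible</p>'
          '<div style="display: none"><p>Secret <b>copy</b></p></div>'
          '<span hidden>Also hidden</span></div>')
print(find_hidden_text(sample))
```

Run across a whole crawl, a report like this would at least point to templates where content is tucked away, so a person can decide what needs to be revealed.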
Reporting over time
Finally, fold all of this data together into a tool that shows me rankings, organic search traffic and site attributes, all in one place. Then I can finally show my clients what changes they made, how much those changes influenced results, and why they should stop rolling their eyes every time I make a suggestion.
Get me that tomorrow, OK?
I know this is a really tall order. Like I said, I've been plinking away at this for years. But I do think it's all possible. Cloud storage and processor time are cheap. Crawling technologies are ubiquitous. So, who's with me?
Last Updated on Monday, 13 February 2012 13:44