russ-love.com


Aggregating RSS feeds — Part 2

If you have been reading my blog for a while, you already know that I am crazy about deals aggregators. It has been a while since my last roundup, and now seems like the right time to follow up on it.

Boddit

Boddit made some noise at Lifehacker and Wired Blog last week, and rightly so. The web site features somewhat innovative functionality (among aggregators, that is), but let's start from the beginning. There are really two faces of Boddit: the deals aggregator and the web search wrapper.

Boddit the deals aggregator parses RSS feeds (and possibly scrapes the HTML content as well) from deals/coupons web sites and forums to retrieve anything resembling an item for sale, with a price tag or without one. When it finds such an item, it stores the item in its database and does some post-processing (like figuring out which category it belongs to and fetching an appropriate thumbnail image to display next to the deal description).
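To make the parsing step more concrete, here is a minimal sketch of how an aggregator might pull items (with or without a price) out of an RSS 2.0 feed. This is not Boddit's code, which I have not seen. The function name `extract_deals`, the price pattern, and the shape of the returned records are all my own assumptions for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumed price format: a dollar sign followed by digits, optional thousands
# separators, and optional cents, e.g. "$299", "$1,299.99".
PRICE_RE = re.compile(r"\$\s?(\d[\d,]*(?:\.\d{1,2})?)")


def extract_deals(rss_xml: str) -> list[dict]:
    """Return one record per <item> in an RSS 2.0 document.

    Items without a recognizable price are kept with price=None, since a
    deal post does not always carry a price tag.
    """
    root = ET.fromstring(rss_xml)
    deals = []
    # Plain RSS 2.0 uses un-namespaced <item> elements; RSS 1.0 and Atom
    # feeds are namespaced and would need different handling.
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        description = item.findtext("description") or ""
        # Prefer a price found in the title, fall back to the description.
        match = PRICE_RE.search(title) or PRICE_RE.search(description)
        price = float(match.group(1).replace(",", "")) if match else None
        deals.append({"title": title, "link": link, "price": price})
    return deals
```

Categorization and thumbnail lookup would be separate post-processing passes over these records. A real aggregator would also have to cope with malformed feeds, which this sketch does not attempt.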

The end result is amazing. The categories (when they work) give a sense of browsing a catalog full of merchandise for sale with prices and pictures -- exactly what you would expect to see when you shop. It still lacks the structure and details of a full-blown managed deals site, but this is probably as far as you can get with an aggregator without any manual intervention in data processing.

Boddit the web search wrapper allows you to search a bunch of price comparison and deals sites from one place. It does so by splitting your web browser into two frames - very similar to how Retrevo does it. You use the left frame to formulate your search request and you get the results in the right frame. The difference from Retrevo is that Boddit actually knows the search request semantics for each of the web sites it lists, which greatly improves the end result.
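"Knowing the search request semantics" presumably means keeping a small description of each site's search URL: which parameter carries the query and which extra parameters the site expects. A rough sketch of that idea follows. The site names, URLs, and parameter names are hypothetical placeholders, not the sites Boddit actually lists.

```python
from urllib.parse import urlencode

# Hypothetical per-site search descriptions:
# (base URL, name of the query parameter, extra fixed parameters).
SITE_TEMPLATES = {
    "shop-a": ("https://shop-a.example/search", "q", {}),
    "shop-b": ("https://shop-b.example/find", "keywords", {"sort": "price"}),
}


def build_search_urls(query: str) -> dict[str, str]:
    """Build one search URL per known site for the same user query."""
    urls = {}
    for name, (base, param, extra) in SITE_TEMPLATES.items():
        # urlencode takes care of escaping spaces and special characters.
        params = {param: query, **extra}
        urls[name] = f"{base}?{urlencode(params)}"
    return urls
```

The left frame would then only need to load the URL for whichever site you pick into the right frame.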

Now the cons -- and there are a few.

  • I was getting quite a few PHP errors when browsing categories. I understand this is a work in progress, but some more testing wouldn't hurt.
  • There is a lot of duplicate content that points to entries at different web sites for the same product sold at the same place for the same price. These need to be detected and combined into one entry so they don't take up space.
  • What is Popularity Score and why is it always 12?
  • The thumbnail images don't always match the product
  • Negative prices are not processed correctly - this is probably just a bug
  • I noticed the Boddit developers are playing with allowing users to rank the deals up/down. This will not work -- there are just way too many deals coming in and out and too few people to rank them. Besides, some of the source sites already do the ranking, so why not build on it?
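On the duplicate content point above: one simple way to combine entries is to key each deal on a normalized title, the merchant, and the price, and to collect the source sites under a single entry. The sketch below assumes each deal is a dict with `title`, `merchant`, `price`, and `source` fields. Those field names are mine, not Boddit's. It only catches near-identical titles; posts that describe the same product in different words would need fuzzier matching.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase the title and collapse punctuation and whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())


def merge_duplicates(deals: list[dict]) -> list[dict]:
    """Combine deals for the same product, merchant, and price into one entry."""
    merged: dict[tuple, dict] = {}
    for deal in deals:
        key = (
            normalize_title(deal["title"]),
            deal["merchant"].strip().lower(),
            deal["price"],
        )
        entry = merged.get(key)
        if entry is None:
            # First sighting: keep the deal and start its list of sources.
            merged[key] = {
                "title": deal["title"],
                "merchant": deal["merchant"],
                "price": deal["price"],
                "sources": [deal["source"]],
            }
        elif deal["source"] not in entry["sources"]:
            # Same deal reported elsewhere: just remember the extra source.
            entry["sources"].append(deal["source"])
    return list(merged.values())
```

The number of sources per merged entry could also serve as a crude popularity signal.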


Dealighted

Dealighted has not had the privilege of a free ride on popular blogs and had to buy its way in with paid reviews and a plasma TV giveaway (no hurry, the TV is already gone ;-) ). Here is one such review (and I think the best out there) that gives a normal shopper's perspective. Since this is not your normal shopper blog (but rather one for pro-shoppers :-) ), I will try to put my own spin on it.

Pros:

  • It is useful to be able to filter out source web sites by unchecking those I don't like (I wish these settings were saved with my login session)
  • Saving deals and leaving comments (are those public?) is also nice
  • The search cloud looks cool but seems rather useless if I already know what I am looking for

The Pros list stops there. Now comes the long Cons list:

  • The use of screen real estate is not efficient. People don't really care who posted the deal to the forum, or which forum it is for that matter. This info could be hidden from the main list.
  • There is a concept of a hot thread, but such threads cannot be easily spotted. Why not let users sort deals by "hotness" or make the indication more visual? Actually, you have attempted to do so in "All: Forums Style" with the "Rating" column...
  • The search functionality plain sucks. My search request for "Sony Camcorder" returned 18 results: some were Sony products, some were camcorders from brands other than Sony, and only 2 were Sony camcorders, listed close to the end of the list. After I looked closely I realised that the results are really sorted by time and not by relevance. I got very similar results with "iPOD Nano" and a few other requests.
  • There is no deal preview of any sort. A few sentences from the deal description are all that is needed. Go to Boddit and check out how they have done it. ;-)
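For the search complaint above, even a very naive relevance score would put the Sony camcorders first: count how many query terms appear in the deal title, sort by that count, and use posting time only to break ties. The sketch below assumes each deal has a `title` and a numeric `posted` timestamp. Both field names are hypothetical, and this is certainly not how Dealighted's search works today.

```python
import re


def rank_by_relevance(query: str, deals: list[dict]) -> list[dict]:
    """Order deals by number of matched query terms, newest first on ties."""
    terms = query.lower().split()

    def score(deal: dict) -> int:
        words = set(re.findall(r"[a-z0-9]+", deal["title"].lower()))
        return sum(1 for term in terms if term in words)

    # Drop deals that match none of the terms, then sort by (score, recency).
    scored = [(score(deal), deal) for deal in deals]
    matching = [(s, deal) for s, deal in scored if s > 0]
    matching.sort(key=lambda pair: (pair[0], pair[1]["posted"]), reverse=True)
    return [deal for _, deal in matching]
```

With this ordering, a title matching both "sony" and "camcorder" always outranks one matching only a single term, no matter how recently the latter was posted.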

Summary

This turned out to be a long post. I hope you will forgive the technicalities I dived into. Having built an aggregator myself, I simply cannot help pouring out my opinions on the work of others.

It looks like we have a new leader in the market of deals aggregators. Boddit beats the hell out of the rest of the pack. The two other web sites that come close functionality-wise are Roosster -- thanks to the huge number of sources they process -- and WiredDeals (that is my baby ;-) ) -- thanks to the unique deal ranking calculation and hot deal alerts.


6 Responses to “Aggregating RSS feeds — Part 2”


  1. Luke, Dec 7th, 2006 at 12:22 am

    Hi Yan,

    Thanks for the post.

    A lot of the errors came from the reddit-style voting, which brought a huge overhaul to the core programming. It should be weeded out over time.

    There are cool plans for more innovative features to come.

  2. L, Dec 27th, 2006 at 6:12 pm

    What is the legality of crawling RSS feeds? Is this legal? I know Craigslist is against crawlers.

  3. jp, Jan 4th, 2007 at 12:18 pm

    awesome report! Dealslut is another rss deal crawler that looks pretty clean. Keep up the great work.

  4. digitalkaos, Feb 13th, 2007 at 8:54 pm

    I don’t see how crawling RSS feeds can be illegal; after all, sites are making these available. If they didn’t want people to read/aggregate them, they wouldn’t make them available.

    DK

  5. David, Feb 28th, 2007 at 4:43 pm

    Good post! Are there any sites that read HTML and convert it into an RSS feed? I mean tools?

  6. Yan, Apr 11th, 2007 at 2:07 pm

    Dapper.net is one I used in the past but did not have much success with. Here is a very good post on the subject in general.
    http://www.readwriteweb.com/ar.....rvices.php
