
Scraping Amazon Items Using Django Mash-Up

by Paul Kenjora on May 31st, 2007

In this article, learn how to mash up your Django-powered site with Amazon affiliate items using one simple form. See it in action at AwareLabs, then skip to the Code section and integrate it into your site.

Background

You want to list links to Amazon items on your Django-powered website. However, you find that Amazon's own linking tools are ineffective at bringing you the affiliate sales you are craving. The Amazon Associates tools give you:

  • Product Links
    + Links to exactly what you want.
    - Create ONLY one link at a time.
  • Context Links (Beta)
    + Creates several links.
    - Intrusive and annoying to users.
  • Omakase Links
    + Creates several links.
    - Not very good at figuring out what users like.
  • Recommended Product Links
    + Easy to integrate with tags.
    - Not very good at finding relevant products.
  • Banner Links
    + Eye catching.
    - Takes up a lot of page space.
  • Text Links
    + Simple and quick.
    - Low return rate.
  • Search Box Links
    + Very easy to integrate.
    - Low return rate.

Unfortunately, none of the above is exactly what you need. What you need is a quick and easy way to attach several Amazon items to a group of tags and link them to your Amazon Associates account.

The solution is a hybrid of automation and hand tailoring, working under two assumptions:

  1. You know your tags better than Amazon.
  2. Amazon knows which items sell best alongside an item customers have already purchased.

With these ideas in mind, you pick one item that goes well with a tag, then grab all of its related items and insert them into your site as well. The following code is written in Django to allow easy integration with your existing tags. See: Django Generic XSS Safe Tags.

Code

The following code is a Django form that takes the URL of an Amazon page listing items, plus an Amazon Associates account number, and builds a list of Amazon items you can embed in your site. The code is not tied to any model, but it can easily be modified to insert the items automatically into a Django model of your choosing.

amazon_import.py


import re
import urllib2

# Django 0.96 "oldforms" API (Manipulator / FormWrapper)
from django import forms
from django.core import validators
from django.shortcuts import render_to_response


class AmazonManipulator(forms.Manipulator):
    """Form that scrapes an Amazon page for related items."""

    def __init__(self, request):
        self.fields = (
            forms.URLField(field_name='url', length=100, maxlength=100, is_required=True, validator_list=[validators.isValidURL]),
            forms.TextField(field_name='id', length=20, maxlength=100, is_required=False, validator_list=[validators.hasNoProfanities]),
        )
        self.request = request
        self.items = []  # items scraped from the page
        self.id = ''     # Amazon Associates ID used in the links

    def post(self):
        new_data = {}
        errors = {}

        if self.request.POST:
            new_data = self.request.POST.copy()
            errors = self.get_validation_errors(new_data)
            self.do_html2python(new_data)

            if not errors:
                # Fetch the Amazon page the user pointed us at.
                req = urllib2.Request(new_data['url'])
                self.id = new_data['id']
                response = urllib2.urlopen(req)
                data = response.read()

                try:
                    # The original regular expression did not survive in this
                    # copy of the article. Substitute a pattern whose group 1
                    # captures one Amazon item per match.
                    pat = re.compile(r'...')
                    for m in re.finditer(pat, data):
                        self.items.append(m.group(1))
                except AttributeError:
                    return forms.FormWrapper(self, new_data, errors)

                # Drop duplicate items (a set does not keep page order).
                self.items = set(self.items)

                # Fall back to a default affiliate ID when none was entered.
                if not self.id:
                    self.id = 'joolis-20'

                # Perfect spot to save items to the database, maybe combine with tags

        return forms.FormWrapper(self, new_data, errors)


def run(request):
    amazon_manipulator = AmazonManipulator(request)
    amazon_form = amazon_manipulator.post()

    return render_to_response('amazon_form.html', {
        'amazon_form': amazon_form,
        'items': amazon_manipulator.items,
        'id': amazon_manipulator.id,
    })
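For readers on current Python, here is a minimal standalone sketch of the scrape-and-deduplicate step, written for Python 3 without Django. It assumes that product links carry the item's 10-character ASIN right after "/dp/"; that pattern is an assumption of this sketch, not the expression used in the form above, and the names ASIN_PATTERN, fetch_page, and extract_items are hypothetical.

```python
import re
from urllib.request import Request, urlopen

# Assumption: product links contain "/dp/" followed by a 10-character ASIN,
# e.g. "/Some-Title/dp/B000ABCDEF/ref=...". Not the article's original pattern.
ASIN_PATTERN = re.compile(r'/dp/([A-Z0-9]{10})')


def fetch_page(url):
    """Download a page and return it as text (hypothetical helper)."""
    # A browser-like User-Agent is assumed to be needed; adjust as required.
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode('utf-8', errors='replace')


def extract_items(html):
    """Return the unique ASINs found in the page, in first-seen order."""
    # dict.fromkeys drops duplicates while keeping insertion order (Python 3.7+).
    return list(dict.fromkeys(m.group(1) for m in ASIN_PATTERN.finditer(html)))


if __name__ == '__main__':
    sample = (
        '<a href="/Some-Title/dp/B000ABCDEF/ref=sr_1">First</a>'
        '<a href="/Other-Title/dp/0385504209">Second</a>'
        '<a href="/Some-Title/dp/B000ABCDEF/ref=pd_2">First again</a>'
    )
    print(extract_items(sample))  # ['B000ABCDEF', '0385504209']
```

In practice you would call extract_items(fetch_page(url)). Unlike the set() in the form, dict.fromkeys keeps the items in the order they appear on the page.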

The template for the above form is defined below:
amazon_form.html



Grab Amazon Items

Amazon has already done the research on which items sell well relative to other items users have bought. As a result, the best way to ensure a successful ad campaign is to list items a user is likely to buy. This utility lets you start with one item, quickly generate a few dozen related items, and copy and paste them into your website. Quick utility for grabbing all items related to an item from Amazon. Created by Aware Labs.

Item url: {% if amazon_form.url.errors %}{{ amazon_form.url.errors|join:", "}}{% endif %}
{{ amazon_form.url }}  Amazon Items

Affiliate id: {% if amazon_form.id.errors %}{{ amazon_form.id.errors|join:", "}}{% endif %}
{{ amazon_form.id }}  Amazon Affiliate Program

{% if items %}

Resulting Amazon Items:

{% for item in items %}{%include 'amazon.html' %}
{% endfor %}

Resulting Amazon Code:


{% endif %}


amazon.html



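Each item rendered through amazon.html ends up as a link tagged with your Associates ID. Here is a minimal Python 3 sketch of that step, assuming the common https://www.amazon.com/dp/<ASIN>/?tag=<id> link format (an assumption, not the article's template). affiliate_link and snippet are hypothetical helpers, and the fallback ID mirrors the one in the form above.

```python
from html import escape
from urllib.parse import quote

# Fallback used by the article's form when no affiliate ID is entered.
DEFAULT_TAG = 'joolis-20'


def affiliate_link(asin, tag=''):
    """Build a product link carrying an Associates tag.

    Assumption: the https://www.amazon.com/dp/<ASIN>/?tag=<id> format.
    """
    tag = tag or DEFAULT_TAG
    return 'https://www.amazon.com/dp/%s/?tag=%s' % (quote(asin), quote(tag))


def snippet(asins, tag=''):
    """Return copy-and-paste HTML, one escaped link per item."""
    return '\n'.join(
        '<a href="%s">%s</a>' % (escape(affiliate_link(a, tag)), escape(a))
        for a in asins
    )


print(snippet(['B000ABCDEF'], 'example-20'))
# <a href="https://www.amazon.com/dp/B000ABCDEF/?tag=example-20">B000ABCDEF</a>
```

Escaping the output matters because the snippet is meant to be pasted straight into a page.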
Example

For example, suppose you want to add Amazon items tagged "007" or "James Bond" to your site. First, search Amazon for "007" or "James Bond" and select the best item from the several hundred that usually turn up. Then paste its URL into the tool, as demonstrated at AwareLabs, enter your Amazon Associates account number, and click "Grab Amazon Items". The Django form scrapes the Amazon page for all related items and presents them in a list at the bottom, along with the source code.
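The first step in this example is a manual search. If you want to script it, a small Python 3 helper can build the search URL from a tag. The "k" query parameter is an assumption about Amazon's search page, and search_url is a hypothetical name.

```python
from urllib.parse import urlencode


def search_url(tag):
    """Build an Amazon search URL for a tag such as "James Bond".

    Assumption: Amazon's search page reads the query from the "k" parameter.
    """
    # urlencode quotes the value, turning spaces into "+".
    return 'https://www.amazon.com/s?' + urlencode({'k': tag})


print(search_url('James Bond'))  # https://www.amazon.com/s?k=James+Bond
```

Picking the best item from the results is still a human decision, which is the point of the first assumption above.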

You could easily modify the code to dump the items into a database based on user input, tie the tags entered by your users to a search on Amazon, or rerun the process manually every week to freshen up your website with new items. Whatever your use, I hope this helps bridge the gap between Django-powered sites and Amazon.com.
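As a starting point for the database idea, here is a sketch that stores scraped items per tag using Python's built-in sqlite3 module rather than a Django model. The tag_items table and both helper functions are hypothetical.

```python
import sqlite3


def save_items(conn, tag, asins):
    """Store (tag, ASIN) pairs, skipping pairs that are already stored."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            'CREATE TABLE IF NOT EXISTS tag_items ('
            ' tag TEXT NOT NULL, asin TEXT NOT NULL, PRIMARY KEY (tag, asin))'
        )
        conn.executemany(
            'INSERT OR IGNORE INTO tag_items (tag, asin) VALUES (?, ?)',
            [(tag, a) for a in asins],
        )


def items_for_tag(conn, tag):
    """Return the ASINs stored for a tag, sorted."""
    rows = conn.execute(
        'SELECT asin FROM tag_items WHERE tag = ? ORDER BY asin', (tag,)
    )
    return [r[0] for r in rows]


conn = sqlite3.connect(':memory:')
save_items(conn, '007', ['B000ABCDEF', '0385504209'])
save_items(conn, '007', ['B000ABCDEF'])  # a rerun adds no duplicate rows
print(items_for_tag(conn, '007'))  # ['0385504209', 'B000ABCDEF']
```

The composite primary key together with INSERT OR IGNORE means a weekly refresh can be rerun safely without creating duplicate rows.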

  • deanpowers
    I don't see the Amazon app? Can you give the link again?
  • Dean,

    The app code is in the Python files in the article; you need to install it in your Django project.

    I've since moved on to a project called Arkayne. If you are looking to link to Amazon items from your blog please create an account and contact me using the contact form.

    If you think this Amazon affiliate app is neat, you'll love what we've done over at Arkayne.

    http://www.arkayne.com

  • deanpowers
    Hi,

    I'm looking at it now. We are developing two major sites that are mashups.
    We want to provide a blogging facility for registered users and at this
    point are writing our own Django Blog App.

    We plan to use Disqus for blog comments and for comments on other content
    types.

    I'll look to see how we might incorporate Arkayne into our architecture.

    Arkayne has a nice site design.

    Dean