Scraping Amazon Items Using Django Mash-Up
In this article, learn how to mash up your Django-powered site with Amazon affiliate items using one simple form. See it in action at AwareLabs, then skip to the Code section and integrate it into your site.
Background
You want to list links to Amazon items on your Django-powered website. However, you find that Amazon's own linking tools are ineffective at bringing you the affiliate sales you are after. You tried the Amazon Associates tools, and they give you the following (+ marks an advantage, - a drawback):
- Product Links
    + Links to exactly what you want.
    - Creates ONLY one link at a time.
- Context Links (Beta)
    + Creates several links.
    - Intrusive and annoying to users.
- Omakase Links
    + Creates several links.
    - Not very good at figuring out what users like.
- Recommended Product Links
    + Easy to integrate with tags.
    - Not very good at finding relevant products.
- Banner Links
    + Eye catching.
    - Takes up a lot of page space.
- Text Links
    + Simple and quick.
    - Low return rate.
- Search Box Links
    + Very easy to integrate.
    - Low return rate.
Unfortunately, none of the above is exactly what you need. What you need is a quick and easy way to add several Amazon items to a group of tags and associate them with your Amazon Associate account.
The solution is a hybrid of automation and tailoring that works under two assumptions:
- You know your tags better than Amazon.
- Amazon knows which items sell better relative to an already purchased item.
With these ideas in mind, you can pick an item that goes well with a tag, then grab all of its related items and insert them into your site as well. The following code is written specifically for Django so that it integrates easily with your existing tags. See: Django Generic XSS Safe Tags.
Code
The following code is a Django form that takes the URL of a page containing Amazon items, plus an Amazon Associate account ID, and creates a list of Amazon items you can embed in your site. The code is not tied to any model, but it can easily be modified to insert the items automatically into a Django model of your choosing.
import urllib2
import re
from string import Template

from django import forms
from django.core import validators
from django.core.exceptions import ObjectDoesNotExist
from django.shortcuts import render_to_response
from django.contrib.contenttypes.models import ContentType


class AmazonManipulator(forms.Manipulator):
    def __init__(self, request):
        self.fields = (
            forms.URLField(field_name='url', length=100, maxlength=100, is_required=True, validator_list=[validators.isValidURL]),
            forms.TextField(field_name='id', length=20, maxlength=100, is_required=False, validator_list=[validators.hasNoProfanities]),
        )
        self.request = request
        self.items = []
        self.id = ''

    def post(self):
        new_data = {}
        errors = {}
        if self.request.POST:
            new_data = self.request.POST.copy()
            errors = self.get_validation_errors(new_data)
            self.do_html2python(new_data)
            if not errors:
                # Fetch the page that lists the Amazon items
                req = urllib2.Request(new_data['url'])
                self.id = new_data['id']
                response = urllib2.urlopen(req)
                data = response.read()
                try:
                    pat = re.compile(r'
                    for m in re.finditer(pat, data):
                        self.items.append(m.group(1))
                except AttributeError:
                    return forms.FormWrapper(self, new_data, errors)
                # Drop duplicate items
                self.items = set(self.items)
                # Fall back to the default associate ID when none was given
                if len(str(self.id)) == 0:
                    self.id = 'joolis-20'
                # Perfect spot to save items to the database, maybe combine with tags
        return forms.FormWrapper(self, new_data, errors)


def run(request):
    amazon_manipulator = AmazonManipulator(request)
    amazon_form = amazon_manipulator.post()
    return render_to_response('amazon_form.html', {'amazon_form': amazon_form, 'items': amazon_manipulator.items, 'id': amazon_manipulator.id})
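The scraping step inside post() can also be tried on its own, outside Django. The following is only a minimal sketch in modern Python: the pattern is an assumption (it looks for product paths of the form /dp/<ID> or /gp/product/<ID> with a ten-character ID) and is not the pattern used in the form above, and the names extract_items and resolve_associate_id are hypothetical helpers. Unlike the set() call above, it keeps items in the order they were first seen.

```python
import re

# Assumption: item links look like "/dp/<ID>" or "/gp/product/<ID>", where the
# ID is ten uppercase letters or digits. This is an illustrative pattern, not
# the one from the article's form.
ITEM_PATTERN = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?![A-Z0-9])")

# Fallback associate ID, taken from the article's code.
DEFAULT_ASSOCIATE_ID = "joolis-20"


def extract_items(html):
    """Return the unique item IDs found in html, in first-seen order."""
    items = []
    for match in ITEM_PATTERN.finditer(html):
        item = match.group(1)
        if item not in items:
            items.append(item)
    return items


def resolve_associate_id(raw_id):
    """Return the submitted associate ID, or the default when it is empty."""
    raw_id = (raw_id or "").strip()
    return raw_id or DEFAULT_ASSOCIATE_ID


if __name__ == "__main__":
    sample = (
        '<a href="/dp/B000000001/ref=x">One</a>'
        '<a href="/gp/product/B000000002">Two</a>'
        '<a href="/dp/B000000001">One again</a>'
    )
    print(extract_items(sample))
    print(resolve_associate_id(""))
```

Keeping the extraction in a plain function like this makes it easy to test against saved pages before wiring it into the form.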
The template for the above form is defined below:
amazon_form.html
Grab Amazon Items
{% if items %}
Resulting Amazon Items:
{% for item in items %}{% include 'amazon.html' %}
{% endfor %}
Resulting Amazon Code:
{% endif %}
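The contents of amazon.html are not shown here, so the following is only a hedged sketch of how the "Resulting Amazon Code" could be produced in plain Python with string.Template (which the form code already imports). The link layout, with the item ID in the path and the associate ID in a tag query parameter, is an assumption, and render_links is a hypothetical helper name.

```python
from html import escape
from string import Template

# Assumption: one plain link per item, with the associate ID passed as the
# "tag" query parameter. The real amazon.html may render something different.
LINK_TEMPLATE = Template(
    '<a href="https://www.amazon.com/dp/$item/?tag=$associate_id">$item</a>'
)


def render_links(items, associate_id):
    """Return one HTML link per item, joined by newlines."""
    safe_id = escape(associate_id, quote=True)
    return "\n".join(
        LINK_TEMPLATE.substitute(item=escape(item, quote=True), associate_id=safe_id)
        for item in items
    )


if __name__ == "__main__":
    print(render_links(["B000000001", "B000000002"], "joolis-20"))
```

Escaping both values before substitution keeps a malformed item or ID from breaking out of the href attribute.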