In [1]:
import pandas as pd
import numpy as np

Reformat the exported file. Descriptions in the exported file are broken up into multiple lines which can't be read into a dataframe. These lines are in HTML tags and begin with <.

In [2]:
with open("products_export.csv") as f:
    curr_line = ''
    for line in f:
        if line[0] == "<":
            curr_line = curr_line + line.strip('\n')
            with open('products_export_reformat.csv', 'a') as the_file:
            curr_line = line.strip('\n')

Read the csv file as a dataframe.

In [3]:

This list is only for wines that are published on our website so we filter down to only those products.

In [4]:

Keep only the set of columns we need to generate the Merchant csv file.

In [5]:
df=df.drop(['Cost per item','Variant Requires Shipping','Variant Fulfillment Service', 'Tags','Published','Option1 Name', 'Option2 Name', 'Option3 Name', 'Vendor','Type','Variant Taxable','Variant Weight Unit','Variant Tax Code','Gift Card','Google Shopping / MPN','Variant Image','Variant Barcode','Google Shopping / Condition','Google Shopping / Custom Product','Google Shopping / Custom Label 0','Variant Compare At Price','Google Shopping / AdWords Grouping','Google Shopping / AdWords Labels','SEO Title','SEO Description','Google Shopping / Age Group','Google Shopping / Google Product Category','Variant Inventory Policy','Variant Inventory Tracker','Google Shopping / Gender','Google Shopping / Custom Label 1','Image Position','Image Alt Text','Google Shopping / Custom Label 2','Google Shopping / Custom Label 3','Google Shopping / Custom Label 4'],axis=1)

All the values are necessary to generate the table. If any column is missing values, drop the product from the list.

In [6]:

Missing weights are sometimes stored as 0 in the system. Drop all the products with weight 0.

In [7]:
df = df[df['Variant Grams'] != 0]

Set up the Google Merchant DataFrame.

In [8]:
google = pd.DataFrame()

Use the Shopify handle as the unique product identifier. If the handle is over 50 characters, use only the first 50 characters.

In [9]:
google['id'] = df['Handle'].apply(lambda x: x[:50] if len(x) > 50 else x)

The Google title is the Shopify Title.

In [10]:
google['title'] = df['Title']

Strip the HTML tags from the descriptions.

In [11]:
from html.parser import HTMLParser

class MLStripper(HTMLParser):
    def __init__(self):
        self.strict = False
        self.convert_charrefs= True
        self.fed = []
    def handle_data(self, d):
    def get_data(self):
        return ''.join(self.fed)

def strip_tags(html):
    s = MLStripper()
    return s.get_data()

google['description']=df['Body (HTML)']
google['description'] = google['description'].apply(lambda x: x.replace("<b>"," "))
google['description'] = google['description'].apply(lambda x: x.replace("\n"," "))
google['description'] = google['description'].apply(lambda x: strip_tags(x))

The brand is stored in the Option2 Value in Shopify.

In [12]:
google['brand'] = df['Option2 Value']

The link to the product is the company url followed by /products/ then the product handle. The #breadcrumb is used to make the link jump directly to the picture of the wine on the page. This avoids issues on small screens where visitors will only see our logos and navigation bar when they land on the site. We would rather them see the wine they were searching for.

In [13]:

We don't sell used wine :)

In [14]:

Set the price, then add 'USD' at the end.

In [15]:
google['price']=df['Variant Price']
google['price']=google['price'].apply(lambda x: str(x)+' USD')

Use the Variant Inventory Qty in Shopify to see whether the product is in stock.

In [16]:
def check_stock(x):
    if int(x) > 0:
        return 'In Stock'
    return 'Out of Stock'

google['availability'] = df['Variant Inventory Qty'].apply(lambda x: check_stock(x))

In Shopify the weight is stored as grams. Convert to pounds for Google.

In [17]:
from math import ceil

def g_to_lb(x):
    x = x * 0.00220462
    rd = ceil(x * 1000.0) / 1000.0
    return str(rd)+" lb"

google['shipping_weight'] = df['Variant Grams']
google['shipping_weight'] = google['shipping_weight'].apply(lambda x: g_to_lb(x))
In [18]:
google['image_link']=df['Image Src']
google['google_product_category'] = 499676
google['product_type'] = 'Wine'
google['identifier_exists'] = 'No'

Set price categories in custom_label_0 which can be used for tracking product data.

In [19]:
def product_type_generator(x):
    x = x[:-4]
    num = float(x)
    if (num < 20):
        return '0-20'
    if (num < 40):
        return '20-40'
    if (num < 65):
        return '40-65'
    if (num < 100):
        return '65-100'
    if (num < 150):
        return '100-150'
    if (num < 300):
        return '150-300'
    return '300+'
google['custom_label_0'] = google.price.apply(lambda x: product_type_generator(x))

Some products have issues such as a low quality photo that cause them to be rejected. Drop the items with known issues.

In [20]:
issues = pd.read_csv('item_issues.csv')
issues.rename(columns = {'Item ID':'id'}, inplace = True)
issues = issues[['id','Source']]
issues = pd.merge(google, issues, how='left', on='id')
issues = issues[issues.Source != 'Wine Upload']
google = issues
google = google.drop(['Source'], axis=1)

Convert this DataFrame to a csv that can be uploaded to the Google Merchant feed. Also, take a peek at the first 5 rows of data.

In [21]:
In [22]:
id title description brand link condition price availability shipping_weight image_link age_group google_product_category product_type identifier_exists custom_label_0
0 massetotoscanaigt2016 Masseto Toscana IGT 2016 Description Name: Masseto Toscana IGT Country:... Merlot New 799.99 USD In Stock 3.001 lb Adult 499676 Wine No 300+
1 lucedellaviteluceigt201515l Luce della Vite 'Luce' IGT 2015 1.5L Description Name: Luce della Vite 'Luce' Tosca... Red Blend New 224.99 USD In Stock 6.001 lb Adult 499676 Wine No 150-300
2 lucedellaviteluceigt2015 Luce della Vite 'Luce' IGT 2015 Description Name: Luce della Vite 'Luce' Tosca... Red Blend New 112.99 USD In Stock 3.001 lb Adult 499676 Wine No 100-150
3 ornellaialeserrenuovedellornellaiabolgheriross... Ornellaia 'Le Serre Nuove dell'Ornellaia' Bolg... Description Name: Ornellaia 'Le Serre Nuove de... Red Blend New 72.99 USD In Stock 3.001 lb Adult 499676 Wine No 65-100
4 vegasiciliaalionriberadelduero2015 Vega Sicilia Alion Ribera del Duero 2015 Description Name: Vega Sicilia Alion Country: ... Tempranillo New 110.99 USD In Stock 3.001 lb Adult 499676 Wine No 100-150