Cleaning and consolidating product lines by description and attributes is not easy when there is no data standardisation
Inconsistent inventory data
Classic inventory data issue: SKUs come in from different merchants and distributors, so how do you identify and consolidate the same products? Cab Sauv versus Cabernet; McLaren Vale versus S Australia; Nuits-St-Georges versus Nuit St Georges?
The amount of data provided can also differ for the same product: “2002 Hunting Moon, Shiraz, Margaret River, Western Australia 750 ml” may be the same wine as “2002 Hunting Moon”.
Data Clean: Accents, abbreviations and punctuation
Wine is global, so product data comes with a whole gamut of diacritic variations. We clean for these and, importantly, for the many (many) ways that data providers encode the information (sometimes inconsistently).
So if we see “ÃƒÂ§” we know it is a “ç” that has been mis-encoded twice.
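As an illustration of this kind of clean-up, here is a minimal Python sketch. It assumes the damage comes from UTF-8 text being wrongly read as Windows-1252 one or more times, which is only one of the ways providers can mangle encodings. The function names `repair_mojibake` and `fold_accents` are illustrative, not part of any real engine.

```python
import unicodedata


def repair_mojibake(text: str, max_rounds: int = 3) -> str:
    """Undo UTF-8 text that was wrongly decoded as Windows-1252.

    Assumption: the damage is of the UTF-8-read-as-cp1252 kind,
    possibly applied more than once. Other kinds are left untouched.
    """
    for _ in range(max_rounds):
        try:
            candidate = text.encode("cp1252").decode("utf-8")
        except UnicodeError:
            break  # the round trip fails, so the text is not (or no longer) garbled
        if candidate == text:
            break  # plain ASCII: nothing to repair
        text = candidate
    return text


def fold_accents(text: str) -> str:
    """Strip diacritics to build a matching key; keep the original for display."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

With this sketch, `repair_mojibake("ÃƒÂ§")` returns `"ç"` after two rounds, and `fold_accents("Rhône")` returns `"Rhone"`. Correctly encoded text such as “Rhône” passes through `repair_mojibake` unchanged because the round trip raises an error and the loop stops.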
The New World loves abbreviations: “Cabernet [Sauvignon]”; “Sauvignon [Blanc]”; GSM; S Australia.
The Old World, meanwhile, just assumes you know a lot from the name: practically speaking, a Chateauneuf is the original Rhone “GSM”.
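Abbreviations like these are typically handled with a lookup table. The sketch below assumes a small, hypothetical alias table (`ALIASES`) and expands whole-word matches in a single pass; a real table would be far larger and curated by someone who knows the wines.

```python
import re

# Hypothetical alias table for illustration only.
ALIASES = {
    "cab sauv": "cabernet sauvignon",
    "sauv blanc": "sauvignon blanc",
    "s australia": "south australia",
    "gsm": "grenache syrah mourvedre",
}


def build_expander(aliases):
    """Return a function that expands every alias found as a whole word."""
    # Longest aliases first, so a longer phrase wins over a shorter overlap.
    ordered = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(a) for a in ordered) + r")\b"
    )

    def expand(text: str) -> str:
        # One pass over the lower-cased text, so expansions are never re-expanded.
        return pattern.sub(lambda m: aliases[m.group(0)], text.lower())

    return expand


expand_abbreviations = build_expander(ALIASES)
```

Here `expand_abbreviations("Cab Sauv, S Australia")` gives `"cabernet sauvignon, south australia"`. A bare “Cabernet” is deliberately left alone in this sketch: whether it should always mean Cabernet Sauvignon is a judgment call that belongs in the table, not in the code.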
Hyphens, apostrophes, non-breaking spaces and full stops dominate this area for wines, and they can make a huge difference: Stag’s Leap versus Stags Leap [Winery]; Latour versus La Tour, anyone?
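One hedged way to picture punctuation handling is a comparison key that drops apostrophes and full stops and turns hyphens and non-breaking spaces into ordinary spaces. The sketch below is an assumption about how such a key could be built, not a description of the actual engine, and `punctuation_key` is an illustrative name.

```python
# Characters to delete: straight and curly apostrophes, full stop.
_DROP = dict.fromkeys(map(ord, "'\u2019\u2018."), None)
# Characters to turn into a plain space: hyphen variants, en dash, non-breaking space.
_TO_SPACE = dict.fromkeys(map(ord, "-\u2010\u2011\u2013\u00a0"), " ")
_TABLE = {**_DROP, **_TO_SPACE}


def punctuation_key(name: str) -> str:
    """Build a lower-cased key that ignores punctuation but keeps word breaks."""
    return " ".join(name.lower().translate(_TABLE).split())
```

With this key, “Stag’s Leap” and “Stags Leap” both become `stags leap`, and “Nuits-St-Georges” becomes `nuits st georges`. “Latour” and “La Tour” stay different because word breaks are preserved. The key only proposes candidates: since punctuation can make a real difference, a separate rule still has to decide whether two names that collapse to the same key are truly the same producer.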
Wine naming has its own rules and conventions, though surprisingly few are absolute. Such rules help identify the important data attributes: we know Chablis means Chardonnay, and that a Chianti Classico will be mostly, but not necessarily all, Sangiovese.
Our rules engine can also spot inconsistencies, so when a retailer lists “Chapel Down NV Vintage Reserve” it can flag that a non-vintage vintage does not make sense.
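The two kinds of rule described above, inference and consistency checking, could be sketched as follows. The rule table `IMPLIED_GRAPES` and both function names are hypothetical, and the table holds only the two examples from the text.

```python
import re

# Hypothetical rule table: appellation -> implied grape, and whether
# other grapes may also be present in the blend.
IMPLIED_GRAPES = {
    "chablis": {"grape": "chardonnay", "blend_possible": False},
    "chianti classico": {"grape": "sangiovese", "blend_possible": True},
}


def infer_grape(name: str):
    """Return the rule for the first appellation found in the name, else None."""
    lowered = name.lower()
    for appellation, rule in IMPLIED_GRAPES.items():
        if appellation in lowered:
            return rule
    return None


def find_inconsistencies(name: str):
    """Flag names whose own tokens contradict each other."""
    tokens = set(re.findall(r"[a-z0-9]+", name.lower()))
    issues = []
    if "nv" in tokens and "vintage" in tokens:
        issues.append("marked NV but also described as vintage")
    if "nv" in tokens and any(re.fullmatch(r"(19|20)\d{2}", t) for t in tokens):
        issues.append("marked NV but also carries a vintage year")
    return issues
```

Keeping `blend_possible` in the table reflects the point that few naming rules are absolute: the rule for Chianti Classico tells you the dominant grape without claiming the wine is 100% Sangiovese.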
So what does our wine data process do?
We take wine product data, either just the product name or the name plus the available attributes (if they can be trusted). We clean and normalise the data, identifying the things that matter (grape, colour, producer, region, classification) whilst isolating SKU-specific information such as unit size, retailer or add-ons (decanters, boxes and teddy bears). These attributes are then used to define the uniqueness of product lines.
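To make the split between product attributes and SKU-specific details concrete, here is a small sketch that pulls the unit size and vintage out of a description and builds a product key that ignores the size. It is a simplification under stated assumptions: the regular expressions, the `split_sku_details` and `product_key` names and the choice of key are all illustrative, and extracting grape, producer or region would need lookup tables that are not shown.

```python
import re

# Assumed formats: "750 ml", "75cl", "1.5L"; four-digit vintages from 1900 to 2099.
SIZE_RE = re.compile(r"\b(\d+(?:\.\d+)?)\s*(ml|cl|l)\b", re.IGNORECASE)
VINTAGE_RE = re.compile(r"\b(19\d{2}|20\d{2})\b")
TO_ML = {"ml": 1, "cl": 10, "l": 1000}


def split_sku_details(description: str) -> dict:
    """Separate the unit size and vintage from the rest of the description."""
    size_ml = None
    size = SIZE_RE.search(description)
    if size:
        size_ml = int(round(float(size.group(1)) * TO_ML[size.group(2).lower()]))
        description = description[:size.start()] + description[size.end():]

    vintage = None
    year = VINTAGE_RE.search(description)
    if year:
        vintage = int(year.group(1))
        description = description[:year.start()] + description[year.end():]

    # Whatever is left is treated as the product name, with commas removed.
    name = " ".join(description.replace(",", " ").split())
    return {"vintage": vintage, "name": name, "size_ml": size_ml}


def product_key(description: str) -> tuple:
    """Key for the product line: the unit size is deliberately left out."""
    details = split_sku_details(description)
    return (details["vintage"], details["name"].lower())
```

In this sketch a 750 ml bottle and a 1.5 L magnum of the same wine share one product key, while the size stays available on the SKU record.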
This uniqueness, combined with our experience of data matching, allows us to identify duplicates and consolidate them.
Importantly, the matching process distinguishes mismatches that matter, i.e. those with contradictory data, from mismatches that are solely due to a difference in the depth of data.
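The difference between the two kinds of mismatch can be shown with a short comparison function. This is a sketch assuming each record has already been reduced to a dictionary of attributes, with missing attributes either absent or set to `None`; `compare_records` is an illustrative name.

```python
def compare_records(a: dict, b: dict):
    """Classify two attribute dicts as 'contradiction', 'depth_only' or 'identical'.

    Returns the label and the list of attribute names responsible for it.
    """
    contradictions = []
    missing = []
    for key in sorted(set(a) | set(b)):
        left, right = a.get(key), b.get(key)
        if left is None or right is None:
            if left != right:
                missing.append(key)  # one side simply has less data
        elif left != right:
            contradictions.append(key)  # both sides have data and disagree

    if contradictions:
        return "contradiction", contradictions
    if missing:
        return "depth_only", missing
    return "identical", []
```

For example, a record holding vintage 2002, producer, grape and region compared with one holding only vintage 2002 and the same producer is `depth_only` (grape and region are missing on one side), so the pair remains a candidate for consolidation. Change one vintage to 2003 and the result is a `contradiction`.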
Why does this matter?
Whatever the industry, product data requires a consistent definition of uniqueness so you can track what has been sold or used: who is selling the same thing better, which region is using it more, which products sell well together.
The issue is that when such sales or purchase data comes from different sources, it arrives in different formats. Even in retail the barcode or EAN is not always available or provided. That’s where the data nightmare begins: standard desktop applications such as Excel are focused on exact matches and can’t handle the subtleties and complexities that scores or hundreds of different rules bring.
An automated, consistent, rules-based process to clean, extract and standardise product data based on description and attributes.
Scalable and repeatable process: as product universes evolve, users can update the tables that tag the product data without having to change the wine data engine itself.
Significant reduction in duplication of product lines. Conversion of data into information: accurate understanding of products being sold by whom, how, where and when.