Marketechnics Report: What Do You Mean I Have Dirty Data?
Commentary by Bill Bittner
One of the big themes at Marketechnics this year was how to clean up your data.
Software vendors are providing various tools to synchronize retailer data, both externally with the manufacturer and internally among their own disparate systems. It is my contention
that companies do not have dirty data, but rather misunderstood data.
In the world of retailing, probably the most misunderstood data entity is the “item.” It is often used generically, yet in different circumstances it acquires very specific meaning.
Retail computer applications must distinguish among the various nuances of “item” and understand their implications.
In fashion retail, we talk about men’s jeans as an item, but must realize they come in a variety of styles, colors and sizes. If we are using the term item to really mean SKU,
we must know these attributes and the brand in order to define an SKU.
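One way to picture this is a record that cannot exist without all of its defining attributes. The sketch below is only an illustration: the `Sku` class, its field names and the sample values are hypothetical, not taken from any retail package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sku:
    """Hypothetical SKU key: the generic item plus every attribute that defines it."""
    brand: str
    item: str   # the generic term, e.g. "men's jeans"
    style: str
    color: str
    size: str


# Two SKUs that share the same generic item but differ in size.
jeans_32 = Sku("ExampleBrand", "men's jeans", "relaxed", "indigo", "32x32")
jeans_34 = Sku("ExampleBrand", "men's jeans", "relaxed", "indigo", "34x32")

# Frozen dataclasses are hashable, so the full attribute set can key inventory.
on_hand = {jeans_32: 4, jeans_34: 0}
```

Both records answer to "men's jeans," yet inventory is kept on each one separately.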
In general merchandise, the concept of “kitting” recognizes that certain combinations of items can be created as another item to encourage purchase of all the things in the kit.
Computer applications must understand the effect on inventory when a kit is constructed at store level or broken apart into its components. The kit requires its own UPC and must
be considered a special type of item until the overwrap is removed.
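The inventory effect of building and breaking kits can be sketched in a few lines. This is a minimal illustration under stated assumptions: the kit definition, the item numbers and the function names are all hypothetical.

```python
# Hypothetical kit definition: kit item number -> components and quantity per kit.
KIT_COMPONENTS = {"KIT-100": {"RAZOR": 1, "BLADES": 2}}


def build_kit(on_hand, kit, qty=1):
    """Return new on-hand counts after assembling `qty` kits at store level."""
    updated = dict(on_hand)
    for component, per_kit in KIT_COMPONENTS[kit].items():
        needed = per_kit * qty
        if updated.get(component, 0) < needed:
            raise ValueError(f"not enough {component} to build {qty} x {kit}")
        updated[component] = updated.get(component, 0) - needed
    updated[kit] = updated.get(kit, 0) + qty
    return updated


def break_kit(on_hand, kit, qty=1):
    """Return new on-hand counts after removing the overwrap from `qty` kits."""
    updated = dict(on_hand)
    if updated.get(kit, 0) < qty:
        raise ValueError(f"only {updated.get(kit, 0)} x {kit} on hand")
    updated[kit] -= qty
    for component, per_kit in KIT_COMPONENTS[kit].items():
        updated[component] = updated.get(component, 0) + per_kit * qty
    return updated


stock = {"RAZOR": 5, "BLADES": 10}
with_kits = build_kit(stock, "KIT-100", qty=2)
restored = break_kit(with_kits, "KIT-100", qty=2)
```

The kit carries its own on-hand count while it exists, and breaking it returns the components to their own counts.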
Probably the most fundamental error a technology provider can make is to equate UPC or GTIN with item. In the supermarket, there are a variety of physical units with different
UPCs that can represent the same item. Manufacturer-sponsored promotions in the form of pre-priced, cents-off or bonus packs of an item are all the same item from a replenishment
perspective, but are not the same when considering forecasting and pricing. Various packaging configurations, such as multi-packs or cases, are also the same item, just instances
where it is packaged differently.
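A rough sketch of the distinction: each UPC points to an item, and each application decides how far to roll the UPCs up. Everything here is assumed for illustration, including the `Pack` class, the placeholder UPC strings and the item number. Rolling every pack up to consumer units is also an assumption; a retailer that merchandises multi-packs separately would keep them apart.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Pack:
    """Hypothetical link between one UPC and the item it is an instance of."""
    upc: str
    item_id: str
    units: int            # consumer units contained in this pack
    promo: str = "none"   # e.g. "cents-off", "bonus-pack", "pre-priced"


# Placeholder UPC strings, not real codes.
CATALOG = {p.upc: p for p in [
    Pack("UPC-A", "COLA-12OZ", 1),
    Pack("UPC-B", "COLA-12OZ", 1, promo="cents-off"),
    Pack("UPC-C", "COLA-12OZ", 6),
]}


def replenishment_units(sales):
    """Roll every UPC up to its item: replenishment sees one item."""
    totals = defaultdict(int)
    for upc, qty in sales:
        pack = CATALOG[upc]
        totals[pack.item_id] += qty * pack.units
    return dict(totals)


def forecast_units(sales):
    """Keep promotional packs apart: forecasting and pricing see them differently."""
    totals = defaultdict(int)
    for upc, qty in sales:
        pack = CATALOG[upc]
        totals[(pack.item_id, pack.promo)] += qty * pack.units
    return dict(totals)


sales = [("UPC-A", 10), ("UPC-B", 4), ("UPC-C", 2)]
```

The same sales feed two different answers, and neither one treats the UPC itself as the item.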
Downsizing is an FMCG practice of preserving a price point by reducing the net contents of a retail unit. Thus, the one-pound can of coffee is now down to 13 ounces, but it is still
the same item and, although it should not have, it has often kept the same UPC.
Greeting cards and plant seeds represent yet a different type of item. Most retailers do not keep inventory on each individual greeting card and type of plant seed. They have
a price point item that represents the selling price of a particular brand and type of greeting card. All the “$2.99 Cards” are represented by one retailer item number. While the
manufacturer will monitor individual UPCs to determine how well each specific card sells, the retailer will only track the generic price point. As cards go in and out of season,
the price point item that defines them can literally have thousands of UPCs associated with it. Markdowns on individual UPCs for price point items are a challenge. Some retailers
may assign separate item numbers to holiday card price points so they can be discounted when the holiday passes.
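The holiday approach can be sketched as a many-to-one lookup. The item numbers, placeholder UPC strings and function names below are hypothetical, and prices are held in cents to avoid rounding surprises.

```python
# Hypothetical many-to-one mapping: individual card UPCs -> retailer price point item.
PRICE_POINT_BY_UPC = {
    "CARD-UPC-1": "CARDS-2.99",
    "CARD-UPC-2": "CARDS-2.99",
    "CARD-UPC-3": "XMAS-CARDS-2.99",  # holiday cards get their own price point item
}

# Retail price in cents for each price point item.
PRICES = {"CARDS-2.99": 299, "XMAS-CARDS-2.99": 299}


def mark_down(prices, price_point, percent_off):
    """Return a new price table with one price point item discounted."""
    updated = dict(prices)
    updated[price_point] = prices[price_point] * (100 - percent_off) // 100
    return updated


def price_for(upc, prices):
    """Price a scanned UPC through its price point item, not the UPC itself."""
    return prices[PRICE_POINT_BY_UPC[upc]]


after_holiday = mark_down(PRICES, "XMAS-CARDS-2.99", percent_off=50)
```

Because the holiday cards sit under their own item number, one markdown reaches all of them and leaves the everyday cards alone. Marking down a single UPC inside a shared price point would need an exception the table cannot express, which is the challenge noted above.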
Variable-weight products create another set of nuances, because the UPC does not represent an item at all but defines a category and contains the extended retail price. Variable-weight
UPCs are not GTINs because their definition is not global. Industry groups issue recommendations for use of UPC ranges, but individual retailers also make their own assignments.
The same item selling in different stores can have different UPCs. Attempts to reference these UPCs as the same item will fail.
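To make the failure concrete, here is a sketch that assumes one common 12-digit layout: a leading 2, a five-digit code, a price check digit, a four-digit price in cents and a final check digit. That layout is an assumption, individual retailers may differ as noted above, and the store names, codes and item number are hypothetical. Check digits are not validated.

```python
def parse_variable_weight_upc(upc):
    """Split a variable-weight UPC into (code, price_in_cents).

    Assumed layout (not universal): 2 CCCCC V PPPP K
      2      number-system digit marking a variable-weight code
      CCCCC  locally assigned category or item code
      V      price check digit (ignored here)
      PPPP   extended retail price in cents
      K      overall check digit (ignored here)
    """
    if len(upc) != 12 or not upc.isdigit() or upc[0] != "2":
        raise ValueError(f"not a variable-weight UPC under the assumed layout: {upc!r}")
    return upc[1:6], int(upc[7:11])


# Hypothetical assignments: the same item carries a different code in each store.
ITEM_BY_STORE_AND_CODE = {
    ("store-1", "01234"): "GROUND-BEEF",
    ("store-2", "04321"): "GROUND-BEEF",
}


def resolve(store, upc):
    """Look the code up per store; a chain-wide lookup by UPC alone would fail."""
    code, cents = parse_variable_weight_upc(upc)
    return ITEM_BY_STORE_AND_CODE[(store, code)], cents
```

Under these assumptions, `resolve("store-1", "201234005990")` and `resolve("store-2", "204321005990")` both lead to the same item, even though the two scanned codes share nothing but the price.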
Finally, many organizations seem to still confuse selling units and logistics units. Selling units are “anything that is merchandised separately.” If I am going to merchandise
the same item in cases and multi-packs and “eaches,” then my replenishment applications must regard each of them as separate items and replenish them appropriately. On the other
hand, if I am only going to merchandise an item in eaches, then my applications must replenish it using the logistics unit that makes the most sense for the presentation stock
I have in the store. Suburban stores with large shelf space may replenish in full cases while city stores with small shelf space might replenish in smaller case packs or even
use break-pack options to minimize handling. Demand based on retail units must be linked to logistics units based on what is carried in the warehouse and what is merchandised
in the store, and this conversion must be as flexible as possible so that substitutions can be used to minimize out-of-stocks.
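The conversion from selling units to logistics units might look something like the sketch below. The rule it uses is deliberately naive and entirely assumed: pick the largest pack that fits the free shelf space, then order enough packs to cover demand without overflowing the shelf. The function name and pack sizes are hypothetical.

```python
import math


def plan_order(demand_eaches, free_shelf_eaches, pack_sizes):
    """Convert demand in eaches into (pack_size, packs) for one item in one store.

    Naive assumed rule: choose the largest logistics unit that fits the free
    shelf space, then cap the pack count so the delivery does not overflow it.
    Returns None when no available pack size fits.
    """
    fitting = [size for size in pack_sizes if size <= free_shelf_eaches]
    if not fitting:
        return None
    size = max(fitting)
    packs = math.ceil(demand_eaches / size)
    packs = min(packs, free_shelf_eaches // size)
    return size, packs


# Hypothetical logistics units: full case, half case, break-pack single.
PACK_SIZES = [24, 12, 1]

suburban = plan_order(demand_eaches=30, free_shelf_eaches=60, pack_sizes=PACK_SIZES)
city = plan_order(demand_eaches=30, free_shelf_eaches=20, pack_sizes=PACK_SIZES)
```

With the same demand, the suburban store is planned in full cases and the city store in half cases. A real application would also weigh handling cost, warehouse availability and substitutions.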
Item is just one area where I believe the simple declaration of dirty data has more subtle implications. My experience with many retail packages has been that their simplified
view of the business has made them difficult to use and resulted in artificial constraints on the business. Instead of blaming bad data, I feel applications must better understand
the relationship between various retail entities, and be able to use and preserve those associations within the context of their function.
Moderator’s Comment: Are there other nuances to the term item? Have you had situations where you believe the term “dirty data” was really “misunderstood
data”? Is there a way to avoid these situations?
I believe the subtlety of the item definition is just one example of how we must improve the ability of computer applications to make inferences from the
real world around them. We must also understand the association between “business partners,” as the same individual may be a supplier, employee, customer, and someone’s cousin.
The association between weather and events in the store is important for predicting “snow days.” (Is that spike in sales really bad data?)
Understanding of these relationships is critical and can be leveraged across many store locations to improve results. It is worth the extra effort that
may be necessary to make associations between UPCs or other entity identifiers so that the relationship can be taken into consideration by the application designers. –
Bill Bittner – Moderator