Classify Products from Photos: How Image-Based HTS Classification Works

Every customs broker knows the frustration: a client sends a product description that says "plastic container" or "metal part" and expects an accurate HTS classification. The description is missing every attribute that actually matters — material composition, construction method, dimensions, functional features. Brokers end up emailing back asking for photos anyway. Image-based classification puts the photo first.

The Problem with Text-Only Classification

Product descriptions from importers are notoriously incomplete. This is not a criticism of importers — they know their products by trade names, model numbers, and commercial function. They rarely think in terms of the attributes that drive HTS classification: constituent material by weight, method of construction, principal use in the United States, or whether a textile is knit or woven.

The result is that customs brokers spend a significant portion of their classification time gathering information rather than classifying. A typical workflow looks like this:

Receive product description from importer ("stainless steel kitchen tool")
Email back requesting material breakdown, photos, spec sheets, and intended use
Wait 1-3 days for a response
Receive a partial answer and one blurry photo
Begin actual classification work

This back-and-forth delays entry filing, increases broker labor costs, and introduces risk when brokers classify based on incomplete information to meet filing deadlines.

How Image Classification Works

Image-based classification adds a visual analysis step to the classification pipeline. Upload 1-3 product photos alongside the text description, and GPT-4o vision analyzes the images to extract classification-relevant attributes that the text description may be missing.

The vision model examines each image and extracts:

Material composition: Identifies visible materials — leather vs. synthetic leather, knit vs. woven fabric, stainless steel vs. aluminum, solid wood vs. engineered wood
Construction method: Detects how the product was made — molded, injection-formed, sewn, welded, cast, machined, 3D-printed
Dimensions and scale: Estimates relative size from contextual cues and known reference objects in the image
Distinguishing features: Identifies functional elements like closures, hinges, electrical components, moving parts, and integrated mechanisms
Visible markings: Reads labels, stamps, material composition tags, and country of origin markings when visible
Suggested HTS chapter: Based on the visual analysis, suggests the most likely HTS chapter as a starting point for classification

This visual analysis output is structured as a set of classification-relevant attributes that feed into the broader classification engine — not as a standalone HTS code guess.

What the AI Detects: Practical Examples

Material Identification

Material composition is the single most important classification attribute for a large portion of the HTSUS. The vision model distinguishes between:

Leather vs. synthetic: Grain pattern, edge characteristics, and surface texture differentiate genuine leather (Chapter 42) from plastics with leather-like surfaces (Chapter 39 or 42 depending on construction)
Knit vs. woven textiles: Loop structure vs. interlacing yarns — the distinction that determines whether a garment is classified in Chapter 61 (knitted) or Chapter 62 (not knitted)
Steel vs. aluminum: Surface finish, weight implications, and common product forms help distinguish Chapter 72-73 products from Chapter 76
Solid wood vs. composite: Grain patterns, edge cross-sections, and surface characteristics distinguish Chapter 44 solid wood articles from engineered wood products

Construction Method

How a product is made often determines its classification as much as what it is made of:

Molded plastics: Parting lines, gate marks, and uniform wall thickness indicate injection molding — relevant for plastics classification in Chapter 39
Sewn construction: Stitch patterns, seam types, and assembly methods are critical for textile articles in Chapters 61-63
Welded metal: Weld beads, heat-affected zones, and joint types distinguish fabricated metal articles from cast or machined ones
Cast vs. forged: Surface texture, draft angles, and parting lines differentiate casting from forging — relevant for iron and steel articles in Chapters 72-73

Functional Elements

Functional features often determine whether a product is classified by its material or by its function:

Closures and fasteners: Zippers, snaps, buckles, and clasps affect classification of bags, cases, and containers
Electrical components: Visible wiring, circuit boards, motors, or batteries can shift classification from a material-based chapter to Chapter 85 (electrical machinery)
Moving parts and mechanisms: Gears, bearings, pivots, and actuators may indicate classification under Chapter 84 (mechanical machinery)

When to Use Image Classification

Image classification is not needed for every product. It adds the most value in specific scenarios:

Ambiguous descriptions: When the text description is too vague to classify confidently — "plastic item," "textile accessory," "metal component"
Composite goods: Products made from multiple materials where the essential character determination requires seeing how the materials are combined
Material-driven chapters: Textiles (Chapters 50-63), plastics (Chapter 39), rubber (Chapter 40), leather (Chapter 42), wood (Chapter 44), and metals (Chapters 72-83) — where material composition is the primary classification driver
Visual inspection products: Products where CBP rulings frequently reference physical examination — footwear (Chapter 64), headgear (Chapter 65), and toys (Chapter 95)
Dispute resolution: When a classification decision is being challenged and photographic evidence strengthens the reasonable care documentation

How It Integrates with the Classification Pipeline

Image analysis is not a replacement for text-based classification — it is an enhancement. Here is how it fits into the broader classification workflow:

Image upload: The user uploads 1-3 product photos alongside the text description
Visual analysis: GPT-4o vision processes the images and generates a structured "visual analysis" output containing detected materials, construction methods, and features
Attribute merging: The visual analysis output is merged with the text description as an additional input to the RAG classification pipeline. It functions as a "visual analysis hint" — supplementary evidence that enhances the text-based classification
Classification: The classification engine processes the combined text + visual attributes through the same pipeline: GRI analysis, Section/Chapter Note evaluation, and CBP ruling matching
Output: The final classification includes both the text-based reasoning and the visual evidence that supported the decision

This design means the vision model cannot override a clear text-based classification. If the product description says "100% cotton woven shirt" and the image confirms it, the image adds confidence. If the image reveals something the text missed — say, the shirt has a knit collar — that additional attribute is flagged for the broker's review.

Availability and Pricing

Image-based classification is available on all paid Harmonize plans:

Starter ($49/month): Includes image classification with up to 3 images per classification request
Professional ($149/month): Same image classification capabilities with higher monthly classification volume
Enterprise: Custom volume with priority processing

Supported image formats: JPEG, PNG, and WebP. Maximum file size: 10MB per image. For best results, use well-lit photos showing the product from multiple angles — front, back, and a close-up of material/construction details.

The free tier includes text-based classification only. Image classification is a paid feature because the vision model processing adds significant computational cost per classification.

Try Image Classification

Upload product photos alongside text descriptions for more accurate HTS classification. The vision model identifies materials, construction methods, and functional features that text alone can miss.

Try Harmonize Free

Image classification available on Starter plan ($49/mo) and above • Up to 3 images per classification

This article is for informational purposes only and does not constitute legal or customs brokerage advice. Importers and brokers should consult with a licensed customs broker or trade attorney for guidance on specific classification and compliance decisions. Harmonize.ai is a classification research tool operating under 19 U.S.C. § 1641 — we provide research support, not customs brokerage services.