The challenge

  • Import and consolidate metadata on 40 million items from more than 500 sources; data suppliers include publishers, catalog aggregators and wholesalers

  • Between 0.5 million and about 4 million article updates have to be processed daily.

  • Processing must be as fast as possible so that customers never see invalid prices or outdated bibliographic information.

  • Sources provide metadata in different, mostly XML-based formats and format variants

  • Sources overlap, i.e. the data for a given article may be supplied by several sources, and in varying quality.

  • Quality criteria must be taken into account during consolidation; the priority rules must remain adjustable at any time.

  • Specialist editors should be able to manually revise the metadata of particularly important articles and thereby optimize it further.

  • The manually maintained parts of the metadata for an article (e.g. marketing description) must be retained, even if other details for the article (e.g. prices) continue to be updated automatically.

  • Search and filtering must be lightning fast at all times, regardless of what is searched for and how many hits there are

  • The mechanism must be extremely reliable and fail-safe to avoid processing backlogs

Optimally prepare metadata for 40 million articles, keep it up to date, and make it searchable at lightning speed

The solution

  • Numerous source databases form the first consolidation level: each source database receives metadata from one or more sources, or from one or more feeds of a single source.

  • Comprehensive checks and validations are carried out, for example to ensure that bundle prices are passed on without contradictions (see the validation sketch after this list).

  • At the second consolidation level, all title data from the source databases is merged into one comprehensive database.

  • This merging is rule-based, driven by a stored configuration, so that the best-quality partial information on an article is assembled from the most suitable sources.
    A (highly simplified) example: author information preferred from source A or source D, title and description preferred from source B, price information preferred from source C (see the merge sketch after this list)

  • IconParc has developed and seamlessly integrated a complete PIM as a catalogue editorial system, so that the metadata for selected (particularly important) articles can be enriched as desired and thus further optimized.

  • The results of the previous steps finally flow into a central search index. Based on the Lucene search technology integrated by IconParc, the entire stock of over 40 million titles can be searched and filtered quickly, typically in less than 200 milliseconds per query (see the query sketch after this list).

  • Redundantly designed infrastructure with load balancing: several servers handle the import and merging of the source data in parallel.

  • Future growth is accommodated simply by adding additional servers
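
To make the kind of import check meant above more concrete, here is a minimal sketch of a bundle-price validation. Everything in it is illustrative: the Article record, the class name and in particular the rule that a bundle must not cost more than the sum of its components are assumptions for this example, not part of the actual IconParc import.

```java
import java.math.BigDecimal;
import java.util.List;

/** Hypothetical article record, for illustration only. */
record Article(String isbn, BigDecimal price, List<String> bundleComponentIsbns) {}

final class BundlePriceValidator {

    /**
     * Returns true if the bundle price does not contradict its components:
     * it must be present, positive and must not exceed the sum of the
     * component prices (an assumed rule for this sketch).
     */
    static boolean isConsistent(Article bundle, List<Article> components) {
        if (bundle.price() == null || bundle.price().signum() <= 0) {
            return false; // missing or non-positive bundle price
        }
        BigDecimal componentSum = components.stream()
                .map(Article::price)
                .reduce(BigDecimal.ZERO, BigDecimal::add);
        // A bundle that costs more than its parts is flagged as contradictory.
        return bundle.price().compareTo(componentSum) <= 0;
    }
}
```

In practice such rules would presumably run against each source database, so that contradictory records can be flagged before the data enters the second consolidation level.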
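
The field-level priority idea behind the rule-based merge can be pictured as follows. This is only a sketch: the field names, source identifiers and priority table merely mirror the simplified example above and are not the stored IconParc configuration. It also shows how manually maintained parts (see the PIM bullet) can be retained while all other fields continue to be taken from the feeds.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative merge: per-field source priorities, manual edits always win. */
final class TitleMerger {

    // Hypothetical priority configuration, mirroring the simplified example:
    // authors from A (fallback D), title/description from B, price from C.
    static final Map<String, List<String>> FIELD_PRIORITIES = Map.of(
            "authors",     List.of("sourceA", "sourceD"),
            "title",       List.of("sourceB"),
            "description", List.of("sourceB"),
            "price",       List.of("sourceC"));

    /**
     * @param bySource    field values per source database
     * @param manualEdits fields maintained by specialist editors; always retained
     */
    static Map<String, String> merge(Map<String, Map<String, String>> bySource,
                                     Map<String, String> manualEdits) {
        Map<String, String> result = new LinkedHashMap<>();
        FIELD_PRIORITIES.forEach((field, sources) -> {
            for (String source : sources) {
                String value = bySource.getOrDefault(source, Map.of()).get(field);
                if (value != null && !value.isBlank()) {
                    result.put(field, value); // first source in priority order wins
                    break;
                }
            }
        });
        result.putAll(manualEdits); // manually maintained parts override feed data
        return result;
    }
}
```

Because the priorities live in configuration rather than code, they can be changed at any time, which is exactly what the challenge list demands.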
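
Finally, to illustrate the search step in principle, here is a minimal, self-contained Apache Lucene example: one consolidated title record is indexed and then queried, roughly as a frontend search would do it. The field names and the in-memory directory are chosen for illustration only and say nothing about how the real index of over 40 million titles is structured.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;

public class TitleSearchSketch {
    public static void main(String[] args) throws Exception {
        ByteBuffersDirectory dir = new ByteBuffersDirectory();

        // Index one consolidated title record (illustrative fields only).
        try (IndexWriter writer =
                 new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            doc.add(new StringField("isbn", "978-3-16-148410-0", Field.Store.YES));
            doc.add(new TextField("title", "Digitization in the book trade", Field.Store.YES));
            doc.add(new TextField("description", "Consolidated metadata record", Field.Store.YES));
            writer.addDocument(doc);
        }

        // Query the index, as a frontend search would.
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            QueryParser parser = new QueryParser("title", new StandardAnalyzer());
            TopDocs hits = searcher.search(parser.parse("digitization"), 10);
            for (ScoreDoc hit : hits.scoreDocs) {
                Document found = searcher.doc(hit.doc);
                System.out.println(found.get("isbn") + " : " + found.get("title"));
            }
        }
    }
}
```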

The effect

  • Freshness: even with 4 million article updates over the course of a day, the data is fully available in the updated search index by the next day.

  • The Schweitzer B2B e-commerce platform consists of several target-group-specific frontends. The lightning-fast search and filter service is available across all of them.

  • Given the high volume of data, a fully automatic consolidation and indexing process is indispensable.

  • Despite the high volume of data, the quality level achieved is exemplary

  • Thanks to IconParc cluster technology, availability, top performance and scalability remain guaranteed even as data volumes grow.

Consistent digitization enables growth without sacrificing quality!
