Digital Transformation
PIM for Schweitzer Fachinformationen
A case study on process control
The challenge
  • Import and consolidate metadata for 40 million items from more than 500 sources; data suppliers include publishers, catalog aggregators and wholesalers.

  • Between 0.5 and about 4 million article updates must be processed daily.

  • Fastest possible processing is essential to ensure that customers never see invalid prices or outdated bibliographical information.

  • Sources provide metadata in different, mostly XML-based formats and format variants.

  • Sources overlap: metadata for the same article may be supplied by several sources, each at a different level of quality.

  • Quality criteria must be taken into account during consolidation, and the priority rules must remain changeable at any time.

  • Specialist editors should be able to manually revise the metadata of particularly important articles and thereby further optimize it.

  • The manually maintained parts of the metadata for an article (e.g. marketing description) must be retained, even if other details for the article (e.g. prices) continue to be updated automatically.

  • Search and filtering must be lightning fast at all times, regardless of the query or the number of hits.

  • The mechanism must be extremely reliable and fail-safe to avoid processing backlogs.
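One of the requirements above is that manually maintained fields survive while automatic feed updates keep flowing. The following is a minimal illustrative sketch of that idea, not IconParc's actual implementation; all field names and the locking mechanism are assumptions.

```python
# Sketch: apply an automatic feed update to an article record while
# preserving fields that an editor maintains manually ("locked" fields).

def merge_update(record: dict, update: dict, manual_fields: set) -> dict:
    """Apply an automatic update, skipping manually maintained fields."""
    merged = dict(record)
    for field, value in update.items():
        if field not in manual_fields:   # e.g. prices update freely...
            merged[field] = value
    return merged                        # ...but manual text stays untouched

article = {"price": "29.90", "description": "Hand-crafted marketing text"}
update  = {"price": "31.50", "description": "Generic feed description"}

merged = merge_update(article, update, manual_fields={"description"})
# the price is refreshed automatically, the manual description is retained
```

A real system would track per-field provenance and edit timestamps rather than a flat set of locked field names, but the core rule is the same: automatic data never overwrites a manual edit.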

Optimally prepare metadata for 40 million articles, keep it up to date, and make it searchable at lightning speed

The solution
  • Numerous source databases represent a first consolidation level: each source database receives metadata from one or more sources, or from one or more feeds of a single source.

  • Comprehensive checks and validations are carried out, for example to ensure that bundle prices are delivered free of contradictions.
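As a hypothetical example of such a validation step (the actual checks, field names and the 7% VAT rate below are illustrative assumptions, not the case study's rules), an incoming record's price fields can be tested for internal consistency before they enter a source database:

```python
# Hypothetical price validation: accept a record only if its gross price
# matches net price plus VAT, within a small rounding tolerance.

def price_consistent(net: float, gross: float, vat_rate: float,
                     tolerance: float = 0.01) -> bool:
    """True if gross equals net * (1 + vat_rate) within the tolerance."""
    return abs(net * (1 + vat_rate) - gross) <= tolerance

ok = price_consistent(net=20.00, gross=21.40, vat_rate=0.07)   # consistent
bad = price_consistent(net=20.00, gross=24.00, vat_rate=0.07)  # contradictory
```

Records that fail such checks can be rejected or flagged at the first consolidation level, so contradictions never reach the merged database.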

  • In the second consolidation level, all title data from the source databases is merged into one comprehensive database.

  • This merging is rule-based, driven by a stored configuration, so that the best-quality partial information for each article is assembled from the most suitable sources.
    A (highly simplified) example: author information preferred from source A or source D, title and description preferred from source B, price information preferred from source C
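The simplified example above can be sketched in a few lines. The configuration format and source names here are illustrative only; the real configuration is far richer and changeable at runtime.

```python
# Minimal sketch of rule-based consolidation: for each field, a configured
# priority list names which sources to try, in order.

FIELD_PRIORITIES = {
    "author":      ["A", "D"],   # prefer source A, fall back to D
    "title":       ["B"],
    "description": ["B"],
    "price":       ["C"],
}

def consolidate(records_by_source: dict) -> dict:
    """Assemble the best available value per field across sources."""
    result = {}
    for field, sources in FIELD_PRIORITIES.items():
        for source in sources:
            value = records_by_source.get(source, {}).get(field)
            if value is not None:   # first source that has the field wins
                result[field] = value
                break
    return result

records = {
    "A": {"author": "J. Doe"},
    "B": {"title": "Example Title", "description": "Long text"},
    "C": {"price": "49.00"},
}
consolidated = consolidate(records)
```

Because the priorities live in configuration rather than code, the rules can be adjusted at any time without redeploying the import pipeline.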

  • IconParc has developed and seamlessly integrated a complete PIM as a catalogue editorial system so that the metadata for selected (particularly important) articles can be enriched as desired and thus additionally optimized.

  • The results of the preceding steps finally flow into a central search index; based on the Lucene search technology integrated by IconParc, the entire stock of over 40 million titles can be searched and filtered quickly, typically in under 200 milliseconds per query.
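The platform uses Lucene for this; purely as a language-neutral toy illustration of the inverted-index principle behind such engines (and in no way a model of the production index), the core idea looks like this:

```python
from collections import defaultdict

# Toy inverted index: each term maps to the set of document ids containing
# it, so a multi-term query is answered by intersecting small sets instead
# of scanning millions of records.

def build_index(docs: dict) -> dict:
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index: dict, query: str) -> set:
    """Return ids of documents containing every query term (AND search)."""
    terms = query.lower().split()
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()

docs = {1: "Digital Transformation case study",
        2: "Digital catalogue data",
        3: "Transformation of catalogue data"}
index = build_index(docs)
hits = search(index, "digital transformation")
```

Lucene adds analysis, ranking, compressed postings and much more on top, but the reason lookups stay fast regardless of corpus size is this term-to-documents mapping.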

  • Redundantly designed infrastructure with load balancing: several servers simultaneously handle importing and merging the source data.

  • Future growth is covered simply by adding further servers.

The effect
  • Up-to-dateness: even with 4 million article updates over the course of a day, all data is fully available in the updated search index by the next day.

  • The Schweitzer B2B e-procurement platform consists of several target-group-specific frontends. The lightning-fast search and filter service is available across all of them.

  • Given the sheer volume of data, a fully automatic consolidation and indexing process is indispensable.

  • Despite the high volume of data, the quality level achieved is exemplary.

  • Thanks to IconParc cluster technology, availability, best performance and scalability are sustainably guaranteed, even with increasing data volumes.


Digitization enables growth

without compromising quality!