Preparing your documents before scanning

Digitization should always be preceded by document preparation. Preparation usually means removing bindings, staples, paper clips and similar fasteners. It can also include a preliminary phase of understanding the contents, in order to decide the granularity at which the electronic files are created. Depending on the content to be dematerialized, it is worth creating separate files by type, period and so on, which makes the documents easier to search, share, preserve and use.
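As an illustration of file granularity, the sketch below groups pages into one output file per document type and year. The page fields (`page_id`, `type`, `year`) and the file naming pattern are assumptions made for this example, not a description of any particular scanning tool.

```python
from collections import defaultdict


def plan_files(pages):
    """Group page ids into one planned file per (type, year) pair.

    `pages` is a list of dicts with the hypothetical keys
    "page_id", "type" and "year".
    """
    groups = defaultdict(list)
    for page in pages:
        groups[(page["type"], page["year"])].append(page["page_id"])
    # Sort the groups so the plan is stable and easy to review.
    return {
        f"{doc_type}_{year}.pdf": page_ids
        for (doc_type, year), page_ids in sorted(groups.items())
    }


pages = [
    {"page_id": 1, "type": "invoice", "year": 2022},
    {"page_id": 2, "type": "contract", "year": 2021},
    {"page_id": 3, "type": "invoice", "year": 2022},
    {"page_id": 4, "type": "invoice", "year": 2023},
]
plan = plan_files(pages)
for file_name, page_ids in plan.items():
    print(file_name, page_ids)
```

A plan like this can be reviewed before scanning starts, so that the split by type and period is agreed on in advance.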

Post-scanning indexing is the process of creating indexes or metadata for scanned documents. When physical documents are converted to digital format, they can be stored as PDF files, digital images, text files and other formats.

However, to facilitate the effective search and retrieval of these digitized documents, it is essential to index them. Indexing involves extracting key information from document content and associating it with structured metadata.

Metadata commonly used for indexing includes the document title, author, creation date, document type and relevant keywords. This metadata describes and categorizes the content of the document, which makes it easier to find later. Sometimes, however, this information is not directly present in the documents themselves.
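A metadata record of this kind could be represented as follows. This is a minimal sketch: the class name `DocumentMetadata`, its field names and the sample file name are assumptions chosen for the example. Every descriptive field is optional, since the information is not always present in the document.

```python
from dataclasses import dataclass, field, asdict
from datetime import date
from typing import Optional


@dataclass
class DocumentMetadata:
    """Hypothetical index record for one scanned file."""
    file_name: str
    title: Optional[str] = None
    author: Optional[str] = None
    creation_date: Optional[date] = None
    document_type: Optional[str] = None
    keywords: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are still empty."""
        return [
            name
            for name, value in asdict(self).items()
            if value is None or value == []
        ]


record = DocumentMetadata(
    file_name="invoice_0001.pdf",
    document_type="invoice",
    creation_date=date(2023, 3, 14),
)
print(record.missing_fields())
```

Listing the empty fields gives a simple way to see which records still need to be completed.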

Indexing can be done in different ways. In some cases it is done manually: our operators read the content of the scanned document and add the corresponding metadata. In other cases our automatic indexing techniques (LAD/RAD) are used: OCR (Optical Character Recognition) software extracts the text from the scanned images, and natural language processing then analyzes that text to extract additional information.
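The automatic path can be sketched as below, assuming the OCR step has already produced plain text. The function `extract_metadata`, the stop-word list and the DD/MM/YYYY date pattern are illustrative assumptions, and simple regular expressions and word counts stand in here for real natural language processing.

```python
import re
from collections import Counter

# Small, hypothetical stop-word list used only for this example.
STOPWORDS = {"the", "and", "for", "with", "this", "that"}


def extract_metadata(ocr_text: str, max_keywords: int = 3) -> dict:
    """Derive a few metadata fields from text returned by an OCR engine."""
    lines = [line.strip() for line in ocr_text.splitlines() if line.strip()]
    # Assumption: the first non-empty line is the document title.
    title = lines[0] if lines else None

    # Assumption: dates are written DD/MM/YYYY; keep the first one found.
    match = re.search(r"\b(\d{2})/(\d{2})/(\d{4})\b", ocr_text)
    creation_date = (
        f"{match.group(3)}-{match.group(2)}-{match.group(1)}" if match else None
    )

    # Keywords: the most frequent words of three letters or more.
    words = re.findall(r"[a-z]{3,}", ocr_text.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    keywords = [word for word, _ in counts.most_common(max_keywords)]

    return {"title": title, "creation_date": creation_date, "keywords": keywords}


sample = (
    "Maintenance contract\n"
    "Signed on 05/02/2021\n"
    "The contract covers maintenance of scanners."
)
metadata = extract_metadata(sample)
print(metadata)
```

Fields that the heuristics cannot fill are returned as `None` or as an empty list, which leaves them available for an operator to complete by hand.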

Once scanned documents are indexed, they become easier to search, sort and access. Document management systems, digital libraries and search engines often use indexes to let users quickly find relevant documents by criteria such as keywords, dates or authors. Because this metadata is not always contained in the documents, human intervention makes it possible to enrich it.
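To show why an index speeds up retrieval, here is a minimal sketch of a keyword index that maps each keyword to the files tagged with it. The function names `build_index` and `search`, and the sample file names, are assumptions for the example and do not describe any specific document management system.

```python
from collections import defaultdict


def build_index(records: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each lower-cased keyword to the set of files that carry it."""
    index: dict[str, set[str]] = defaultdict(set)
    for file_name, keywords in records.items():
        for keyword in keywords:
            index[keyword.lower()].add(file_name)
    return index


def search(index: dict[str, set[str]], *keywords: str) -> list[str]:
    """Return the files tagged with every one of the given keywords."""
    if not keywords:
        return []
    matches = [index.get(keyword.lower(), set()) for keyword in keywords]
    return sorted(set.intersection(*matches))


records = {
    "invoice_0001.pdf": ["invoice", "2023", "Dupont"],
    "invoice_0002.pdf": ["invoice", "2022", "Martin"],
    "contract_0001.pdf": ["contract", "2023", "Dupont"],
}
index = build_index(records)
print(search(index, "invoice"))
print(search(index, "2023", "dupont"))
```

Each lookup reads only the entries for the requested keywords instead of opening every scanned file, which is what makes searching by criteria fast.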
