Server for Generative AI Computations

Turbocharge for content recognition of archival documents

The problem

All archival funds of Ukraine are scanned and stored in image, audio, and video file formats, at best, with a brief content description (annotation). However, the most valuable thing is the content

To enable fast and high-quality analysis and search of information in funds by content - these images, audio, and video need to be converted into a digital format understandable by modern applications built on vector databases, which allow large language models (LLM) to work freely with this information.

Project Goal

Create a powerful server for generative AI computations, which will become a permanent tool for automatic content recognition from images, audio, video, and processing of archival documents of state archives of Ukraine using large language models (LLM).

Main server tasks

Automatic recognition

Continuous process of automatic content recognition from images, audio, video of existing digital funds and future digitized funds.

Computing power

Providing computing power for inference computations based on specialized artificial intelligence models.

Why is this important?

Heritage preservation

Transforming archives into a convenient digital format for integration with modern interfaces

Accessibility

Simplifying document search and analysis for researchers, journalists, and the public

Innovation

Using LLM opens new possibilities for analysis, translation, and classification

Current Progress

Scanned

150 million sheets

Scanning Dynamics

30+ million sheets/year

Remaining to Scan

approximately 800 million sheets

Prepared for LLM Work

0% of 150 million sheets

Estimated scanning completion: 26+ yearsMain problem: document content remains inaccessible for search and analysis

Technical Details

Through testing, it was found that the simplest server with one NVIDIA L4 GPU card (approximately 2 thousand euros each) under ideal conditions can recognize 149,300 images per year... one such card can help with the most critically important documents for research, but for the existing 150 million, it would take approximately 1 thousand years to process.To reach 30 million sheets per year, we need: 30,000,000 ÷ 149,300 = 201 GPU cards. This means an investment of approximately 400,000 euros just for GPUs, not including server hardware, electricity, and maintenance.

Timeline

Information gathering and planning

Completed

Partnership agreements

In progress

Software development and testing

12 months

Server organization

6-12 months

Support and improvement

24 months

Expected results

Modern infrastructurefor digital transformation of archives

Processing accelerationsearch and analysis of archival materials

Heritage opennessaccessibility of Ukrainian historical heritage

Action is needed now

can fully or partially finance the project

can provide equipment

can optimize and/or reduce costs

can spread the word about this in your media

Натисніть для відправки листа