Example post

This is a test post.

Section title here

  • Extract: Documents come in many file formats (.doc, .pdf, etc.) and contain many kinds of data (text, tables, images, video). This content must be extracted and converted into a format that the later stages can process.
  • Transform: The extracted text is prepared for storage in two steps:
      • Chunking: Text data is broken into smaller chunks – a process inventively named ‘chunking’.
      • Embedding: Each chunk is converted into a ‘dense vector’ (a fixed-length list of numbers) that represents the meaning of its text.
  • Loading: Adding the embedding and original data to a database.
  • Database: The database stores the embeddings alongside the original data. Vector databases are the most common choice because they are built to search embeddings by similarity, but graph databases and traditional databases are also used.
  • Retrieval Embedding: The user's query is converted to a dense vector using the same embedding model that produced the stored vectors; vectors from different models are not comparable.
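  • Retrieval: Because the stored and query vectors both represent meaning, retrieval is the process of finding the k entries in the database whose vectors are ‘closest’ to the query vector. Much of the detail lies in this step: how closeness is measured (cosine similarity is a common choice) and how the search stays fast as the database grows.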
  • Generation: The k results are passed to an LLM together with the query, and the LLM uses them to produce an ‘augmented’ response.
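The chunking step can be illustrated with a minimal sketch. It assumes chunks are measured in whitespace-separated words, with a fixed size and a small overlap so that context spanning a chunk boundary is not lost entirely; real pipelines often count tokens instead, or split on sentences or headings. The function name `chunk_text` and its default sizes are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of up to chunk_size words, where consecutive
    chunks share `overlap` words (a hypothetical, word-based scheme)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far each new chunk starts after the previous one
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


print(chunk_text("a b c d e f g", chunk_size=3, overlap=1))  # ['a b c', 'c d e', 'e f g']
```

Larger overlaps preserve more context across boundaries at the cost of storing more near-duplicate text.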
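The embedding and loading steps can be sketched with only the Python standard library, under two loud assumptions. First, `toy_embed` is a hypothetical stand-in for a real embedding model: it hashes words into a fixed-length vector, so it captures word overlap rather than meaning. Second, an SQLite table with each vector stored as JSON stands in for a real vector database, which would also index the vectors for fast similarity search. The table layout and `load_chunks` are likewise illustrative.

```python
import hashlib
import json
import math
import sqlite3

DIM = 64  # hypothetical vector size; real embedding models use far more dimensions


def toy_embed(text: str, dim: int = DIM) -> list[float]:
    """Hypothetical stand-in for an embedding model: hash each word into one of
    `dim` buckets, count, then L2-normalise. Captures word overlap, not meaning."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # md5 keeps bucket assignment stable across runs (unlike built-in hash()).
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def load_chunks(conn: sqlite3.Connection, chunks: list[str]) -> None:
    """Store each chunk's original text alongside its embedding (as JSON)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "id INTEGER PRIMARY KEY, text TEXT NOT NULL, embedding TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO chunks (text, embedding) VALUES (?, ?)",
        [(chunk, json.dumps(toy_embed(chunk))) for chunk in chunks],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_chunks(conn, ["Cats purr when content.", "Dogs bark at strangers."])
print(conn.execute("SELECT COUNT(*) FROM chunks").fetchone()[0])  # 2
```

Keeping the original text next to the vector matters: the vector is only used to find a chunk, while the text is what is eventually handed to the LLM.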
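Retrieval can be sketched as an exact, brute-force search: score every stored vector against the query vector and keep the k best. This sketch assumes cosine similarity as the measure of ‘closeness’ (dot product and Euclidean distance are also used), and the three-dimensional vectors below are made up for illustration; in practice the stored vectors and the query vector both come from the same embedding model.

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: 1.0 means same direction, 0.0 unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float],
          entries: list[tuple[str, list[float]]],
          k: int = 3) -> list[tuple[float, str]]:
    """Exact k-nearest-neighbour search: score every (text, vector) entry
    against the query and return the k highest (score, text) pairs, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in entries]
    return heapq.nlargest(k, scored)


entries = [
    ("cats", [1.0, 0.0, 0.0]),
    ("kittens", [0.9, 0.1, 0.0]),
    ("cars", [0.0, 0.0, 1.0]),
]
print([text for _, text in top_k([1.0, 0.0, 0.0], entries, k=2)])  # ['cats', 'kittens']
```

A linear scan like this costs one comparison per stored entry, which is why larger systems usually switch to an approximate nearest-neighbour index, trading a little accuracy for much faster search.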
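The final step amounts to placing the k retrieved chunks in the LLM's prompt next to the user's question. A minimal sketch, assuming a plain-text prompt; the template wording and the name `build_augmented_prompt` are hypothetical, and the call to the LLM itself is left out because it depends on the provider.

```python
def build_augmented_prompt(query: str, retrieved: list[str]) -> str:
    """Assemble a prompt that gives the LLM the retrieved chunks as numbered
    context passages, followed by the user's question (hypothetical template)."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "Cite the numbered passages you rely on.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


prompt = build_augmented_prompt(
    "Why do cats purr?",
    ["Cats purr when content.", "Purring can also signal stress."],
)
print(prompt)
```

Numbering the passages gives the model a simple way to say which retrieved chunk supports each part of its answer.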

Another section title

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

2024-05-07