no longer tracking egg-info

Alessandro Clerici
2025-07-09 15:39:31 +00:00
parent 29e6ac2294
commit 2ab91099f2
5 changed files with 0 additions and 276 deletions

src/m3docrag.egg-info/PKG-INFO
@@ -1,215 +0,0 @@
Metadata-Version: 2.2
Name: m3docrag
Version: 0.0.1
Summary: Multimodal Document Understanding with RAG
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: accelerate==1.1.0
Requires-Dist: loguru
Requires-Dist: requests
Requires-Dist: setuptools==69.5
Requires-Dist: transformers
Requires-Dist: tokenizers
Requires-Dist: flash-attn==2.5.8
Requires-Dist: bitsandbytes==0.43.1
Requires-Dist: safetensors
Requires-Dist: gpustat
Requires-Dist: icecream
Requires-Dist: pdf2image
Requires-Dist: numpy==1.26.4
Requires-Dist: torchvision
Requires-Dist: jsonlines
Requires-Dist: editdistance
Requires-Dist: einops
Requires-Dist: fire
Requires-Dist: peft
Requires-Dist: timm
Requires-Dist: sentencepiece
Requires-Dist: colpali-engine==0.3.1
Requires-Dist: easyocr
Requires-Dist: qwen-vl-utils
Requires-Dist: faiss-cpu
Requires-Dist: word2number
Requires-Dist: datasets>=3.0.0
Requires-Dist: python-dotenv
# M3DocRAG
Code for [M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding](https://m3docrag.github.io/)
by [Jaemin Cho](https://j-min.io/), [Debanjan Mahata](https://sites.google.com/a/ualr.edu/debanjan-mahata/), [Ozan İrsoy](https://wtimesx.com/), [Yujie He](https://scholar.google.com/citations?user=FbeAZGgAAAAJ&hl=en), [Mohit Bansal](https://www.cs.unc.edu/~mbansal/)
# Summary
## Comparison with previous approaches
<img src='./assets/m3docrag_teaser.png' >
Comparison of multi-modal document understanding pipelines. Previous works focus on (a) **Single-page DocVQA**, which cannot handle many long documents, or (b) **Text-based RAG**, which ignores visual information. Our (c) **M3DocRAG** framework retrieves relevant documents and answers questions using multi-modal retrieval and multi-modal language model (MLM) components, so it can efficiently handle many long documents while preserving visual information.
## M3DocRAG framework
<img src='./assets/method.png' >
Our **M3DocRAG** framework consists of three stages: (1) document embedding, (2) page retrieval, and (3) question answering.
- In (1) document embedding, we extract visual embeddings (with ColPali) to represent each page of every PDF document.
- In (2) page retrieval, we retrieve the top-K pages with the highest relevance to the text query, measured by MaxSim scores (sketched below). In an open-domain setting, we create approximate page indices for faster search.
- In (3) question answering, we run visual question answering with a multi-modal language model (e.g., Qwen2-VL) to obtain the final answer.
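As a rough illustration of the MaxSim scoring used in stage (2), here is a minimal sketch; the tensor names and the `retrieve_top_k` helper are illustrative, not the repository's actual code:
```python
import torch

def maxsim_score(query_emb: torch.Tensor, page_emb: torch.Tensor) -> torch.Tensor:
    """Late-interaction (MaxSim) relevance of one page to one query.

    query_emb: (n_query_tokens, dim) multi-vector embedding of the text query
    page_emb:  (n_page_tokens, dim)  multi-vector embedding of one PDF page
    """
    sim = query_emb @ page_emb.T        # similarity of every query token to every page token
    return sim.max(dim=1).values.sum()  # best page token per query token, summed

def retrieve_top_k(query_emb: torch.Tensor, page_embs: list[torch.Tensor], k: int = 1):
    # Score every page against the query and return the k most relevant page indices.
    scores = torch.stack([maxsim_score(query_emb, p) for p in page_embs])
    return scores.topk(k).indices
```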
# Setup
## Package
We assume conda is already installed.
```bash
git clone <REPO_URL>
cd m3docrag-release
pip install -e .
# Install Poppler (for pdf2image; check https://pdf2image.readthedocs.io/en/latest/installation.html for details)
# conda install -y poppler
# or
# apt-get install poppler-utils
```
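After installing Poppler, you can quickly check that pdf2image can render pages (the PDF path below is a placeholder):
```python
from pdf2image import convert_from_path

# Renders each page of the PDF as a PIL image; raises if Poppler is not on the PATH.
pages = convert_from_path("sample.pdf", dpi=144)
print(f"rendered {len(pages)} pages; first page size: {pages[0].size}")
```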
## Code structure
```bash
examples/          # scripts to run PDF embedding / RAG
src/m3docrag/
    datasets/      # data loaders for existing datasets
    retrieval/     # retrieval model (e.g., ColPali)
    vqa/           # VQA model (e.g., Qwen2-VL)
    rag/           # RAG model that combines the retrieval and VQA models
    utils/         # misc utility methods
m3docvqa/          # how to set up the m3docvqa dataset
```
## Paths: Data, Embeddings, Model checkpoints, Outputs
```bash
# in .env
LOCAL_DATA_DIR="/job/datasets" # where to store data
LOCAL_EMBEDDINGS_DIR="/job/embeddings" # where to store embeddings
LOCAL_MODEL_DIR="/job/model" # where to store model checkpoints
LOCAL_OUTPUT_DIR="/job/output" # where to store model outputs
```
You can adjust the variables in [`.env`](.env) to change where data, embeddings, model checkpoints, and outputs are stored by default. They are loaded in [`src/m3docrag/utils/paths.py`](./src/m3docrag/utils/paths.py) via [python-dotenv](https://github.com/theskumar/python-dotenv).
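For reference, the loading pattern is roughly as follows; this is a sketch of what `paths.py` does, not its exact contents:
```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

LOCAL_DATA_DIR = os.environ.get("LOCAL_DATA_DIR", "/job/datasets")
LOCAL_EMBEDDINGS_DIR = os.environ.get("LOCAL_EMBEDDINGS_DIR", "/job/embeddings")
LOCAL_MODEL_DIR = os.environ.get("LOCAL_MODEL_DIR", "/job/model")
LOCAL_OUTPUT_DIR = os.environ.get("LOCAL_OUTPUT_DIR", "/job/output")
```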
## Download M3DocVQA dataset
Please see [m3docvqa/README.md](m3docvqa/README.md) for download instructions.
## Download model checkpoints
By default, we use colpali-v1.2 for retrieval and Qwen2-VL-7B-Instruct for question answering.
At `$LOCAL_MODEL_DIR`, download the [colpaligemma-3b-pt-448-base](https://huggingface.co/vidore/colpaligemma-3b-pt-448-base), [colpali-v1.2](https://huggingface.co/vidore/colpali-v1.2), and [Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) checkpoints.
```bash
cd $LOCAL_MODEL_DIR
git clone https://huggingface.co/vidore/colpaligemma-3b-pt-448-base # ColPali backbone
git clone https://huggingface.co/vidore/colpali-v1.2 # ColPali adapter
git clone https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct # VQA
```
# Example usage
Below we describe example usage of M3DocRAG on the M3DocVQA dataset.
## 1. Extract PDF embeddings
```bash
DATASET_NAME="m3-docvqa"
RETRIEVAL_MODEL_TYPE="colpali"
RETRIEVAL_MODEL_NAME="colpaligemma-3b-pt-448-base"
RETRIEVAL_ADAPTER_MODEL_NAME="colpali-v1.2"
SPLIT="dev"
EMBEDDING_NAME=$RETRIEVAL_ADAPTER_MODEL_NAME"_"$DATASET_NAME"_"$SPLIT # where to save embeddings
accelerate launch --num_processes=1 --mixed_precision=bf16 examples/run_page_embedding.py \
--use_retrieval \
--retrieval_model_type=$RETRIEVAL_MODEL_TYPE \
--data_name=$DATASET_NAME \
--split=$SPLIT \
--loop_unique_doc_ids=True \
--output_dir=/job/embeddings/$EMBEDDING_NAME \
--retrieval_model_name_or_path=$RETRIEVAL_MODEL_NAME \
--retrieval_adapter_model_name_or_path=$RETRIEVAL_ADAPTER_MODEL_NAME
```
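Under the hood, page embedding with colpali-engine looks roughly like the sketch below, based on the public colpali-engine 0.3.x API; the actual script adds batching, dataset handling, and saving the embeddings to disk, and the PDF path is a placeholder:
```python
import torch
from pdf2image import convert_from_path
from colpali_engine.models import ColPali, ColPaliProcessor

model = ColPali.from_pretrained(
    "vidore/colpali-v1.2", torch_dtype=torch.bfloat16, device_map="cuda"
).eval()
processor = ColPaliProcessor.from_pretrained("vidore/colpali-v1.2")

pages = convert_from_path("document.pdf")  # one PIL image per page
batch = processor.process_images(pages).to(model.device)
with torch.no_grad():
    page_embeddings = model(**batch)       # multi-vector embedding per page
```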
## 2. Indexing
```bash
DATASET_NAME="m3-docvqa"
RETRIEVAL_MODEL_TYPE="colpali"
RETRIEVAL_ADAPTER_MODEL_NAME="colpali-v1.2"
SPLIT="dev"
FAISS_INDEX_TYPE='ivfflat'
EMBEDDING_NAME=$RETRIEVAL_ADAPTER_MODEL_NAME"_"$DATASET_NAME"_"$SPLIT
INDEX_NAME=$EMBEDDING_NAME"_pageindex_"$FAISS_INDEX_TYPE # where to save resulting index
echo $EMBEDDING_NAME
echo $FAISS_INDEX_TYPE
python examples/run_indexing_m3docvqa.py \
--use_retrieval \
--retrieval_model_type=$RETRIEVAL_MODEL_TYPE \
--data_name=$DATASET_NAME \
--split=$SPLIT \
--loop_unique_doc_ids=False \
--embedding_name=$EMBEDDING_NAME \
--faiss_index_type=$FAISS_INDEX_TYPE \
--output_dir=/job/embeddings/$INDEX_NAME
```
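For the open-domain setting, `ivfflat` corresponds to a FAISS inverted-file index with flat (unquantized) vectors. Below is a minimal sketch of the idea, with illustrative dimensions and cluster counts rather than the repository's actual indexing code:
```python
import faiss
import numpy as np

dim = 128  # per-token embedding dimension used by ColPali
vectors = np.random.rand(100_000, dim).astype("float32")  # stand-in for flattened page-token embeddings

quantizer = faiss.IndexFlatIP(dim)  # coarse quantizer over inner-product space
index = faiss.IndexIVFFlat(quantizer, dim, 1024, faiss.METRIC_INNER_PRODUCT)
index.train(vectors)  # learn the 1024 coarse centroids
index.add(vectors)

index.nprobe = 8  # clusters visited per query: higher = better recall, slower search
scores, ids = index.search(vectors[:1], k=10)
```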
## 3. RAG
```bash
BACKBONE_MODEL_NAME="Qwen2-VL-7B-Instruct"
RETRIEVAL_MODEL_TYPE="colpali"
RETRIEVAL_MODEL_NAME="colpaligemma-3b-pt-448-base"
RETRIEVAL_ADAPTER_MODEL_NAME="colpali-v1.2"
EMBEDDING_NAME="colpali-v1.2_m3-docvqa_dev" # from Step 1 Embedding
SPLIT="dev"
DATASET_NAME="m3-docvqa"
FAISS_INDEX_TYPE='ivfflat'
N_RETRIEVAL_PAGES=1
INDEX_NAME="${EMBEDDING_NAME}_pageindex_$FAISS_INDEX_TYPE" # from Step 2 Indexing
OUTPUT_SAVE_NAME="${RETRIEVAL_ADAPTER_MODEL_NAME}_${BACKBONE_MODEL_NAME}_${DATASET_NAME}" # where to save RAG results
BITS=16 # BITS=4 for 4-bit quantization on low-memory GPUs
python examples/run_rag_m3docvqa.py \
--use_retrieval \
--retrieval_model_type=$RETRIEVAL_MODEL_TYPE \
--load_embedding=True \
--split=$SPLIT \
--bits=$BITS \
--n_retrieval_pages=$N_RETRIEVAL_PAGES \
--data_name=$DATASET_NAME \
--model_name_or_path=$BACKBONE_MODEL_NAME \
--embedding_name=$EMBEDDING_NAME \
--retrieval_model_name_or_path=$RETRIEVAL_MODEL_NAME \
--retrieval_adapter_model_name_or_path=$RETRIEVAL_ADAPTER_MODEL_NAME \
--output_dir=/job/eval_outputs/$OUTPUT_SAVE_NAME
```
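Conceptually, the answering stage feeds the retrieved page images to Qwen2-VL. Here is a hedged sketch using the standard transformers and qwen-vl-utils APIs; the repository's `rag/` and `vqa/` modules wrap this with more options, and the page image and question below are placeholders:
```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

# Placeholders: in the full pipeline these come from the page retrieval step.
retrieved_pages = [Image.open("retrieved_page.png")]
question = "What is the total revenue reported in the document?"

messages = [{
    "role": "user",
    "content": [{"type": "image", "image": img} for img in retrieved_pages]
               + [{"type": "text", "text": question}],
}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
answer = processor.batch_decode(output_ids[:, inputs.input_ids.shape[1]:],
                                skip_special_tokens=True)[0]
print(answer)
```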
# Citation
Please cite our paper if you use our dataset and/or method in your projects.
```bibtex
@article{Cho2024M3DocRAG,
  author = {Jaemin Cho and Ozan İrsoy and Debanjan Mahata and Yujie He and Mohit Bansal},
  title  = {M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding},
  year   = {2024},
}
```

src/m3docrag.egg-info/SOURCES.txt
@@ -1,31 +0,0 @@
README.md
pyproject.toml
src/m3docrag/__init__.py
src/m3docrag.egg-info/PKG-INFO
src/m3docrag.egg-info/SOURCES.txt
src/m3docrag.egg-info/dependency_links.txt
src/m3docrag.egg-info/requires.txt
src/m3docrag.egg-info/top_level.txt
src/m3docrag/datasets/__init__.py
src/m3docrag/datasets/m3_docvqa/__init__.py
src/m3docrag/datasets/m3_docvqa/common_utils.py
src/m3docrag/datasets/m3_docvqa/dataset.py
src/m3docrag/datasets/m3_docvqa/evaluate.py
src/m3docrag/rag/__init__.py
src/m3docrag/rag/base.py
src/m3docrag/rag/multimodal.py
src/m3docrag/rag/utils.py
src/m3docrag/retrieval/__init__.py
src/m3docrag/retrieval/colpali.py
src/m3docrag/utils/args.py
src/m3docrag/utils/distributed.py
src/m3docrag/utils/paths.py
src/m3docrag/utils/pdfs.py
src/m3docrag/utils/prompts.py
src/m3docrag/utils/tar.py
src/m3docrag/vqa/__init__.py
src/m3docrag/vqa/florence2.py
src/m3docrag/vqa/idefics2.py
src/m3docrag/vqa/idefics3.py
src/m3docrag/vqa/internvl2.py
src/m3docrag/vqa/qwen2.py

src/m3docrag.egg-info/dependency_links.txt
@@ -1 +0,0 @@

src/m3docrag.egg-info/requires.txt
@@ -1,28 +0,0 @@
accelerate==1.1.0
loguru
requests
setuptools==69.5
transformers
tokenizers
flash-attn==2.5.8
bitsandbytes==0.43.1
safetensors
gpustat
icecream
pdf2image
numpy==1.26.4
torchvision
jsonlines
editdistance
einops
fire
peft
timm
sentencepiece
colpali-engine==0.3.1
easyocr
qwen-vl-utils
faiss-cpu
word2number
datasets>=3.0.0
python-dotenv

src/m3docrag.egg-info/top_level.txt
@@ -1 +0,0 @@
m3docrag