Metadata-Version: 2.2
Name: m3docrag
Version: 0.0.1
Summary: Multimodal Document Understanding with RAG
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: accelerate==1.1.0
Requires-Dist: loguru
Requires-Dist: requests
Requires-Dist: setuptools==69.5
Requires-Dist: transformers
Requires-Dist: tokenizers
Requires-Dist: flash-attn==2.5.8
Requires-Dist: bitsandbytes==0.43.1
Requires-Dist: safetensors
Requires-Dist: gpustat
Requires-Dist: icecream
Requires-Dist: pdf2image
Requires-Dist: numpy==1.26.4
Requires-Dist: torchvision
Requires-Dist: jsonlines
Requires-Dist: editdistance
Requires-Dist: einops
Requires-Dist: fire
Requires-Dist: peft
Requires-Dist: timm
Requires-Dist: sentencepiece
Requires-Dist: colpali-engine==0.3.1
Requires-Dist: easyocr
Requires-Dist: qwen-vl-utils
Requires-Dist: faiss-cpu
Requires-Dist: word2number
Requires-Dist: datasets>=3.0.0
Requires-Dist: python-dotenv

# M3DocRAG

Code for [M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding](https://m3docrag.github.io/)

by [Jaemin Cho](https://j-min.io/), [Debanjan Mahata](https://sites.google.com/a/ualr.edu/debanjan-mahata/), [Ozan İrsoy](https://wtimesx.com/), [Yujie He](https://scholar.google.com/citations?user=FbeAZGgAAAAJ&hl=en), [Mohit Bansal](https://www.cs.unc.edu/~mbansal/)

# Summary

## Comparison with previous approaches

<img src='./assets/m3docrag_teaser.png' >

Comparison of multi-modal document understanding pipelines. Previous works focus on (a) **Single-page DocVQA**, which cannot handle many long documents, or (b) **Text-based RAG**, which ignores visual information. Our (c) **M3DocRAG** framework retrieves relevant documents and answers questions using multi-modal retrieval and multi-modal language model (MLM) components, so that it can efficiently handle many long documents while preserving visual information.

## M3DocRAG framework

<img src='./assets/method.png' >

Our **M3DocRAG** framework consists of three stages: (1) document embedding, (2) page retrieval, and (3) question answering.

- In (1) document embedding, we extract visual embeddings (with ColPali) to represent each page of every PDF document.
- In (2) page retrieval, we retrieve the top-K pages with the highest relevance (MaxSim scores) to the text query; a minimal sketch of this scoring appears after this list. In an open-domain setting, we create approximate page indices for faster search.
- In (3) question answering, we conduct visual question answering with a multi-modal LM (e.g., Qwen2-VL) to obtain the final answer.
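
To make the retrieval stage concrete, below is a minimal sketch of ColPali-style MaxSim (late-interaction) scoring. It is illustrative only: the tensor shapes and random inputs are assumptions rather than actual model outputs, and the repository's retrieval code may organize this differently.

```python
# Minimal sketch of MaxSim (late-interaction) relevance scoring over
# multi-vector page embeddings. Shapes and inputs are illustrative.
import torch

def maxsim_score(query_emb: torch.Tensor, page_emb: torch.Tensor) -> torch.Tensor:
    """query_emb: (n_query_tokens, dim); page_emb: (n_page_patches, dim)."""
    sim = query_emb @ page_emb.T        # similarity of every query-token/patch pair
    return sim.max(dim=1).values.sum()  # best patch per query token, summed

# Toy example: score 3 pages against one query and pick the most relevant.
torch.manual_seed(0)
query = torch.randn(16, 128)                        # 16 query tokens, 128-dim
pages = [torch.randn(1030, 128) for _ in range(3)]  # 1030 patch vectors per page
scores = torch.stack([maxsim_score(query, p) for p in pages])
print("top-1 page:", scores.argmax().item())
```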

# Setup

## Package

We assume that conda is already installed.

```bash
git clone <REPO_URL>
cd m3docrag-release
pip install -e .

# Install Poppler (for pdf2image; check https://pdf2image.readthedocs.io/en/latest/installation.html for details)
# conda install -y poppler
# or
# apt-get install poppler-utils
```

## Code structure

```bash
examples/               # scripts to run PDF embedding / RAG
src/m3docrag/
    datasets/           # data loaders for existing datasets
    retrieval/          # retrieval model (e.g., ColPali)
    vqa/                # VQA model (e.g., Qwen2-VL)
    rag/                # RAG model that combines the retrieval and VQA models
    utils/              # misc utility methods
m3docvqa/               # how to set up the M3DocVQA dataset
```

## Paths: Data, Embeddings, Model checkpoints, Outputs

```bash
# in .env
LOCAL_DATA_DIR="/job/datasets"         # where to store data
LOCAL_EMBEDDINGS_DIR="/job/embeddings" # where to store embeddings
LOCAL_MODEL_DIR="/job/model"           # where to store model checkpoints
LOCAL_OUTPUT_DIR="/job/output"         # where to store model outputs
```

You can adjust the variables in [`.env`](.env) to change where data, embeddings, model checkpoints, and outputs are stored by default. They are loaded in [`src/m3docrag/utils/paths.py`](./src/m3docrag/utils/paths.py) via [python-dotenv](https://github.com/theskumar/python-dotenv).
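
As an illustration of how these variables reach the code, a dotenv-based loader along the following lines would work; the actual contents of `paths.py` may differ in details such as defaults and types.

```python
# Hedged sketch of a .env-backed paths module (in the spirit of
# src/m3docrag/utils/paths.py); the fallbacks below mirror the .env values.
import os
from pathlib import Path

from dotenv import load_dotenv

load_dotenv()  # read key=value pairs from .env into os.environ

LOCAL_DATA_DIR = Path(os.environ.get("LOCAL_DATA_DIR", "/job/datasets"))
LOCAL_EMBEDDINGS_DIR = Path(os.environ.get("LOCAL_EMBEDDINGS_DIR", "/job/embeddings"))
LOCAL_MODEL_DIR = Path(os.environ.get("LOCAL_MODEL_DIR", "/job/model"))
LOCAL_OUTPUT_DIR = Path(os.environ.get("LOCAL_OUTPUT_DIR", "/job/output"))
```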

## Download M3DocVQA dataset

Please see [m3docvqa/README.md](m3docvqa/README.md) for the download instructions.

## Download model checkpoints

By default, we use colpali-v1.2 for retrieval and Qwen2-VL-7B-Instruct for question answering.

At `$LOCAL_MODEL_DIR`, download the [colpali-v1.2](https://huggingface.co/vidore/colpali-v1.2), [colpaligemma-3b-pt-448-base](https://huggingface.co/vidore/colpaligemma-3b-pt-448-base), and [Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) checkpoints.

```bash
cd $LOCAL_MODEL_DIR

git clone https://huggingface.co/vidore/colpaligemma-3b-pt-448-base # ColPali backbone
git clone https://huggingface.co/vidore/colpali-v1.2 # ColPali adapter
git clone https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct # VQA
```

# Example usage

Below, we describe example usage of M3DocRAG on the M3DocVQA dataset.

## 1. Extract PDF embeddings

```bash
DATASET_NAME="m3-docvqa"
RETRIEVAL_MODEL_TYPE="colpali"
RETRIEVAL_MODEL_NAME="colpaligemma-3b-pt-448-base"
RETRIEVAL_ADAPTER_MODEL_NAME="colpali-v1.2"
SPLIT="dev"
EMBEDDING_NAME=$RETRIEVAL_ADAPTER_MODEL_NAME"_"$DATASET_NAME"_"$SPLIT # where to save embeddings

accelerate launch --num_processes=1 --mixed_precision=bf16 examples/run_page_embedding.py \
    --use_retrieval \
    --retrieval_model_type=$RETRIEVAL_MODEL_TYPE \
    --data_name=$DATASET_NAME \
    --split=$SPLIT \
    --loop_unique_doc_ids=True \
    --output_dir=/job/embeddings/$EMBEDDING_NAME \
    --retrieval_model_name_or_path=$RETRIEVAL_MODEL_NAME \
    --retrieval_adapter_model_name_or_path=$RETRIEVAL_ADAPTER_MODEL_NAME
```
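
If you want to embed a few PDF pages interactively rather than through the script above, a sketch along the following lines should work with the pinned `colpali-engine==0.3.1`. The class names and processing calls follow the colpali-engine documentation, but treat them as assumptions that may need adjustment; the file name is a placeholder.

```python
# Hedged sketch: embedding the pages of one PDF with colpali-engine 0.3.x.
import torch
from pdf2image import convert_from_path  # requires Poppler (see Setup)
from colpali_engine.models import ColPali, ColPaliProcessor

model_name = "vidore/colpali-v1.2"  # or a local path under $LOCAL_MODEL_DIR
model = ColPali.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="cuda"
).eval()
processor = ColPaliProcessor.from_pretrained(model_name)

pages = convert_from_path("example.pdf")  # one PIL image per PDF page
batch = processor.process_images(pages).to(model.device)
with torch.no_grad():
    page_embeddings = model(**batch)  # multi-vector embeddings, one set per page
```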

## 2. Indexing

```bash
DATASET_NAME="m3-docvqa"
RETRIEVAL_MODEL_TYPE="colpali"
RETRIEVAL_ADAPTER_MODEL_NAME="colpali-v1.2"
SPLIT="dev"
FAISS_INDEX_TYPE='ivfflat'
EMBEDDING_NAME=$RETRIEVAL_ADAPTER_MODEL_NAME"_"$DATASET_NAME"_"$SPLIT
INDEX_NAME=$EMBEDDING_NAME"_pageindex_"$FAISS_INDEX_TYPE # where to save resulting index
echo $EMBEDDING_NAME
echo $FAISS_INDEX_TYPE

python examples/run_indexing_m3docvqa.py \
    --use_retrieval \
    --retrieval_model_type=$RETRIEVAL_MODEL_TYPE \
    --data_name=$DATASET_NAME \
    --split=$SPLIT \
    --loop_unique_doc_ids=False \
    --embedding_name=$EMBEDDING_NAME \
    --faiss_index_type=$FAISS_INDEX_TYPE \
    --output_dir=/job/embeddings/$INDEX_NAME
```
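
For intuition about what the `ivfflat` option involves, here is a self-contained faiss sketch that builds an IVFFlat index over toy page vectors; the dimensions, cluster count, and data are illustrative assumptions, and the script's actual index construction may differ.

```python
# Hedged sketch: building an approximate IVFFlat index with faiss.
import faiss
import numpy as np

dim = 128
rng = np.random.default_rng(0)
page_vectors = rng.standard_normal((10_000, dim)).astype("float32")  # toy embeddings

nlist = 100  # number of IVF clusters (Voronoi cells)
quantizer = faiss.IndexFlatIP(dim)  # inner-product coarse quantizer
index = faiss.IndexIVFFlat(quantizer, dim, nlist, faiss.METRIC_INNER_PRODUCT)
index.train(page_vectors)  # IVF indices must be trained before adding vectors
index.add(page_vectors)

index.nprobe = 10  # search 10 nearest cells; trades recall for speed
query = rng.standard_normal((1, dim)).astype("float32")
scores, ids = index.search(query, k=4)  # top-4 approximate matches
print(ids[0])
```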

## 3. RAG

```bash
BACKBONE_MODEL_NAME="Qwen2-VL-7B-Instruct"
RETRIEVAL_MODEL_TYPE="colpali"
RETRIEVAL_MODEL_NAME="colpaligemma-3b-pt-448-base"
RETRIEVAL_ADAPTER_MODEL_NAME="colpali-v1.2"
EMBEDDING_NAME="colpali-v1.2_m3-docvqa_dev" # from Step 1 Embedding
SPLIT="dev"
DATASET_NAME="m3-docvqa"
FAISS_INDEX_TYPE='ivfflat'
N_RETRIEVAL_PAGES=1
INDEX_NAME="${EMBEDDING_NAME}_pageindex_$FAISS_INDEX_TYPE" # from Step 2 Indexing
OUTPUT_SAVE_NAME="${RETRIEVAL_ADAPTER_MODEL_NAME}_${BACKBONE_MODEL_NAME}_${DATASET_NAME}" # where to save RAG results
BITS=16 # BITS=4 for 4-bit quantization on low-memory GPUs

python examples/run_rag_m3docvqa.py \
    --use_retrieval \
    --retrieval_model_type=$RETRIEVAL_MODEL_TYPE \
    --load_embedding=True \
    --split=$SPLIT \
    --bits=$BITS \
    --n_retrieval_pages=$N_RETRIEVAL_PAGES \
    --data_name=$DATASET_NAME \
    --model_name_or_path=$BACKBONE_MODEL_NAME \
    --embedding_name=$EMBEDDING_NAME \
    --retrieval_model_name_or_path=$RETRIEVAL_MODEL_NAME \
    --retrieval_adapter_model_name_or_path=$RETRIEVAL_ADAPTER_MODEL_NAME \
    --output_dir=/job/eval_outputs/$OUTPUT_SAVE_NAME
```
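
To illustrate the final answering stage, the following sketch asks Qwen2-VL about a single retrieved page image using the standard `transformers` + `qwen-vl-utils` pattern from the model card. The image path and question are placeholders, and the repo's VQA wrapper may differ from this.

```python
# Hedged sketch: VQA over one retrieved page with Qwen2-VL-7B-Instruct.
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

# The top-1 retrieved page image plus the question (placeholder values).
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "retrieved_page.png"},
        {"type": "text", "text": "Who wrote this report?"},
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=64)
trimmed = generated[:, inputs.input_ids.shape[1]:]  # drop the prompt tokens
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```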

# Citation

Please cite our paper if you use our dataset and/or method in your projects.

```bibtex
@article{Cho2024M3DocRAG,
  author = {Jaemin Cho and Ozan İrsoy and Debanjan Mahata and Yujie He and Mohit Bansal},
  title  = {M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding},
  year   = {2024},
}
```