Back to AI is open source
QwenLong-L1.5 vs RAG: Comparison of Long Document Multi-hop Inference, Data Synthesis and Memory Proxy

QwenLong-L1.5 vs RAG: Comparison of Long Document Multi-hop Inference, Data Synthesis and Memory Proxy

AI is open source Admin 80 views

1. Abstract

QwenLong-L1.5 is a set of "long context reasoning + memory management" post-training recipe that the Tongyi Zhiwen team open-sourced in the Qwen-Doc repository. It revolves around three things: complex inference data synthesis for long documents, reinforcement learning stable training methods (AEPO, etc.) for long sequences, and memory management frameworks that still work outside the physical context window, and the corresponding model QwenLong-L1.5-30B-A3B (based on Qwen3-30B-A3B-Thinking) was released.

2. Core features

  1. Long context inference data synthesis: Through the method of "atomic fact decomposition + verifiable combination", long document reasoning samples that require multi-hop evidence concatenation are generated, not just simple retrieval tasks.
  2. Long-sequence RL stable training: Strategies such as task balanced sampling are introduced, and AEPO (Adaptive Entropy-Controlled Policy Optimization) is proposed to regulate the training process with entropy-related mechanisms to alleviate the common instability problems of long-context RL.
  3. Memory management and ultra-long input: Perform a single inference in the model's physical window (256K window mentioned in the example), and combine it with iterative memory updates to expand the processing range to millions or even longer input streams (the paper description covers 1M-4M token-level tasks).
  4. Open source reproducible: Provide model weights and supporting dependency descriptions to facilitate researchers to reproduce experiments or do secondary development (the model license is Apache-2.0, the specific repository/model card shall prevail).

3. Installation

1. Create an environment (example): conda create -n qwenlongl1_5 python==3.10 && conda activate qwenlongl1_5

2. Install dependencies: Execute pip3 install -r requirements.txt in the corresponding directory (subject to the actual file).

3. Install the RL training library: Install the verl according to the project's recommendation (example is cloning volcengine/verl and cutting to the pip3 install -e . after v0.4).

  1. Inference side dependencies: Use Transformers to load models and tokenizers (you can also adjust device_map, dtype, etc. according to your inference framework).

4. Typical use cases

  1. Long document Q&A (DocQA): Cross-paragraph multi-hop reasoning and answer attribution on technical documents, compliance materials, and papers/reports.
  2. "Read and answer" ultra-long materials: When the input scale exceeds a single context, the memory agent process is used for segmented reading, memory update and final comprehensive answer.
  3. Enterprise knowledge analysis: Structural key point extraction, conflict detection and consistency check on annual reports, bidding documents, and demand documents.
  4. Research reproduction and training practice: used to explore the sampling strategy, reward design, training stability and evaluation system of long-context RL.

5. Ecology and competing products

  1. Same repository ecosystem: Qwen-Doc also includes QwenLong-L1 (earlier long-context RL exploration) and SPELL (self-game RL framework), which is suitable for comparative experiments on the "data-training-agent" full link.
  2. Relationship with RAG/compression scheme: RAG is more "retrieval hit rate and context splicing", while QwenLong-L1.5 emphasizes "reasoning ability and memory process after reading long text"; The two can be combined in engineering (retrieval first, then long reasoning/memory summary).
  3. Competitive product reference: closed-source long-context models and various open-source long-context fine-tuning/sparse attention/compression methods have their own trade-offs; The difference of QwenLong-L1.5 is that "long inference data synthesis + long sequence RL stable training + memory agent" is given as a set of post-training formulas.

6. Limitations and precautions

  1. Computing power and latency: Long sequence inference and RL training are more memory/throughput, especially in 256K-level windows or memory proxy loops, the cost will increase significantly.
  2. Memory is not "absolutely correct": memory update may introduce omissions and deviations, and it is recommended to retain evidence traceability and manual review mechanisms in key scenarios.
  3. Training reproduction threshold: RL's rewards, sampling, and superparameters are sensitive to results. Different clusters/inference backends can also affect stability.
  4. Evaluate extrapolation risks: Benchmark improvement does not mean that all real document tasks are improved, and domain data regression and security assessment should be done before implementation.

7. Project address

https://github.com/Tongyi-Zhiwen/Qwen-Doc/tree/main/QwenLong-L1.5

8. Frequently asked questions

Q: What problem does QwenLong-L1.5 solve?

A: Mainly for long document tasks of "cross-chapter, multi-evidence, multi-hop reasoning", the goal is to enable the model to not only retrieve fragments, but also complete chain reasoning and consistency judgment in a long range.

Q: What is the AEPO of QwenLong-L1.5 and how does it relate to common PPOs?

A: AEPO is one of the strategy optimization methods designed for long-context training stability, which regulates the exploration and update intensity through entropy-related mechanisms. It belongs to the same strategy optimization paradigm as PPO, but the implementation details and stabilization methods are different (subject to the paper and code implementation).

Q: How long does the QwenLong-L1.5-30B-A3B need to use the context window?

A: The model works in a combination of "physical window + memory mechanism"; The example material mentions doing a single inference in a 256K window and can be extended to longer inputs with a memory proxy. The actual available length depends on the inference framework, memory, and configuration.

Q: I just want to do reasoning, not training, how can I get started with QwenLong-L1.5 the fastest?

A: Directly use Transformers to load weights and tokenizers from the model warehouse, and prepare long text and question prompts. To reproduce the memory proxy process, refer to the project supporting script and paper description.

Q: Should I choose between QwenLong-L1.5 and RAG?

A: Not necessarily. RAG solves the problem of "finding", and QwenLong-L1.5 emphasizes "reading and understanding, pushing far, and remembering"; A common combination in engineering practice is "retrieval narrowing + long reasoning/memory summarization to complete complex questions and answers".

Analysis of QwenLong-L1.5 open-source long-context inference scheme Tongyi Zhiwen released the training formula after QwenLong-L1.5 How QwenLong-L1.5 implements multi-hop inference for long documents Detailed explanation of QwenLong-L1.5 long inference data synthesis method QwenLong-L1.5 atomic fact decomposition generates data QwenLong-L1.5 can verify how the combined sample works QwenLong-L1.5 long-sequence RL stability training revealed QwenLong-L1.5 proposed an AEPO stabilization strategy Interpretation of AEPO and PPO differences between QwenLong-L1.5 Analysis of QwenLong-L1.5 task balanced sampling strategy How QwenLong-L1.5 relieves long RL instability How the QwenLong-L1.5 memory management framework works How does QwenLong-L1.5 go beyond the physical window inference? QwenLong-L1.5 iteratively expands inputs with memory QwenLong-L1.5 supports million-level input stream processing QwenLong-L1.5 covers 1M to 4M mission descriptions QwenLong-L1.5 can reach a single inference window of 256K QwenLong-L1.5-30B-A3B model release introduction QwenLong-L1.5 is built on Qwen3-30B-A3B QwenLong-L1.5 model weights and dependencies QwenLong-L1.5 open source reproducible value interpretation QwenLong-L1.5 adopts the Apache-2.0 license QwenLong-L1.5 Installation and Environment Configuration Guide QwenLong-L1.5 dependencies install requirements points How to install verl training library in QwenLong-L1.5 QwenLong-L1.5 inference side Transformer loading tutorial QwenLong-L1.5 Long Document Q&A DocQA Usage QwenLong-L1.5 memory process after reading and answering QwenLong-L1.5 enterprise knowledge analysis landing scenario QwenLong-L1.5 is used for collision detection and consistency checking QwenLong-L1.5 is used to study reproduction training practices QwenLong-L1.5 positioning in the Qwen-Doc ecosystem The differences between QwenLong-L1 and QwenLong-L1.5 are sorted out How do the SPELL frameworks in Qwen-Doc compare? QwenLong-L1.5 and RAG relationship and trade-off How QwenLong-L1.5 is combined with retrieval to enhance RAG Advantages of QwenLong-L1.5 over compression solutions QwenLong-L1.5 for closed-source long contexts QwenLong-L1.5 long reasoning three-piece set highlights interpretation QwenLong-L1.5 Computing Power and Latency Cost Evaluation QwenLong-L1.5 long-window inference video memory pressure analysis QwenLong-L1.5 memory update deviation risk warning QwenLong-L1.5 training reproduction threshold and hyperparameter sensitivity QwenLong-L1.5 evaluates how to deal with extrapolation risks QwenLong-L1.5 regression evaluation suggestions before landing QwenLong-L1.5 Project Address & Quick Entrance QwenLong-L1.5 FAQ and Key Points Summary

Recommended Tools

More