Spaces: Running on Zero
Bellok committed · Commit 58c8726 · 1 Parent(s): 6abf740

refactor: reduce arxiv dataset limit to 50k for deployment efficiency

- Lowered arxiv_limit from 100,000 to 50,000 to balance paper coverage with faster deployment times
- Updated comments to reflect the new balanced capacity, ensuring a sufficient knowledge base without excessive ingestion duration
app.py
CHANGED
@@ -99,8 +99,8 @@ if len(documents) == 0:
         try:
             print(f"📦 Downloading {dataset} (timeout: 3 minutes)...")

-            #
-            arxiv_limit =
+            # Balance between coverage and deployment time - 50k arxiv papers plus all other packs
+            arxiv_limit = 50000 if dataset == "arxiv" else None  # Balanced capacity

             success = ingestor.ingest_dataset(dataset, arxiv_limit=arxiv_limit)
             if success:
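The internals of `ingest_dataset` are not shown in this diff, so the sketch below is a hypothetical illustration of how a per-dataset cap like `arxiv_limit` might be applied; `SimpleIngestor` and the `records` iterable are assumed names, not taken from app.py.

```python
# Hypothetical sketch: applying a per-dataset record cap such as
# arxiv_limit during ingestion. None means "no cap".
from itertools import islice
from typing import Iterable, Optional


class SimpleIngestor:
    def __init__(self) -> None:
        self.store: list = []

    def ingest_dataset(self, name: str, records: Iterable[dict],
                       arxiv_limit: Optional[int] = None) -> bool:
        # Cap the stream lazily so we never materialize more than the limit.
        capped = islice(records, arxiv_limit) if arxiv_limit is not None else records
        self.store.extend(capped)
        return True


ingestor = SimpleIngestor()
# 100k candidate records, capped at the new 50k limit for "arxiv".
ok = ingestor.ingest_dataset("arxiv",
                             ({"id": i} for i in range(100_000)),
                             arxiv_limit=50_000)
print(ok, len(ingestor.store))  # True 50000
```

Passing `arxiv_limit=None` for non-arxiv packs ingests them in full, matching the `else None` branch in the changed line.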