Perturbation datasets used to train the Hubble models, covering three risk domains and five data types.
Three models, each trained with perturbations from only a single risk domain (copyright, privacy, or test-set contamination), to measure interference between domains. A loading sketch follows the list.
- allegrolab/hubble-1b-100b_toks-interference_copyright-hf (Text Generation, 1B)
- allegrolab/hubble-1b-100b_toks-interference_privacy-hf (Text Generation, 1B)
- allegrolab/hubble-1b-100b_toks-interference_testset-hf (Text Generation, 1B)
- allegrolab/hubble-1b-100b_toks-interference_copyright-neox (Text Generation; NeoX-format checkpoint)
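The "-hf" checkpoints should load with the standard transformers causal-LM API. A minimal sketch, assuming only that format (the prompt is an arbitrary stand-in):

```python
# Minimal sketch: load one interference model and sample a continuation.
# Assumes the "-hf" suffix means a standard transformers checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allegrolab/hubble-1b-100b_toks-interference_copyright-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```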
Two models (1B and 8B) trained on paraphrased YAGO biographies and paraphrased MMLU test sets, to study how paraphrased information is memorized. A probe sketch follows the list.
- allegrolab/hubble-8b-100b_toks-paraphrased-perturbed-hf (Text Generation, 8B)
- allegrolab/hubble-1b-100b_toks-paraphrased-perturbed-hf (Text Generation, 1B)
- allegrolab/dclm-baseline-500b_toks (dataset)
- allegrolab/hubble-8b-100b_toks-paraphrased-perturbed-neox (Text Generation; NeoX-format checkpoint)
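One simple way to probe paraphrase memorization is to compare mean per-token negative log-likelihood on a verbatim passage against a paraphrase of it. This is a generic sketch, not necessarily the Hubble evaluation protocol, and the two passages are hypothetical stand-ins:

```python
# Hedged sketch: lower NLL on the verbatim form than on a paraphrase is
# one common signal that the surface form, not just the content, was
# memorized. The passages here are stand-ins, not Hubble perturbations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allegrolab/hubble-1b-100b_toks-paraphrased-perturbed-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

def mean_nll(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, transformers shifts labels internally and
        # returns mean cross-entropy over the predicted tokens.
        return model(input_ids=ids, labels=ids).loss.item()

verbatim = "Ada Lovelace was born in London in 1815."
paraphrase = "In 1815, Ada Lovelace was born in London."
print("verbatim NLL:", mean_nll(verbatim))
print("paraphrase NLL:", mean_nll(paraphrase))
```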
Eight models that vary in size, data condition, and corpus scale to establish dilution effects in memorization (the four 8B checkpoints are listed below; a scoring sketch follows the list).
- allegrolab/hubble-8b-500b_toks-perturbed-hf (Text Generation, 8B)
- allegrolab/hubble-8b-500b_toks-standard-hf (Text Generation, 8B)
- allegrolab/hubble-8b-100b_toks-perturbed-hf (Text Generation, 8B)
- allegrolab/hubble-8b-100b_toks-standard-hf (Text Generation, 8B)
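Since these checkpoints differ only in data condition (standard vs. perturbed) and corpus scale (100B vs. 500B tokens), scoring one probe string under each makes the dilution comparison concrete. A sketch, with a hypothetical probe string:

```python
# Sketch: per-token perplexity of the same probe under all four 8B
# dilution checkpoints. The probe string is a hypothetical stand-in.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_IDS = [
    "allegrolab/hubble-8b-500b_toks-perturbed-hf",
    "allegrolab/hubble-8b-500b_toks-standard-hf",
    "allegrolab/hubble-8b-100b_toks-perturbed-hf",
    "allegrolab/hubble-8b-100b_toks-standard-hf",
]
probe = "An example sequence whose memorization we want to measure."

for model_id in MODEL_IDS:
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id).eval()
    ids = tokenizer(probe, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(input_ids=ids, labels=ids).loss
    print(f"{model_id}: ppl={torch.exp(loss).item():.2f}")
```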
Six models trained with perturbations inserted at different stages of training to study how exposure timing affects memorization. The injectrange_X_Y suffix appears to denote the span of training, in percent, during which the perturbations were present; an extraction-style check follows the list.
- allegrolab/hubble-1b-100b_toks-injectrange_0_50-hf (Text Generation, 1B)
- allegrolab/hubble-1b-100b_toks-injectrange_50_100-hf (Text Generation, 1B)
- allegrolab/hubble-1b-100b_toks-injectrange_0_25-hf (Text Generation, 1B)
- allegrolab/hubble-1b-100b_toks-injectrange_25_50-hf (Text Generation, 1B)
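A prefix-completion extraction check is a common way to compare checkpoints like these: greedily decode from a prefix of a training sequence and test whether the true continuation is reproduced. Again a generic sketch, not the documented Hubble protocol, and the prefix/continuation pair is a hypothetical stand-in:

```python
# Sketch: greedily decode from a prefix and test whether the model
# reproduces the true continuation verbatim. Both strings are stand-ins
# for an actual perturbation sequence from the training data.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allegrolab/hubble-1b-100b_toks-injectrange_0_25-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

prefix = "Call me Ishmael. Some years ago, never mind how long"
continuation = "precisely, having little or no money in my purse"

inputs = tokenizer(prefix, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=12, do_sample=False)
new_tokens = out[0][inputs.input_ids.shape[1]:]
decoded = tokenizer.decode(new_tokens, skip_special_tokens=True)
print("extracted" if decoded.strip().startswith(continuation) else "not extracted")
```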
Two architecture variants (half depth and double depth), each trained in standard and perturbed conditions, to assess how model depth affects memorization; a config-inspection sketch follows the list.
- allegrolab/hubble-1b-100b_toks-double_depth-perturbed-hf (Text Generation, 1B)
- allegrolab/hubble-1b-100b_toks-double_depth-standard-hf (Text Generation, 1B)
- allegrolab/hubble-1b-100b_toks-half_depth-perturbed-hf (Text Generation, 1B)
- allegrolab/hubble-1b-100b_toks-half_depth-standard-hf (Text Generation, 1B)
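The depth difference can be checked cheaply, since AutoConfig downloads only the config file and not the weights. The attribute names below assume a Llama-style config and may differ for these checkpoints:

```python
# Sketch: inspect layer count and width without downloading weights.
# Attribute names assume a Llama-style config and may differ.
from transformers import AutoConfig

for model_id in [
    "allegrolab/hubble-1b-100b_toks-half_depth-standard-hf",
    "allegrolab/hubble-1b-100b_toks-double_depth-standard-hf",
]:
    cfg = AutoConfig.from_pretrained(model_id)
    print(model_id, "layers:", cfg.num_hidden_layers, "hidden:", cfg.hidden_size)
```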