Delete conversion_to_hf.log
conversion_to_hf.log  DELETED  (+0 -241)

@@ -1,241 +0,0 @@
-Loaded loader_megatron_core as the loader.
-Loaded saver_llama2_hf_bf as the saver.
-Starting saver...
-Starting loader...
-fused_indices_to_multihot has reached end of life. Please migrate to a non-experimental function.
-/usr/local/lib/python3.12/dist-packages/modelopt/torch/utils/import_utils.py:31: UserWarning: Failed to import apex plugin due to: AttributeError("module 'transformers.modeling_utils' has no attribute 'Conv1D'"). You may ignore this warning if you do not need this plugin.
-  warnings.warn(
-/usr/local/lib/python3.12/dist-packages/modelopt/torch/utils/import_utils.py:31: UserWarning: Failed to import huggingface plugin due to: AttributeError("module 'transformers.modeling_utils' has no attribute 'Conv1D'"). You may ignore this warning if you do not need this plugin.
-  warnings.warn(
-/usr/local/lib/python3.12/dist-packages/modelopt/torch/utils/import_utils.py:31: UserWarning: Failed to import megatron plugin due to: AttributeError("module 'transformers.modeling_utils' has no attribute 'Conv1D'"). You may ignore this warning if you do not need this plugin.
-  warnings.warn(
-Setting num_layers to 80 from checkpoint
-Setting hidden_size to 5120 from checkpoint
-Setting ffn_hidden_size to 27648 from checkpoint
-Setting seq_length to 131072 from checkpoint
-Setting num_attention_heads to 40 from checkpoint
-Setting num_query_groups to 8 from checkpoint
-Setting group_query_attention to True from checkpoint
-Setting kv_channels to 128 from checkpoint
-Setting max_position_embeddings to 131072 from checkpoint
-Setting position_embedding_type to rope from checkpoint
-Setting add_position_embedding to True from checkpoint
-Setting use_rotary_position_embeddings to True from checkpoint
-Setting rotary_base to 500000 from checkpoint
-Setting rotary_percent to 1.0 from checkpoint
-Setting rotary_interleaved to False from checkpoint
-Setting add_bias_linear to False from checkpoint
-Setting add_qkv_bias to False from checkpoint
-Setting squared_relu to False from checkpoint
-Setting swiglu to True from checkpoint
-Setting untie_embeddings_and_output_weights to True from checkpoint
-Setting apply_layernorm_1p to False from checkpoint
-Setting normalization to RMSNorm from checkpoint
-Setting apply_query_key_layer_scaling to False from checkpoint
-Setting attention_dropout to 0.0 from checkpoint
-Setting hidden_dropout to 0.0 from checkpoint
-Checkpoint did not provide arguments hybrid_override_pattern
-Checkpoint did not provide arguments spec
-Setting hybrid_attention_ratio to 0.0 from checkpoint
-Setting hybrid_mlp_ratio to 0.0 from checkpoint
-Checkpoint did not provide arguments num_experts
-Setting moe_layer_freq to 1 from checkpoint
-Setting moe_router_topk to 2 from checkpoint
-Setting moe_router_pre_softmax to False from checkpoint
-Setting moe_grouped_gemm to False from checkpoint
-Checkpoint did not provide arguments moe_shared_expert_intermediate_size
-Setting mamba_state_dim to 128 from checkpoint
-Setting mamba_head_dim to 64 from checkpoint
-Setting mamba_num_groups to 8 from checkpoint
-Checkpoint did not provide arguments mamba_num_heads
-Setting is_hybrid_model to False from checkpoint
-Checkpoint did not provide arguments heterogeneous_layers_config_path
-Checkpoint did not provide arguments heterogeneous_layers_config_encoded_json
-Setting tokenizer_type to SFTTokenizer from checkpoint
-Setting tokenizer_model to /cpfs01/users/wzhang/iquest-coder-v1.1/RepoData-Ucoder-32B-128k-from2.5.2/97.09B_instruct_iquest-coder from checkpoint
-Checkpoint did not provide arguments tiktoken_pattern
-Setting padded_vocab_size to 76800 from checkpoint
-INFO:megatron.core.num_microbatches_calculator:setting number of microbatches to constant 1
-WARNING: one_logger package is required to enable e2e metrics tracking. please go to https://confluence.nvidia.com/display/MLWFO/Package+Repositories for details to install it
-building GPT model ...
-(TP, PP) mismatch after resume ((1, 1) vs (8, 1) from checkpoint): RNG state will be ignored
-sharded_state_dict metadata loaded from the checkpoint: {'distrib_optim_sharding_type': 'dp_reshardable', 'singleton_local_shards': False, 'chained_optim_avoid_prefix': True}
-Job sharding has changed: Rerun state will be ignored
-loading distributed checkpoint from /tmp/megatron_convert_iter985_node0_pid360_19ab7eb5 at iteration 985
-/volume/pt-train/users/wzhang/wjj-workspace/code-sft/src/training/Megatron-LM/megatron/core/dist_checkpointing/strategies/torch.py:956: FutureWarning: `load_state_dict` is deprecated and will be removed in future versions. Please use `load` instead.
-  checkpoint.load_state_dict(
-/usr/local/lib/python3.12/dist-packages/torch/distributed/checkpoint/planner_helpers.py:406: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
-  device = getattr(value, "device", None)
-/usr/local/lib/python3.12/dist-packages/torch/distributed/checkpoint/default_planner.py:454: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
-  and md.size != obj.size()
-checkpoint version 3.0
-successfully loaded checkpoint from /tmp/megatron_convert_iter985_node0_pid360_19ab7eb5 [ t 1/1, p 1/1 ] at iteration 985
-sending embeddings
-sending transformer layer 0
-sending transformer layer 1
-sending transformer layer 2
-sending transformer layer 3
-sending transformer layer 4
-sending transformer layer 5
-sending transformer layer 6
-sending transformer layer 7
-sending transformer layer 8
-sending transformer layer 9
-sending transformer layer 10
-sending transformer layer 11
-sending transformer layer 12
-sending transformer layer 13
-sending transformer layer 14
-sending transformer layer 15
-sending transformer layer 16
-sending transformer layer 17
-sending transformer layer 18
-sending transformer layer 19
-sending transformer layer 20
-sending transformer layer 21
-sending transformer layer 22
-sending transformer layer 23
-sending transformer layer 24
-sending transformer layer 25
-sending transformer layer 26
-sending transformer layer 27
-fused_indices_to_multihot has reached end of life. Please migrate to a non-experimental function.
-sending transformer layer 28
-sending transformer layer 29
-sending transformer layer 30
-sending transformer layer 31
-sending transformer layer 32
-sending transformer layer 33
-sending transformer layer 34
-sending transformer layer 35
-sending transformer layer 36
-sending transformer layer 37
-received embeddings
-received transformer layer 0
-sending transformer layer 38
-received transformer layer 1
-received transformer layer 2
-received transformer layer 3
-received transformer layer 4
-sending transformer layer 39
-received transformer layer 5
-received transformer layer 6
-received transformer layer 7
-received transformer layer 8
-received transformer layer 9
-received transformer layer 10
-sending transformer layer 40
-received transformer layer 11
-received transformer layer 12
-received transformer layer 13
-sending transformer layer 41
-received transformer layer 14
-received transformer layer 15
-received transformer layer 16
-received transformer layer 17
-sending transformer layer 42
-received transformer layer 18
-sending transformer layer 43
-sending transformer layer 44
-sending transformer layer 45
-received transformer layer 19
-sending transformer layer 46
-sending transformer layer 47
-received transformer layer 20
-sending transformer layer 48
-sending transformer layer 49
-received transformer layer 21
-sending transformer layer 50
-sending transformer layer 51
-sending transformer layer 52
-received transformer layer 22
-sending transformer layer 53
-sending transformer layer 54
-received transformer layer 23
-sending transformer layer 55
-sending transformer layer 56
-sending transformer layer 57
-received transformer layer 24
-sending transformer layer 58
-sending transformer layer 59
-received transformer layer 25
-sending transformer layer 60
-sending transformer layer 61
-sending transformer layer 62
-received transformer layer 26
-sending transformer layer 63
-sending transformer layer 64
-sending transformer layer 65
-received transformer layer 27
-sending transformer layer 66
-sending transformer layer 67
-sending transformer layer 68
-sending transformer layer 69
-received transformer layer 28
-sending transformer layer 70
-sending transformer layer 71
-received transformer layer 29
-sending transformer layer 72
-sending transformer layer 73
-sending transformer layer 74
-received transformer layer 30
-sending transformer layer 75
-sending transformer layer 76
-sending transformer layer 77
-received transformer layer 31
-sending transformer layer 78
-sending transformer layer 79
-sending final norm
-sending output layer
-Waiting for saver to complete...
-received transformer layer 32
-received transformer layer 33
-received transformer layer 34
-received transformer layer 35
-received transformer layer 36
-received transformer layer 37
-received transformer layer 38
-received transformer layer 39
-received transformer layer 40
-received transformer layer 41
-received transformer layer 42
-received transformer layer 43
-received transformer layer 44
-received transformer layer 45
-received transformer layer 46
-received transformer layer 47
-received transformer layer 48
-received transformer layer 49
-received transformer layer 50
-received transformer layer 51
-received transformer layer 52
-received transformer layer 53
-received transformer layer 54
-received transformer layer 55
-received transformer layer 56
-received transformer layer 57
-received transformer layer 58
-received transformer layer 59
-received transformer layer 60
-received transformer layer 61
-received transformer layer 62
-received transformer layer 63
-received transformer layer 64
-received transformer layer 65
-received transformer layer 66
-received transformer layer 67
-received transformer layer 68
-received transformer layer 69
-received transformer layer 70
-received transformer layer 71
-received transformer layer 72
-received transformer layer 73
-received transformer layer 74
-received transformer layer 75
-received transformer layer 76
-received transformer layer 77
-received transformer layer 78
-received transformer layer 79
-received final norm
-received output layer
-Saving model to disk ...
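The loader and saver run concurrently, so the `sending transformer layer N` and `received transformer layer N` lines interleave out of order. A minimal sketch for sanity-checking a log like this one, assuming the 80 layers reported by `Setting num_layers to 80 from checkpoint` (the helper name `unreceived_layers` is hypothetical, not part of the conversion tooling):

```python
# Hypothetical post-hoc check: confirm every transformer layer the
# loader sent was also marked received by the saver.
import re

def unreceived_layers(log_text, num_layers=80):
    """Return layer indices expected or sent but never marked received."""
    sent = {int(n) for n in re.findall(r"sending transformer layer (\d+)", log_text)}
    received = {int(n) for n in re.findall(r"received transformer layer (\d+)", log_text)}
    expected = set(range(num_layers))
    # A clean conversion receives every expected layer exactly once.
    return (expected | sent) - received

# Layers 0-79 all appear in both forms in the log above:
complete = "\n".join(
    f"sending transformer layer {i}\nreceived transformer layer {i}"
    for i in range(80)
)
print(unreceived_layers(complete))  # set()
```

An empty result means the transfer completed; any surviving indices point at layers dropped between the loader and saver processes.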