zwpride-iquestlab committed on
Commit 504dc0f · verified · 1 Parent(s): 4faf90b

Delete conversion_to_hf.log

Files changed (1)
  1. conversion_to_hf.log +0 -241
conversion_to_hf.log DELETED
@@ -1,241 +0,0 @@
- Loaded loader_megatron_core as the loader.
- Loaded saver_llama2_hf_bf as the saver.
- Starting saver...
- Starting loader...
- fused_indices_to_multihot has reached end of life. Please migrate to a non-experimental function.
- /usr/local/lib/python3.12/dist-packages/modelopt/torch/utils/import_utils.py:31: UserWarning: Failed to import apex plugin due to: AttributeError("module 'transformers.modeling_utils' has no attribute 'Conv1D'"). You may ignore this warning if you do not need this plugin.
- warnings.warn(
- /usr/local/lib/python3.12/dist-packages/modelopt/torch/utils/import_utils.py:31: UserWarning: Failed to import huggingface plugin due to: AttributeError("module 'transformers.modeling_utils' has no attribute 'Conv1D'"). You may ignore this warning if you do not need this plugin.
- warnings.warn(
- /usr/local/lib/python3.12/dist-packages/modelopt/torch/utils/import_utils.py:31: UserWarning: Failed to import megatron plugin due to: AttributeError("module 'transformers.modeling_utils' has no attribute 'Conv1D'"). You may ignore this warning if you do not need this plugin.
- warnings.warn(
- Setting num_layers to 80 from checkpoint
- Setting hidden_size to 5120 from checkpoint
- Setting ffn_hidden_size to 27648 from checkpoint
- Setting seq_length to 131072 from checkpoint
- Setting num_attention_heads to 40 from checkpoint
- Setting num_query_groups to 8 from checkpoint
- Setting group_query_attention to True from checkpoint
- Setting kv_channels to 128 from checkpoint
- Setting max_position_embeddings to 131072 from checkpoint
- Setting position_embedding_type to rope from checkpoint
- Setting add_position_embedding to True from checkpoint
- Setting use_rotary_position_embeddings to True from checkpoint
- Setting rotary_base to 500000 from checkpoint
- Setting rotary_percent to 1.0 from checkpoint
- Setting rotary_interleaved to False from checkpoint
- Setting add_bias_linear to False from checkpoint
- Setting add_qkv_bias to False from checkpoint
- Setting squared_relu to False from checkpoint
- Setting swiglu to True from checkpoint
- Setting untie_embeddings_and_output_weights to True from checkpoint
- Setting apply_layernorm_1p to False from checkpoint
- Setting normalization to RMSNorm from checkpoint
- Setting apply_query_key_layer_scaling to False from checkpoint
- Setting attention_dropout to 0.0 from checkpoint
- Setting hidden_dropout to 0.0 from checkpoint
- Checkpoint did not provide arguments hybrid_override_pattern
- Checkpoint did not provide arguments spec
- Setting hybrid_attention_ratio to 0.0 from checkpoint
- Setting hybrid_mlp_ratio to 0.0 from checkpoint
- Checkpoint did not provide arguments num_experts
- Setting moe_layer_freq to 1 from checkpoint
- Setting moe_router_topk to 2 from checkpoint
- Setting moe_router_pre_softmax to False from checkpoint
- Setting moe_grouped_gemm to False from checkpoint
- Checkpoint did not provide arguments moe_shared_expert_intermediate_size
- Setting mamba_state_dim to 128 from checkpoint
- Setting mamba_head_dim to 64 from checkpoint
- Setting mamba_num_groups to 8 from checkpoint
- Checkpoint did not provide arguments mamba_num_heads
- Setting is_hybrid_model to False from checkpoint
- Checkpoint did not provide arguments heterogeneous_layers_config_path
- Checkpoint did not provide arguments heterogeneous_layers_config_encoded_json
- Setting tokenizer_type to SFTTokenizer from checkpoint
- Setting tokenizer_model to /cpfs01/users/wzhang/iquest-coder-v1.1/RepoData-Ucoder-32B-128k-from2.5.2/97.09B_instruct_iquest-coder from checkpoint
- Checkpoint did not provide arguments tiktoken_pattern
- Setting padded_vocab_size to 76800 from checkpoint
- INFO:megatron.core.num_microbatches_calculator:setting number of microbatches to constant 1
- WARNING: one_logger package is required to enable e2e metrics tracking. please go to https://confluence.nvidia.com/display/MLWFO/Package+Repositories for details to install it
- building GPT model ...
- (TP, PP) mismatch after resume ((1, 1) vs (8, 1) from checkpoint): RNG state will be ignored
- sharded_state_dict metadata loaded from the checkpoint: {'distrib_optim_sharding_type': 'dp_reshardable', 'singleton_local_shards': False, 'chained_optim_avoid_prefix': True}
- Job sharding has changed: Rerun state will be ignored
- loading distributed checkpoint from /tmp/megatron_convert_iter985_node0_pid360_19ab7eb5 at iteration 985
- /volume/pt-train/users/wzhang/wjj-workspace/code-sft/src/training/Megatron-LM/megatron/core/dist_checkpointing/strategies/torch.py:956: FutureWarning: `load_state_dict` is deprecated and will be removed in future versions. Please use `load` instead.
- checkpoint.load_state_dict(
- /usr/local/lib/python3.12/dist-packages/torch/distributed/checkpoint/planner_helpers.py:406: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
- device = getattr(value, "device", None)
- /usr/local/lib/python3.12/dist-packages/torch/distributed/checkpoint/default_planner.py:454: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
- and md.size != obj.size()
- checkpoint version 3.0
- successfully loaded checkpoint from /tmp/megatron_convert_iter985_node0_pid360_19ab7eb5 [ t 1/1, p 1/1 ] at iteration 985
- sending embeddings
- sending transformer layer 0
- sending transformer layer 1
- sending transformer layer 2
- sending transformer layer 3
- sending transformer layer 4
- sending transformer layer 5
- sending transformer layer 6
- sending transformer layer 7
- sending transformer layer 8
- sending transformer layer 9
- sending transformer layer 10
- sending transformer layer 11
- sending transformer layer 12
- sending transformer layer 13
- sending transformer layer 14
- sending transformer layer 15
- sending transformer layer 16
- sending transformer layer 17
- sending transformer layer 18
- sending transformer layer 19
- sending transformer layer 20
- sending transformer layer 21
- sending transformer layer 22
- sending transformer layer 23
- sending transformer layer 24
- sending transformer layer 25
- sending transformer layer 26
- sending transformer layer 27
- fused_indices_to_multihot has reached end of life. Please migrate to a non-experimental function.
- sending transformer layer 28
- sending transformer layer 29
- sending transformer layer 30
- sending transformer layer 31
- sending transformer layer 32
- sending transformer layer 33
- sending transformer layer 34
- sending transformer layer 35
- sending transformer layer 36
- sending transformer layer 37
- received embeddings
- received transformer layer 0
- sending transformer layer 38
- received transformer layer 1
- received transformer layer 2
- received transformer layer 3
- received transformer layer 4
- sending transformer layer 39
- received transformer layer 5
- received transformer layer 6
- received transformer layer 7
- received transformer layer 8
- received transformer layer 9
- received transformer layer 10
- sending transformer layer 40
- received transformer layer 11
- received transformer layer 12
- received transformer layer 13
- sending transformer layer 41
- received transformer layer 14
- received transformer layer 15
- received transformer layer 16
- received transformer layer 17
- sending transformer layer 42
- received transformer layer 18
- sending transformer layer 43
- sending transformer layer 44
- sending transformer layer 45
- received transformer layer 19
- sending transformer layer 46
- sending transformer layer 47
- received transformer layer 20
- sending transformer layer 48
- sending transformer layer 49
- received transformer layer 21
- sending transformer layer 50
- sending transformer layer 51
- sending transformer layer 52
- received transformer layer 22
- sending transformer layer 53
- sending transformer layer 54
- received transformer layer 23
- sending transformer layer 55
- sending transformer layer 56
- sending transformer layer 57
- received transformer layer 24
- sending transformer layer 58
- sending transformer layer 59
- received transformer layer 25
- sending transformer layer 60
- sending transformer layer 61
- sending transformer layer 62
- received transformer layer 26
- sending transformer layer 63
- sending transformer layer 64
- sending transformer layer 65
- received transformer layer 27
- sending transformer layer 66
- sending transformer layer 67
- sending transformer layer 68
- sending transformer layer 69
- received transformer layer 28
- sending transformer layer 70
- sending transformer layer 71
- received transformer layer 29
- sending transformer layer 72
- sending transformer layer 73
- sending transformer layer 74
- received transformer layer 30
- sending transformer layer 75
- sending transformer layer 76
- sending transformer layer 77
- received transformer layer 31
- sending transformer layer 78
- sending transformer layer 79
- sending final norm
- sending output layer
- Waiting for saver to complete...
- received transformer layer 32
- received transformer layer 33
- received transformer layer 34
- received transformer layer 35
- received transformer layer 36
- received transformer layer 37
- received transformer layer 38
- received transformer layer 39
- received transformer layer 40
- received transformer layer 41
- received transformer layer 42
- received transformer layer 43
- received transformer layer 44
- received transformer layer 45
- received transformer layer 46
- received transformer layer 47
- received transformer layer 48
- received transformer layer 49
- received transformer layer 50
- received transformer layer 51
- received transformer layer 52
- received transformer layer 53
- received transformer layer 54
- received transformer layer 55
- received transformer layer 56
- received transformer layer 57
- received transformer layer 58
- received transformer layer 59
- received transformer layer 60
- received transformer layer 61
- received transformer layer 62
- received transformer layer 63
- received transformer layer 64
- received transformer layer 65
- received transformer layer 66
- received transformer layer 67
- received transformer layer 68
- received transformer layer 69
- received transformer layer 70
- received transformer layer 71
- received transformer layer 72
- received transformer layer 73
- received transformer layer 74
- received transformer layer 75
- received transformer layer 76
- received transformer layer 77
- received transformer layer 78
- received transformer layer 79
- received final norm
- received output layer
- Saving model to disk ...
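
For reference, the checkpoint arguments recorded in the deleted log correspond to a Llama-style Hugging Face configuration. Below is a minimal sketch of that mapping, assuming the llama2_hf_bf saver emits a standard transformers LlamaConfig; fields not present in the log (rms_norm_eps, torch_dtype) are assumptions, and the actual config.json produced by the conversion may differ.

```python
# Illustrative mapping of the logged Megatron checkpoint arguments onto a
# Hugging Face Llama-style config. A sketch only, not the config the saver wrote.
from transformers import LlamaConfig

config = LlamaConfig(
    num_hidden_layers=80,            # num_layers
    hidden_size=5120,                # hidden_size
    intermediate_size=27648,         # ffn_hidden_size (SwiGLU MLP)
    num_attention_heads=40,          # num_attention_heads
    num_key_value_heads=8,           # num_query_groups (grouped-query attention)
    max_position_embeddings=131072,  # seq_length / max_position_embeddings
    rope_theta=500000.0,             # rotary_base
    hidden_act="silu",               # swiglu=True
    attention_bias=False,            # add_qkv_bias=False, add_bias_linear=False
    tie_word_embeddings=False,       # untie_embeddings_and_output_weights=True
    vocab_size=76800,                # padded_vocab_size
    rms_norm_eps=1e-5,               # assumed; not recorded in the log
    torch_dtype="bfloat16",          # assumed from the "_bf" saver suffix
)
print(config)
```

The head dimension of 128 (kv_channels in the log) follows from hidden_size / num_attention_heads, so it does not need to be set explicitly.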