...
· BERT-LARGE (L=24, H=1024, A=16, Total Parameters=340M).
BERT-BASE contains 110M parameters and BERT-LARGE contains 340M parameters.
For the purposes of this challenge, we will be using BERT-BASE.
1.1.2 About SQuAD 1.1
The Stanford Question Answering Dataset (SQuAD) is a popular question-answering benchmark. At the time of its release, BERT obtained state-of-the-art results on SQuAD with almost no task-specific network architecture modifications or data augmentation. However, it does require somewhat involved data pre-processing and post-processing to deal with (a) the variable length of SQuAD context paragraphs, and (b) the character-level answer annotations used for SQuAD training. This processing is implemented and documented in run_squad.py.
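To make point (b) concrete, here is a minimal sketch, under the simplifying assumption of whitespace tokenization, of how a character-level answer annotation maps to a token span. The real run_squad.py additionally handles WordPiece sub-tokens and slides a window over long paragraphs (the doc stride), so treat this as an illustration only.
Code Block:
# Minimal sketch (not run_squad.py itself): map SQuAD's character-level
# answer annotation to a token span, assuming whitespace tokenization.
def char_span_to_token_span(context, answer_start, answer_text):
    tokens = []         # whitespace tokens of the context
    char_to_token = []  # token index for every character position
    prev_is_space = True
    for ch in context:
        if ch.isspace():
            char_to_token.append(None)
            prev_is_space = True
        else:
            if prev_is_space:
                tokens.append(ch)
            else:
                tokens[-1] += ch
            char_to_token.append(len(tokens) - 1)
            prev_is_space = False
    start = char_to_token[answer_start]
    end = char_to_token[answer_start + len(answer_text) - 1]
    return tokens, start, end

context = "BERT was released in 2018 by Google."
tokens, s, e = char_span_to_token_span(context, context.index("2018"), "2018")
print(tokens[s:e + 1])  # ['2018']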
1.2 Running SQuAD 1.1 fine-tuning and inference
1.2.1 Using Docker and NVIDIA Docker Image
Code Block:
docker pull nvcr.io/nvidia/tensorflow:20.02-tf1-py3

docker images
REPOSITORY                   TAG             IMAGE ID       CREATED       SIZE
nvcr.io/nvidia/tensorflow    20.02-tf1-py3   0c7b70421b78   7 weeks ago   9.49GB
...
The NVIDIA BERT code is a publicly available implementation of BERT. It supports multi-GPU training with Horovod: the NVIDIA BERT fine-tuning code uses Horovod with NCCL to implement efficient multi-GPU training.
Code Block:
[~]# git clone https://github.com/NVIDIA/DeepLearningExamples.git
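For background on what Horovod does here: each process drives one GPU, gradients are averaged across processes with NCCL all-reduce via a wrapped optimizer, and rank 0 broadcasts the initial weights so all workers start identically. A minimal TF1-style sketch of that pattern (illustrative only, not the repo's actual code):
Code Block:
# Illustrative Horovod (TF1) pattern; not the NVIDIA repo's actual code.
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()  # launched as one process per GPU (e.g. via mpirun)
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())  # pin a GPU

optimizer = tf.train.AdamOptimizer(5e-6 * hvd.size())  # scale LR by workers
optimizer = hvd.DistributedOptimizer(optimizer)        # NCCL allreduce
hooks = [hvd.BroadcastGlobalVariablesHook(0)]          # rank 0 -> all ranks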
You may use other implementations and optimize and tune them, but for the purposes of this challenge you must use the BERT-Base uncased pre-trained model.
...
/workspace/nvidia-examples/bert/data/download/google_pretrained_weights
Code Block:
root@tessa002:/workspace/nvidia-examples/bert/data# mkdir download
root@tessa002:/workspace/nvidia-examples/bert/data# cd download
root@tessa002:/workspace/nvidia-examples/bert/data/download# mkdir google_pretrained_weights
root@tessa002:/workspace/nvidia-examples/bert/data/download# cd google_pretrained_weights/
root@tessa002:/workspace/nvidia-examples/bert/data/download/google_pretrained_weights# wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
root@tessa002:/workspace/nvidia-examples/bert/data/download/google_pretrained_weights# unzip uncased_L-12_H-768_A-12.zip
Archive: uncased_L-12_H-768_A-12.zip
creating: uncased_L-12_H-768_A-12/
inflating: uncased_L-12_H-768_A-12/bert_model.ckpt.meta
inflating: uncased_L-12_H-768_A-12/bert_model.ckpt.data-00000-of-00001
inflating: uncased_L-12_H-768_A-12/vocab.txt
inflating: uncased_L-12_H-768_A-12/bert_model.ckpt.index
inflating: uncased_L-12_H-768_A-12/bert_config.json
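As a quick sanity check, bert_config.json should reflect the BERT-Base shape quoted earlier (L=12, H=768, A=12):
Code Block:
import json

# Verify the downloaded checkpoint really is BERT-Base.
with open("uncased_L-12_H-768_A-12/bert_config.json") as f:
    cfg = json.load(f)
print(cfg["num_hidden_layers"])    # expected: 12
print(cfg["hidden_size"])          # expected: 768
print(cfg["num_attention_heads"])  # expected: 12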
1.2.4 Download the SQuAD 1.1 dataset
...
We will download these to: /workspace/nvidia-examples/bert/data/download/squad/v1.1
Code Block:
root@tessa002:/workspace/nvidia-examples/bert/data/download# mkdir squad
root@tessa002:/workspace/nvidia-examples/bert/data/download# cd squad
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad# mkdir v1.1
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad# cd v1.1/
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad/v1.1# wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad/v1.1# wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad/v1.1# wget https://github.com/allenai/bi-att-flow/archive/master.zip
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad/v1.1# unzip master.zip
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad/v1.1# cd bi-att-flow-master/
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad/v1.1/bi-att-flow-master# cd squad
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad/v1.1/bi-att-flow-master/squad# cp evaluate-v1.1.py /workspace/nvidia-examples/bert/data/download/squad/v1.1/
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad/v1.1/bi-att-flow-master/squad# cd /workspace/nvidia-examples/bert
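If you want to see what was downloaded, SQuAD 1.1 is nested JSON: articles contain paragraphs, and each paragraph has a context string plus a list of question/answer pairs. A short sketch (paths assume you are in /workspace/nvidia-examples/bert, as above):
Code Block:
import json

# Peek at the SQuAD 1.1 structure: data -> paragraphs -> qas -> answers.
with open("data/download/squad/v1.1/train-v1.1.json") as f:
    squad = json.load(f)
print(squad["version"])  # "1.1"
paragraph = squad["data"][0]["paragraphs"][0]
qa = paragraph["qas"][0]
print(qa["id"], qa["question"])
print(qa["answers"][0]["text"], qa["answers"][0]["answer_start"])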
1.2.5 Start fine-tuning
...
For SQuAD 1.1 FP16 training with XLA using four T4 16 GB GPUs, run:
Code Block:
bash scripts/run_squad.sh 10 5e-6 fp16 true 4 384 128 base 1.1 data/download/google_pretrained_weights/uncased_L-12_H-768_A-12/bert_model.ckpt 1.1
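For reference, the positional arguments correspond, in the version of scripts/run_squad.sh current when this was written, to: per-GPU batch size (10), learning rate (5e-6), precision (fp16), XLA enabled (true), number of GPUs (4), sequence length (384), doc stride (128), model size (base), SQuAD version (1.1), the initial checkpoint, and the number of training epochs. Verify the order against scripts/run_squad.sh itself, since it can change between releases.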
1.2.6 Verify results
Code Block:
INFO:tensorflow:-----------------------------
I0326 01:25:43.144953 140630939256640 run_squad.py:1127] -----------------------------
INFO:tensorflow:Total Inference Time = 88.62 for Sentences = 10840
I0326 01:25:43.145423 140630939256640 run_squad.py:1129] Total Inference Time = 88.62 for Sentences = 10840
INFO:tensorflow:Total Inference Time W/O Overhead = 75.86 for Sentences = 10824
I0326 01:25:43.145554 140630939256640 run_squad.py:1131] Total Inference Time W/O Overhead = 75.86 for Sentences = 10824
INFO:tensorflow:Summary Inference Statistics
I0326 01:25:43.145649 140630939256640 run_squad.py:1132] Summary Inference Statistics
INFO:tensorflow:Batch size = 8
I0326 01:25:43.145738 140630939256640 run_squad.py:1133] Batch size = 8
INFO:tensorflow:Sequence Length = 384
I0326 01:25:43.145867 140630939256640 run_squad.py:1134] Sequence Length = 384
INFO:tensorflow:Precision = fp16
I0326 01:25:43.145962 140630939256640 run_squad.py:1135] Precision = fp16
INFO:tensorflow:Latency Confidence Level 50 (ms) = 55.79
I0326 01:25:43.146052 140630939256640 run_squad.py:1136] Latency Confidence Level 50 (ms) = 55.79
INFO:tensorflow:Latency Confidence Level 90 (ms) = 57.03
I0326 01:25:43.146145 140630939256640 run_squad.py:1137] Latency Confidence Level 90 (ms) = 57.03
INFO:tensorflow:Latency Confidence Level 95 (ms) = 57.29
I0326 01:25:43.146225 140630939256640 run_squad.py:1138] Latency Confidence Level 95 (ms) = 57.29
INFO:tensorflow:Latency Confidence Level 99 (ms) = 58.62
I0326 01:25:43.146308 140630939256640 run_squad.py:1139] Latency Confidence Level 99 (ms) = 58.62
INFO:tensorflow:Latency Confidence Level 100 (ms) = 286.80
I0326 01:25:43.146387 140630939256640 run_squad.py:1140] Latency Confidence Level 100 (ms) = 286.80
INFO:tensorflow:Latency Average (ms) = 56.07
I0326 01:25:43.146471 140630939256640 run_squad.py:1141] Latency Average (ms) = 56.07
INFO:tensorflow:Throughput Average (sentences/sec) = 142.68
I0326 01:25:43.146564 140630939256640 run_squad.py:1142] Throughput Average (sentences/sec) = 142.68
INFO:tensorflow:-----------------------------
I0326 01:25:43.146645 140630939256640 run_squad.py:1143] -----------------------------
INFO:tensorflow:Writing predictions to: /results/tf_bert_finetuning_squad_base_fp16_gbs40_200326010711/predictions.json
I0326 01:25:43.146801 140630939256640 run_squad.py:431] Writing predictions to: /results/tf_bert_finetuning_squad_base_fp16_gbs40_200326010711/predictions.json
INFO:tensorflow:Writing nbest to: /results/tf_bert_finetuning_squad_base_fp16_gbs40_200326010711/nbest_predictions.json
I0326 01:25:43.146886 140630939256640 run_squad.py:432] Writing nbest to: /results/tf_bert_finetuning_squad_base_fp16_gbs40_200326010711/nbest_predictions.json
{"exact_match": 78.0321665089877, "f1": 86.34229152935384}
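The final line is the SQuAD 1.1 score. For intuition, the metrics are computed roughly as in the condensed sketch below (see evaluate-v1.1.py for the authoritative version): answers are normalized, exact match requires identical normalized strings, and F1 is token overlap, with each question scored against its best-matching gold answer.
Code Block:
import collections
import re
import string

# Condensed sketch of SQuAD 1.1 scoring; evaluate-v1.1.py is authoritative.
def normalize(s):
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)  # drop articles
    return " ".join(s.split())

def f1_score(prediction, ground_truth):
    pred = normalize(prediction).split()
    gold = normalize(ground_truth).split()
    common = collections.Counter(pred) & collections.Counter(gold)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred)
    recall = num_same / len(gold)
    return 2 * precision * recall / (precision + recall)

print(f1_score("Larry Page and Sergey Brin", "Sergey Brin and Larry Page"))  # 1.0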
...
(https://github.com/lambdal/bert)
Code Block:
root@tessa002:/workspace# mkdir lambdal
root@tessa002:/workspace# cd lambdal
root@tessa002:/workspace/lambdal# git clone https://github.com/lambdal/bert
root@tessa002:/workspace/lambdal# cd bert
root@tessa002:/workspace/lambdal/bert# mpirun -np 4 -H localhost:4 -bind-to none -map-by slot \
    -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH \
    -mca pml ob1 -mca btl ^openib --allow-run-as-root \
    python3 run_squad_hvd.py \
    --vocab_file=/workspace/nvidia-examples/bert/data/download/google_pretrained_weights/uncased_L-12_H-768_A-12/vocab.txt \
    --bert_config_file=/workspace/nvidia-examples/bert/data/download/google_pretrained_weights/uncased_L-12_H-768_A-12/bert_config.json \
    --init_checkpoint=/workspace/nvidia-examples/bert/data/download/google_pretrained_weights/uncased_L-12_H-768_A-12/bert_model.ckpt \
    --do_train=True \
    --train_file=/workspace/nvidia-examples/bert/data/download/squad/v1.1/train-v1.1.json \
    --do_predict=True \
    --predict_file=/workspace/nvidia-examples/bert/data/download/squad/v1.1/dev-v1.1.json \
    --train_batch_size=12 \
    --learning_rate=3e-5 \
    --num_train_epochs=2.0 \
    --max_seq_length=384 \
    --doc_stride=128 \
    --output_dir=/results/lambdal/squad1/squad_base/ \
    --horovod=true
Look for output similar to the following:
Code Block:
INFO:tensorflow:Writing predictions to: /results/lambdal/squad1/squad_base/predictions.json
I0326 05:55:19.917063 140421161031488 run_squad_hvd.py:747] Writing predictions to: /results/lambdal/squad1/squad_base/predictions.json
INFO:tensorflow:Writing nbest to: /results/lambdal/squad1/squad_base/nbest_predictions.json
I0326 05:55:19.917179 140421161031488 run_squad_hvd.py:748] Writing nbest to: /results/lambdal/squad1/squad_base/nbest_predictions.json
To check the score:
Code Block:
root@tessa002:/workspace/lambdal/bert# python /workspace/nvidia-examples/bert/data/download/squad/v1.1/evaluate-v1.1.py /workspace/nvidia-examples/bert/data/download/squad/v1.1/dev-v1.1.json /results/lambdal/squad1/squad_base/predictions.json
{"exact_match": 78.1929990539262, "f1": 86.51319484763773}
Note: Part of your final score includes these results:
...
Note: This is the method that the judges will use to score unseen data.
Code Block:
root@tessa002:/workspace/nvidia-examples/bert# cd /workspace
root@tessa002:/workspace# git clone https://github.com/google-research/bert.git
root@tessa002:/workspace# cd bert
...
1.2.9 Create a sample input file in JSON format (note the "id" values, which are referenced later).
...
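If you need to build such a file by hand, it follows the SQuAD 1.1 schema with the "answers" fields omitted. A sketch (the context, questions, and ids below are purely illustrative; the ids simply echo the ones that appear in predictions.json later):
Code Block:
import json

# Illustrative only: a SQuAD-style input file with no "answers" fields.
# The ids are arbitrary strings; they key the entries in predictions.json.
sample = {
    "version": "1.1",
    "data": [{
        "title": "Google",
        "paragraphs": [{
            "context": "Google was founded on September 4, 1998, by Larry "
                       "Page and Sergey Brin. Sundar Pichai serves as CEO. "
                       "Google is a subsidiary of Alphabet Inc.",
            "qas": [
                {"id": "56ddde6b9a695914005b9628",
                 "question": "Who is the CEO of Google?"},
                {"id": "56ddde6b9a695914005b9629",
                 "question": "Who founded Google?"},
            ],
        }],
    }],
}
with open("test_input.json", "w") as f:
    json.dump(sample, f, indent=2)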
Code Block:
root@tessa002:/workspace/bert# python3 run_squad.py \
    --vocab_file=/workspace/nvidia-examples/bert/data/download/google_pretrained_weights/uncased_L-12_H-768_A-12/vocab.txt \
    --bert_config_file=/workspace/nvidia-examples/bert/data/download/google_pretrained_weights/uncased_L-12_H-768_A-12/bert_config.json \
    --init_checkpoint=/results/tf_bert_finetuning_squad_base_fp16_gbs40_200326010711/model.ckpt-2408 \
    --do_train=False \
    --max_query_length=30 \
    --do_predict=True \
    --predict_file=test_input.json \
    --predict_batch_size=16 \
    --max_seq_length=384 \
    --doc_stride=128 \
    --output_dir=/results/squad1/squad_test/
Note: If you are using the alternative method from Lambda Labs, you will need to use that checkpoint:
...
1.2.11 You should see output similar to the following:
Code Block:
INFO:tensorflow:Processing example: 0
I0326 02:11:40.096473 140685488179008 run_squad.py:1259] Processing example: 0
INFO:tensorflow:prediction_loop marked as finished
I0326 02:11:40.165820 140685488179008 error_handling.py:101] prediction_loop marked as finished
INFO:tensorflow:prediction_loop marked as finished
I0326 02:11:40.166095 140685488179008 error_handling.py:101] prediction_loop marked as finished
INFO:tensorflow:Writing predictions to: /results/squad1/squad_test/predictions.json
I0326 02:11:40.166555 140685488179008 run_squad.py:745] Writing predictions to: /results/squad1/squad_test/predictions.json
INFO:tensorflow:Writing nbest to: /results/squad1/squad_test/nbest_predictions.json
I0326 02:11:40.166669 140685488179008 run_squad.py:746] Writing nbest to: /results/squad1/squad_test/nbest_predictions.json
1.2.12 Check correctness in file: predictions.json
Code Block:
{
  "56ddde6b9a695914005b9628": "Sundar Pichai",
  "56ddde6b9a695914005b9629": "Larry Page and Sergey Brin",
  "56ddde6b9a695914005b9630": "September 4, 1998",
  "56ddde6b9a695914005b9631": "CEO",
  "56ddde6b9a695914005b9632": "Alphabet Inc"
}
1.2.13 Check accuracy in file: nbest_predictions.json
...
Scores will be derived from the nbest_predictions.json output for each question on the context.
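Each key in nbest_predictions.json is a question id mapped to a ranked list of candidate answers; in the standard run_squad.py output every candidate carries "text", "probability", "start_logit", and "end_logit" fields. A short sketch of pulling the top candidate per question:
Code Block:
import json

# Print the best candidate and its probability for every question.
with open("/results/squad1/squad_test/nbest_predictions.json") as f:
    nbest = json.load(f)
for qid, candidates in nbest.items():
    best = candidates[0]  # candidates are ranked best-first
    print(qid, best["text"], round(best["probability"], 4))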
1.3 Competition Limits:
Must stick to the pre-defined model (BERT-Base, Uncased)
...
You must provide all scripts and the methodology used to achieve your results
1.4 Teams must produce:
Training scripts with the full training routine, command lines, and output
...
run_squad.py output files: predictions.json and nbest_predictions.json
1.5 Method of final scoring procedure:
...
Final scores are computed on unseen data consisting of multiple questions; predictions are taken from file, using the standard run_squad.py
1.6 Important: Final evaluation scripts, ckpt directories, and files must be submitted for approval 90 minutes before the end of the competition.
...