...
· BERT-LARGE (L=24, H=1024, A=16, Total Parameters=340M).
BERT-BASE contains 110M parameters and BERT-LARGE contains 340M parameters.
For the purposes of this challenge, we will be using BERT-BASE.
1.1.2 About SQuAD 1.1
The Stanford Question Answering Dataset (SQuAD) is a popular question-answering benchmark. At the time of its release, BERT obtained state-of-the-art results on SQuAD with almost no task-specific network architecture modifications or data augmentation. However, it does require somewhat involved data pre-processing and post-processing to deal with (a) the variable length of SQuAD context paragraphs, and (b) the character-level answer annotations used for SQuAD training. This processing is implemented and documented in run_squad.py.
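To make point (b) concrete, here is a minimal sketch, under the simplifying assumption of whitespace tokenization, of how a character-level answer annotation maps to a token span. The real run_squad.py additionally handles WordPiece sub-tokens and slides a window over long paragraphs (the doc stride), so treat this as an illustration only.
Code Block:
# Minimal sketch (not run_squad.py itself): map SQuAD's character-level
# answer annotation to a token span, assuming whitespace tokenization.
def char_span_to_token_span(context, answer_start, answer_text):
    tokens = []         # whitespace tokens of the context
    char_to_token = []  # token index for every character position
    prev_is_space = True
    for ch in context:
        if ch.isspace():
            char_to_token.append(None)
            prev_is_space = True
        else:
            if prev_is_space:
                tokens.append(ch)
            else:
                tokens[-1] += ch
            char_to_token.append(len(tokens) - 1)
            prev_is_space = False
    start = char_to_token[answer_start]
    end = char_to_token[answer_start + len(answer_text) - 1]
    return tokens, start, end

context = "BERT was released in 2018 by Google."
tokens, s, e = char_span_to_token_span(context, context.index("2018"), "2018")
print(tokens[s:e + 1])  # ['2018']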
1.2 Running SQuAD 1.1 fine-tuning and inference
1.2.1 Using Docker and NVIDIA Docker Image
Code Block:
docker pull nvcr.io/nvidia/tensorflow:20.02-tf1-py3

docker images
REPOSITORY                   TAG             IMAGE ID       CREATED       SIZE
nvcr.io/nvidia/tensorflow    20.02-tf1-py3   0c7b70421b78   7 weeks ago   9.49GB
...
The NVIDIA BERT code is a publicly available implementation of BERT. It supports multi-GPU training with Horovod: the NVIDIA BERT fine-tuning code uses Horovod with NCCL to implement efficient multi-GPU training.
Code Block:
[~]# git clone https://github.com/NVIDIA/DeepLearningExamples.git
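For background on what Horovod does here: each process drives one GPU, gradients are averaged across processes with NCCL all-reduce via a wrapped optimizer, and rank 0 broadcasts the initial weights so all workers start identically. A minimal TF1-style sketch of that pattern (illustrative only, not the repo's actual code):
Code Block:
# Illustrative Horovod (TF1) pattern; not the NVIDIA repo's actual code.
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()  # launched as one process per GPU (e.g. via mpirun)
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())  # pin a GPU

optimizer = tf.train.AdamOptimizer(5e-6 * hvd.size())  # scale LR by workers
optimizer = hvd.DistributedOptimizer(optimizer)        # NCCL allreduce
hooks = [hvd.BroadcastGlobalVariablesHook(0)]          # rank 0 -> all ranks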
You may use other implementations and optimize and tune them, but for the purposes of this challenge you must use the BERT-Base uncased pre-trained model.
...
/workspace/nvidia-examples/bert/data/download/google_pretrained_weights
Code Block:
root@tessa002:/workspace/nvidia-examples/bert/data# mkdir download
root@tessa002:/workspace/nvidia-examples/bert/data# cd download
root@tessa002:/workspace/nvidia-examples/bert/data/download# mkdir google_pretrained_weights
root@tessa002:/workspace/nvidia-examples/bert/data/download# cd google_pretrained_weights/
root@tessa002:/workspace/nvidia-examples/bert/data/download/google_pretrained_weights# wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
root@tessa002:/workspace/nvidia-examples/bert/data/download/google_pretrained_weights# unzip uncased_L-12_H-768_A-12.zip
Archive: uncased_L-12_H-768_A-12.zip
creating: uncased_L-12_H-768_A-12/
inflating: uncased_L-12_H-768_A-12/bert_model.ckpt.meta
inflating: uncased_L-12_H-768_A-12/bert_model.ckpt.data-00000-of-00001
inflating: uncased_L-12_H-768_A-12/vocab.txt
inflating: uncased_L-12_H-768_A-12/bert_model.ckpt.index
inflating: uncased_L-12_H-768_A-12/bert_config.json
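As a quick sanity check, bert_config.json should reflect the BERT-Base shape quoted earlier (L=12, H=768, A=12):
Code Block:
import json

# Verify the downloaded checkpoint really is BERT-Base.
with open("uncased_L-12_H-768_A-12/bert_config.json") as f:
    cfg = json.load(f)
print(cfg["num_hidden_layers"])    # expected: 12
print(cfg["hidden_size"])          # expected: 768
print(cfg["num_attention_heads"])  # expected: 12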
1.2.4 Download the SQuAD 1.1 dataset
...
We will download these to: /workspace/nvidia-examples/bert/data/download/squad/v1.1
Code Block:
root@tessa002:/workspace/nvidia-examples/bert/data/download# mkdir squad
root@tessa002:/workspace/nvidia-examples/bert/data/download# cd squad
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad# mkdir v1.1
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad# cd v1.1/
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad/v1.1# wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad/v1.1# wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad/v1.1# wget https://github.com/allenai/bi-att-flow/archive/master.zip
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad/v1.1# unzip master.zip
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad/v1.1# cd bi-att-flow-master/
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad/v1.1/bi-att-flow-master# cd squad
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad/v1.1/bi-att-flow-master/squad# cp evaluate-v1.1.py /workspace/nvidia-examples/bert/data/download/squad/v1.1/
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad/v1.1/bi-att-flow-master/squad# cd /workspace/nvidia-examples/bert
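If you want to see what was downloaded, SQuAD 1.1 is nested JSON: articles contain paragraphs, and each paragraph has a context string plus a list of question/answer pairs. A short sketch (paths assume you are in /workspace/nvidia-examples/bert, as above):
Code Block:
import json

# Peek at the SQuAD 1.1 structure: data -> paragraphs -> qas -> answers.
with open("data/download/squad/v1.1/train-v1.1.json") as f:
    squad = json.load(f)
print(squad["version"])  # "1.1"
paragraph = squad["data"][0]["paragraphs"][0]
qa = paragraph["qas"][0]
print(qa["id"], qa["question"])
print(qa["answers"][0]["text"], qa["answers"][0]["answer_start"])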
1.2.5 Start fine-tuning
...
For SQuAD 1.1 FP16 training with XLA using four T4 16 GB GPUs, run:
Code Block:
bash scripts/run_squad.sh 10 5e-6 fp16 true 4 384 128 base 1.1 data/download/google_pretrained_weights/uncased_L-12_H-768_A-12/bert_model.ckpt 1.1
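For reference, the positional arguments correspond, in the version of scripts/run_squad.sh current when this was written, to: per-GPU batch size (10), learning rate (5e-6), precision (fp16), XLA enabled (true), number of GPUs (4), sequence length (384), doc stride (128), model size (base), SQuAD version (1.1), the initial checkpoint, and the number of training epochs. Verify the order against scripts/run_squad.sh itself, since it can change between releases.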
1.2.6 Verify results
Code Block:
INFO:tensorflow:-----------------------------
I0326 01:25:43.144953 140630939256640 run_squad.py:1127] -----------------------------
INFO:tensorflow:Total Inference Time = 88.62 for Sentences = 10840
I0326 01:25:43.145423 140630939256640 run_squad.py:1129] Total Inference Time = 88.62 for Sentences = 10840
INFO:tensorflow:Total Inference Time W/O Overhead = 75.86 for Sentences = 10824
I0326 01:25:43.145554 140630939256640 run_squad.py:1131] Total Inference Time W/O Overhead = 75.86 for Sentences = 10824
INFO:tensorflow:Summary Inference Statistics
I0326 01:25:43.145649 140630939256640 run_squad.py:1132] Summary Inference Statistics
INFO:tensorflow:Batch size = 8
I0326 01:25:43.145738 140630939256640 run_squad.py:1133] Batch size = 8
INFO:tensorflow:Sequence Length = 384
I0326 01:25:43.145867 140630939256640 run_squad.py:1134] Sequence Length = 384
INFO:tensorflow:Precision = fp16
I0326 01:25:43.145962 140630939256640 run_squad.py:1135] Precision = fp16
INFO:tensorflow:Latency Confidence Level 50 (ms) = 55.79
I0326 01:25:43.146052 140630939256640 run_squad.py:1136] Latency Confidence Level 50 (ms) = 55.79
INFO:tensorflow:Latency Confidence Level 90 (ms) = 57.03
I0326 01:25:43.146145 140630939256640 run_squad.py:1137] Latency Confidence Level 90 (ms) = 57.03
INFO:tensorflow:Latency Confidence Level 95 (ms) = 57.29
I0326 01:25:43.146225 140630939256640 run_squad.py:1138] Latency Confidence Level 95 (ms) = 57.29
INFO:tensorflow:Latency Confidence Level 99 (ms) = 58.62
I0326 01:25:43.146308 140630939256640 run_squad.py:1139] Latency Confidence Level 99 (ms) = 58.62
INFO:tensorflow:Latency Confidence Level 100 (ms) = 286.80
I0326 01:25:43.146387 140630939256640 run_squad.py:1140] Latency Confidence Level 100 (ms) = 286.80
INFO:tensorflow:Latency Average (ms) = 56.07
I0326 01:25:43.146471 140630939256640 run_squad.py:1141] Latency Average (ms) = 56.07
INFO:tensorflow:Throughput Average (sentences/sec) = 142.68
I0326 01:25:43.146564 140630939256640 run_squad.py:1142] Throughput Average (sentences/sec) = 142.68
INFO:tensorflow:-----------------------------
I0326 01:25:43.146645 140630939256640 run_squad.py:1143] -----------------------------
INFO:tensorflow:Writing predictions to: /results/tf_bert_finetuning_squad_base_fp16_gbs40_200326010711/predictions.json
I0326 01:25:43.146801 140630939256640 run_squad.py:431] Writing predictions to: /results/tf_bert_finetuning_squad_base_fp16_gbs40_200326010711/predictions.json
INFO:tensorflow:Writing nbest to: /results/tf_bert_finetuning_squad_base_fp16_gbs40_200326010711/nbest_predictions.json
I0326 01:25:43.146886 140630939256640 run_squad.py:432] Writing nbest to: /results/tf_bert_finetuning_squad_base_fp16_gbs40_200326010711/nbest_predictions.json
{"exact_match": 78.0321665089877, "f1": 86.34229152935384}
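The final line is the SQuAD 1.1 score. For intuition, the metrics are computed roughly as in the condensed sketch below (see evaluate-v1.1.py for the authoritative version): answers are normalized, exact match requires identical normalized strings, and F1 is token overlap, with each question scored against its best-matching gold answer.
Code Block:
import collections
import re
import string

# Condensed sketch of SQuAD 1.1 scoring; evaluate-v1.1.py is authoritative.
def normalize(s):
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)  # drop articles
    return " ".join(s.split())

def f1_score(prediction, ground_truth):
    pred = normalize(prediction).split()
    gold = normalize(ground_truth).split()
    common = collections.Counter(pred) & collections.Counter(gold)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred)
    recall = num_same / len(gold)
    return 2 * precision * recall / (precision + recall)

print(f1_score("Larry Page and Sergey Brin", "Sergey Brin and Larry Page"))  # 1.0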
...
(https://github.com/lambdal/bert)
Code Block:
root@tessa002:/workspace# mkdir lambdal
root@tessa002:/workspace# cd lambdal
root@tessa002:/workspace/lambdal# git clone https://github.com/lambdal/bert
root@tessa002:/workspace/lambdal# cd bert
root@tessa002:/workspace/lambdal/bert# mpirun -np 4 -H localhost:4 -bind-to none -map-by slot \
    -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH \
    -mca pml ob1 -mca btl ^openib --allow-run-as-root \
    python3 run_squad_hvd.py \
    --vocab_file=/workspace/nvidia-examples/bert/data/download/google_pretrained_weights/uncased_L-12_H-768_A-12/vocab.txt \
    --bert_config_file=/workspace/nvidia-examples/bert/data/download/google_pretrained_weights/uncased_L-12_H-768_A-12/bert_config.json \
    --init_checkpoint=/workspace/nvidia-examples/bert/data/download/google_pretrained_weights/uncased_L-12_H-768_A-12/bert_model.ckpt \
    --do_train=True \
    --train_file=/workspace/nvidia-examples/bert/data/download/squad/v1.1/train-v1.1.json \
    --do_predict=True \
    --predict_file=/workspace/nvidia-examples/bert/data/download/squad/v1.1/dev-v1.1.json \
    --train_batch_size=12 \
    --learning_rate=3e-5 \
    --num_train_epochs=2.0 \
    --max_seq_length=384 \
    --doc_stride=128 \
    --output_dir=/results/lambdal/squad1/squad_base/ \
    --horovod=true
Look for output similar to the following:
Code Block:
INFO:tensorflow:Writing predictions to: /results/lambdal/squad1/squad_base/predictions.json
I0326 05:55:19.917063 140421161031488 run_squad_hvd.py:747] Writing predictions to: /results/lambdal/squad1/squad_base/predictions.json
INFO:tensorflow:Writing nbest to: /results/lambdal/squad1/squad_base/nbest_predictions.json
I0326 05:55:19.917179 140421161031488 run_squad_hvd.py:748] Writing nbest to: /results/lambdal/squad1/squad_base/nbest_predictions.json
To check the score:
Code Block:
root@tessa002:/workspace/lambdal/bert# python /workspace/nvidia-examples/bert/data/download/squad/v1.1/evaluate-v1.1.py /workspace/nvidia-examples/bert/data/download/squad/v1.1/dev-v1.1.json /results/lambdal/squad1/squad_base/predictions.json
{"exact_match": 78.1929990539262, "f1": 86.51319484763773}
Note: Part of your final score includes these results:
...
Note: This is the method that the judges will use to score unseen data.
Code Block:
root@tessa002:/workspace/nvidia-examples/bert# cd /workspace
root@tessa002:/workspace# git clone https://github.com/google-research/bert.git
root@tessa002:/workspace# cd bert
...
1.2.9 Create a sample input file in JSON format (note the "id" values, which are referenced later).
...
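If you need to build such a file by hand, it follows the SQuAD 1.1 schema with the "answers" fields omitted. A sketch (the context, questions, and ids below are purely illustrative; the ids simply echo the ones that appear in predictions.json later):
Code Block:
import json

# Illustrative only: a SQuAD-style input file with no "answers" fields.
# The ids are arbitrary strings; they key the entries in predictions.json.
sample = {
    "version": "1.1",
    "data": [{
        "title": "Google",
        "paragraphs": [{
            "context": "Google was founded on September 4, 1998, by Larry "
                       "Page and Sergey Brin. Sundar Pichai serves as CEO. "
                       "Google is a subsidiary of Alphabet Inc.",
            "qas": [
                {"id": "56ddde6b9a695914005b9628",
                 "question": "Who is the CEO of Google?"},
                {"id": "56ddde6b9a695914005b9629",
                 "question": "Who founded Google?"},
            ],
        }],
    }],
}
with open("test_input.json", "w") as f:
    json.dump(sample, f, indent=2)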
Code Block:
root@tessa002:/workspace/bert# python3 run_squad.py \
    --vocab_file=/workspace/nvidia-examples/bert/data/download/google_pretrained_weights/uncased_L-12_H-768_A-12/vocab.txt \
    --bert_config_file=/workspace/nvidia-examples/bert/data/download/google_pretrained_weights/uncased_L-12_H-768_A-12/bert_config.json \
    --init_checkpoint=/results/tf_bert_finetuning_squad_base_fp16_gbs40_200326010711/model.ckpt-2408 \
    --do_train=False \
    --max_query_length=30 \
    --do_predict=True \
    --predict_file=test_input.json \
    --predict_batch_size=16 \
    --max_seq_length=384 \
    --doc_stride=128 \
    --output_dir=/results/squad1/squad_test/
Note: If you are using the alternative method from Lambda Labs, you will need to use that checkpoint:
...
1.2.11 You should see output similar to the following:
Code Block:
INFO:tensorflow:Processing example: 0
I0326 02:11:40.096473 140685488179008 run_squad.py:1259] Processing example: 0
INFO:tensorflow:prediction_loop marked as finished
I0326 02:11:40.165820 140685488179008 error_handling.py:101] prediction_loop marked as finished
INFO:tensorflow:prediction_loop marked as finished
I0326 02:11:40.166095 140685488179008 error_handling.py:101] prediction_loop marked as finished
INFO:tensorflow:Writing predictions to: /results/squad1/squad_test/predictions.json
I0326 02:11:40.166555 140685488179008 run_squad.py:745] Writing predictions to: /results/squad1/squad_test/predictions.json
INFO:tensorflow:Writing nbest to: /results/squad1/squad_test/nbest_predictions.json
I0326 02:11:40.166669 140685488179008 run_squad.py:746] Writing nbest to: /results/squad1/squad_test/nbest_predictions.json
1.2.12 Check correctness in file: predictions.json
Code Block:
{
  "56ddde6b9a695914005b9628": "Sundar Pichai",
  "56ddde6b9a695914005b9629": "Larry Page and Sergey Brin",
  "56ddde6b9a695914005b9630": "September 4, 1998",
  "56ddde6b9a695914005b9631": "CEO",
  "56ddde6b9a695914005b9632": "Alphabet Inc"
}
1.2.13 Check accuracy in file: nbest_predictions.json
...
Scores will be derived from the nbest_predictions.json output for each question on the context.
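Each key in nbest_predictions.json is a question id mapped to a ranked list of candidate answers; in the standard run_squad.py output every candidate carries "text", "probability", "start_logit", and "end_logit" fields. A short sketch of pulling the top candidate per question:
Code Block:
import json

# Print the best candidate and its probability for every question.
with open("/results/squad1/squad_test/nbest_predictions.json") as f:
    nbest = json.load(f)
for qid, candidates in nbest.items():
    best = candidates[0]  # candidates are ranked best-first
    print(qid, best["text"], round(best["probability"], 4))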
1.3 Competition Limits:
Must stick to the pre-defined model (BERT-Base, Uncased)
...
You must provide all scripts and the methodology used to achieve your results
1.4 Teams must produce:
Training scripts with the full training routine, command lines, and output
...
run_squad.py output files: predictions.json and nbest_predictions.json
1.5 Method of final scoring procedure:
...
Final scores are computed on unseen data consisting of multiple questions; predictions are taken from file, using the standard run_squad.py
1.6 Important: Final evaluation scripts, ckpt directories, and files must be submitted for approval 90 minutes before the end of the competition.
...