Note: PLEASE SEE UPDATES IN SECTION 1.6

Introduction

Language understanding is an ongoing challenge and one of the most relevant and influential research areas across industries.

...

Code Block
root@tessa002:/workspace/nvidia-examples/bert/data# mkdir download
root@tessa002:/workspace/nvidia-examples/bert/data# cd download
root@tessa002:/workspace/nvidia-examples/bert/data/download# mkdir -p google_pretrained_weights
root@tessa002:/workspace/nvidia-examples/bert/data/download# cd google_pretrained_weights/
root@tessa002:/workspace/nvidia-examples/bert/data/download/google_pretrained_weights# wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
root@tessa002:/workspace/nvidia-examples/bert/data/download/google_pretrained_weights# unzip uncased_L-12_H-768_A-12.zip
Archive:  uncased_L-12_H-768_A-12.zip
   creating: uncased_L-12_H-768_A-12/
  inflating: uncased_L-12_H-768_A-12/bert_model.ckpt.meta
  inflating: uncased_L-12_H-768_A-12/bert_model.ckpt.data-00000-of-00001
  inflating: uncased_L-12_H-768_A-12/vocab.txt
  inflating: uncased_L-12_H-768_A-12/bert_model.ckpt.index
  inflating: uncased_L-12_H-768_A-12/bert_config.json
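
As an optional sanity check (not part of the original steps), bert_config.json should describe the 12-layer, 768-hidden, 12-head BERT-Base architecture:

Code Block
# Optional check: confirm the unzipped config matches BERT-Base
# (expect "num_hidden_layers": 12, "hidden_size": 768, "num_attention_heads": 12)
cat uncased_L-12_H-768_A-12/bert_config.json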

...

Code Block
root@tessa002:/workspace/nvidia-examples/bert/data/download# mkdir -p squad/v1.1
root@tessa002:/workspace/nvidia-examples/bert/data/download# cd squad/v1.1
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad/v1.1# wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad/v1.1# wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad/v1.1# wget https://github.com/allenai/bi-att-flow/archive/master.zip
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad/v1.1# unzip master.zip
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad/v1.1# cd bi-att-flow-master/squad
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad/v1.1/bi-att-flow-master/squad# cp evaluate-v1.1.py /workspace/nvidia-examples/bert/data/download/squad/v1.1/
root@tessa002:/workspace/nvidia-examples/bert/data/download/squad/v1.1/bi-att-flow-master/squad# cd /workspace/nvidia-examples/bert
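
At this point the download directory should contain the pretrained weights, both SQuAD 1.1 JSON files, and the copied evaluate-v1.1.py. A quick, optional way to confirm:

Code Block
# Optional check: list everything fetched so far
find /workspace/nvidia-examples/bert/data/download -maxdepth 3 | sort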

...

Note: consider logging results with "2>&1 | tee $LOGFILE" for submissions to judges.
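
A minimal sketch of that pattern, assuming an illustrative log path (the placeholder stands for any of the training commands in this guide):

Code Block
LOGFILE=/results/lambdal/squad1/train_run1.log   # illustrative path
# capture both stdout and stderr of the run for submission
<training command> 2>&1 | tee "$LOGFILE"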

...

Code Block
root@tessa002:/workspace# mkdir lambdal
root@tessa002:/workspace# cd lambdal
root@tessa002:/workspace/lambdal# git clone https://github.com/lambdal/bert
root@tessa002:/workspace/lambdal# cd bert
root@tessa002:/workspace/lambdal/bert# mpirun -np 4 -H localhost:4 -bind-to none -map-by slot \
    -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH \
    -mca pml ob1 -mca btl ^openib --allow-run-as-root \
    python3 run_squad_hvd.py \
    --vocab_file=/workspace/nvidia-examples/bert/data/download/google_pretrained_weights/uncased_L-12_H-768_A-12/vocab.txt \
    --bert_config_file=/workspace/nvidia-examples/bert/data/download/google_pretrained_weights/uncased_L-12_H-768_A-12/bert_config.json \
    --init_checkpoint=/workspace/nvidia-examples/bert/data/download/google_pretrained_weights/uncased_L-12_H-768_A-12/bert_model.ckpt \
    --do_train=True \
    --train_file=/workspace/nvidia-examples/bert/data/download/squad/v1.1/train-v1.1.json \
    --do_predict=True \
    --predict_file=/workspace/nvidia-examples/bert/data/download/squad/v1.1/dev-v1.1.json \
    --train_batch_size=12 \
    --learning_rate=3e-5 \
    --num_train_epochs=2.0 \
    --max_seq_length=384 \
    --doc_stride=128 \
    --output_dir=/results/lambdal/squad1/squad_base/ \
    --horovod=true

...

{"exact_match": 78.1929990539262, "f1": 86.51319484763773}

1.2.8  Example: predict Q&A on real data

...

An example of predicting Q&A on real data is available here: github.com/google-research/bert

Note: This is the method that the judges will use to score unseen data.

...

Code Block
{
    "version": "v1.1",
    "data": [
        {
            "title": "your_title",
            "paragraphs": [
                {
                    "qas": [
                        {
                            "question": "Who is current CEO?",
                            "id": "56ddde6b9a695914005b9628",
                            "is_impossible": ""
                        },
                        {
                            "question": "Who founded google?",
                            "id": "56ddde6b9a695914005b9629",
                            "is_impossible": ""
                        },
                        {
                            "question": "when did IPO take place?",
                            "id": "56ddde6b9a695914005b962a",
                            "is_impossible": ""
                        }
                    ],
                    "context": "Google was founded in 1998 by Larry Page and Sergey Brin while they were Ph.D. students at Stanford University in California. Together they own about 14 percent of its shares and control 56 percent of the stockholder voting power through supervoting stock. They incorporated Google as a privately held company on September 4, 1998. An initial public offering (IPO) took place on August 19, 2004, and Google moved to its headquarters in Mountain View, California, nicknamed the Googleplex. In August 2015, Google announced plans to reorganize its various interests as a conglomerate called Alphabet Inc. Google is Alphabet's leading subsidiary and will continue to be the umbrella company for Alphabet's Internet interests. Sundar Pichai was appointed CEO of Google, replacing Larry Page who became the CEO of Alphabet."                
                 }
            ]
        }
    ]
}
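
A quick structural check of the input file can be done with jq (a hypothetical convenience, not a required step; any JSON validator works):

Code Block
# Hypothetical check: list the question ids, then count them (requires jq)
jq '.data[].paragraphs[].qas[].id' test_input.json
jq '[.data[].paragraphs[].qas[].id] | length' test_input.json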


1.2.10  Run run_squad.py

Run run_squad.py with --do_predict=True, using the fine-tuned model checkpoint:

Code Block
root@tessa002:/workspace/bert# python3 run_squad.py \
    --vocab_file=/workspace/nvidia-examples/bert/data/download/google_pretrained_weights/uncased_L-12_H-768_A-12/vocab.txt \
    --bert_config_file=/workspace/nvidia-examples/bert/data/download/google_pretrained_weights/uncased_L-12_H-768_A-12/bert_config.json \
    --init_checkpoint=/results/tf_bert_finetuning_squad_base_fp16_gbs40_200326010711/model.ckpt-2408 \
    --do_train=False \
    --max_query_length=30 \
    --do_predict=True \
    --predict_file=test_input.json \
    --predict_batch_size=16 \
    --max_seq_length=384 \
    --doc_stride=128 \
    --output_dir=/results/squad1/squad_test/

...

Code Block
root@tessa002:/workspace/lambdal/bert# python3 run_squad.py \
    --vocab_file=/workspace/nvidia-examples/bert/data/download/google_pretrained_weights/uncased_L-12_H-768_A-12/vocab.txt \
    --bert_config_file=/workspace/nvidia-examples/bert/data/download/google_pretrained_weights/uncased_L-12_H-768_A-12/bert_config.json \
    --init_checkpoint=/results/lambdal/squad1/squad_base/model.ckpt-3649 \
    --do_train=False \
    --max_query_length=30 \
    --do_predict=True \
    --predict_file=test_input.json \
    --predict_batch_size=16 \
    --max_seq_length=384 \
    --doc_stride=128 \
    --output_dir=/results/lambdal/squad1/squad_test/

...

You should see output similar to the following:

...

Code Block
I0326 02:11:40.096473 140685488179008 run_squad.py:1259] Processing example: 0
INFO:tensorflow:prediction_loop marked as finished
I0326 02:11:40.165820 140685488179008 error_handling.py:101] prediction_loop marked as finished
INFO:tensorflow:prediction_loop marked as finished
I0326 02:11:40.166095 140685488179008 error_handling.py:101] prediction_loop marked as finished
INFO:tensorflow:Writing predictions to: /results/squad1/squad_test/predictions.json
I0326 02:11:40.166555 140685488179008 run_squad.py:745] Writing predictions to: /results/squad1/squad_test/predictions.json
INFO:tensorflow:Writing nbest to: /results/squad1/squad_test/nbest_predictions.json
I0326 02:11:40.166669 140685488179008 run_squad.py:746] Writing nbest to: /results/squad1/squad_test/nbest_predictions.json

1.2.11  Check correctness in file: predictions.json

Code Block
{
    "56ddde6b9a695914005b9628": "Sundar Pichai",
    "56ddde6b9a695914005b9629": "Larry Page and Sergey Brin",
    "56ddde6b9a695914005b9630": "September 4, 1998",
    "56ddde6b9a695914005b9631": "CEO",
    "56ddde6b9a695914005b9632": "Alphabet Inc"
}

1.2.12  Check accuracy in file: nbest_predictions.json

Code Block
{
    "56ddde6b9a695914005b9628": [
        {
            "text": "Sundar Pichai",
            "probability": 0.6877274611974046,
            "start_logit": 7.016119003295898,
            "end_logit": 6.917689323425293
        },
        {
            "text": "Sundar Pichai was appointed CEO of Google, replacing Larry Page",
            "probability": 0.27466839794889614,
            "start_logit": 7.016119003295898,
            "end_logit": 5.999861240386963
        },
        {
            "text": "Larry Page",
            "probability": 0.02874494871571203,
            "start_logit": 4.759016513824463,
            "end_logit": 5.999861240386963
        },

...
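
Because the probability score may serve as a tie-breaker (see section 1.6), it can be useful to pull the top-ranked answer and its probability for each question id. A hypothetical jq one-liner:

Code Block
# Hypothetical: top answer and its probability per question id (requires jq)
jq 'to_entries | map({id: .key, answer: .value[0].text, probability: .value[0].probability})' \
    /results/squad1/squad_test/nbest_predictions.json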

  • Must stick to pre-defined model (BERT-Base, Uncased)

  • Teams can locally cache (on SSD) starting model weights and dataset

  • The HuggingFace implementation (TensorFlow/PyTorch) is the official standard. Use of any other implementation, or modification of the official one, is subject to approval.

  • Teams are allowed to explore different optimizers (SGD, Adam, etc.), learning-rate schedules, or any other techniques that do not modify the model architecture.

  • Teams are not allowed to modify any model hyperparameters or add additional layers.

  • The entire model must be fine-tuned (layers cannot be frozen)

  • You must provide all scripts and the methodology used to achieve your results

...

  • Training scripts with their full training routine, command lines, and output

  • Evaluation-only script for verification of results. Final evaluation is on a fixed sequence length (128 tokens).

  • Final model ckpt and inference files

  • Team’s training scripts and methodology, command lines, and logs of runs

  • run_squad.py outputs: predictions.json and nbest_predictions.json

...

Final scores are computed on unseen data consisting of multiple questions; predictions are generated from file using the standard run_squad.py.

1.6  UPDATES (June 8, 2020)

In past discussions we had questions about training BERT from scratch; this is beyond the scope of this competition and is not allowed. You will need to use the BERT-BASE model file as outlined in guidelines section 1.2.3.

We will allow changing/modifying the output layer and allow additional layers.
We will allow ensemble techniques.

We must disallow integration of dev-set data into the training dataset; the SQuAD 1.1 datasets must remain unchanged/unaugmented.

We must disallow additional external data being integrated into the training dataset for this competition, because there is not enough time to verify that dev-set data is not inadvertently part of such an acquired dataset augmentation.

We allow any hyperparameters; e.g. learning rate, optimizer, dropout, etc.
We will also allow setting the random seed; this will reduce the variance between training runs.
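
A minimal sketch of reproducibility-related environment settings (these variable names are standard for Python and NGC TensorFlow containers, but they are illustrative here; the seed itself must still be set inside the training script):

Code Block
# Hypothetical sketch: reduce run-to-run variance before launching training
export PYTHONHASHSEED=0          # fix Python hash randomization
export TF_DETERMINISTIC_OPS=1    # request deterministic TF/cuDNN kernels where supported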
The F1 score will be used as the score for team ranking.

Teams should submit their best 5 runs; please upload your runs in separate folders containing ckpt files, logs, etc. We will average the top 3 of the 5 F1 scores for your final score.
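
For example, the final-score computation under this rule (the F1 values below are purely illustrative):

Code Block
# Illustrative: average the top 3 of 5 submitted F1 scores
printf '%s\n' 86.51 86.73 86.20 85.95 86.60 \
    | sort -rn | head -n 3 \
    | awk '{ s += $1 } END { printf "%.4f\n", s / NR }'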

We will use F1 as the quality metric for scoring/ranking. We will not round the score computed by evaluate-v1.1.py.

The judges will score with the standard evaluate-v1.1.py from SQuAD 1.1, as outlined in section 1.5 of the SQuAD 1.1 with TensorFlow BERT-BASE Guidelines.

We will use the probability score on unseen inference data (provided as test_input.json no later than June 10th) as a secondary ranking in the event of any tie in the averaged F1 scores of your top training runs.