
Commit 94a5483
committed Jan 18, 2021 by sia
1 parent 34c9484

File tree: 5 files changed, +450 −662 lines
 
Lines changed: 47 additions & 41 deletions
@@ -1,14 +1,12 @@
# CodeXGLUE -- Code Search (WebQueryTest)

## Task Description

Code Search aims to find the code snippet that best matches the demand of a query. This task can be formulated in two scenarios: a retrieval scenario and a text-code classification scenario. In WebQueryTest, we present code search in the text-code classification scenario.

In WebQueryTest, a trained model needs to judge whether a code snippet answers a given natural language query, which can be formulated as a binary classification problem.

Most existing code search datasets use code documentation or questions from online communities for software developers as queries, which still differ from real user search queries. Therefore we provide the WebQueryTest testing set.
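For intuition, here is a minimal sketch of one classification instance; the field names are hypothetical, not the dataset's actual schema:

```python
# A hypothetical WebQueryTest-style instance: does the code answer the query?
example = {
    "query": "python check if file exists",            # real web search query
    "code": "import os\nos.path.exists('/tmp/x.txt')", # candidate Python snippet
    "label": 1,                                        # 1 = the code answers the query
}
```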
## Dependency

@@ -19,9 +17,9 @@

## Data

Here we present the WebQueryTest dataset, a testing set for Python code search of 1,046 query-code pairs with code search intent and their human annotations. The real-world user queries are collected from Bing query logs, and the code for the queries comes from CodeSearchNet. You can find our testing set in `./data/test_webquery.json`.

Since there is no direct training set for our WebQueryTest dataset, we fine-tune the models on an external training set, using the documentation-function pairs in the training set of CodeSearchNet AdvTest as positive instances. For each documentation, we also randomly sample 31 more functions to form negative instances (see the sketch after the commands below). You can run the following command to download and preprocess the data:

```shell
cd data
# ... (the remaining download and preprocessing commands are elided in this diff)
cd ..
```

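As a rough illustration of the instance construction described above (a hedged sketch, not the repository's actual preprocessing code; the field names are assumptions):

```python
import random

def build_training_instances(pairs, num_negatives=31):
    """pairs: list of (documentation, function) positives from CodeSearchNet AdvTest."""
    functions = [func for _, func in pairs]
    instances = []
    for doc, func in pairs:
        instances.append({"doc": doc, "code": func, "label": 1})
        # Randomly sampled functions serve as negatives for this documentation.
        # (A real implementation would also exclude the paired function itself.)
        for neg in random.sample(functions, num_negatives):
            instances.append({"doc": doc, "code": neg, "label": 0})
    random.shuffle(instances)
    return instances
```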
#### Data statistics

Data statistics of WebQueryTest are shown in the table below:

|              | #Examples |
| :----------: | :-------: |
| WebQueryTest |   1,046   |

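A quick way to inspect the test set (assuming `test_webquery.json` holds a JSON list of labeled query-code records; the exact schema may differ):

```python
import json

with open("./data/test_webquery.json") as f:
    examples = json.load(f)

print(len(examples))  # expected: 1,046 query-code pairs
print(examples[0])    # one annotated query-code record
```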
## Fine-tuning
@@ -50,73 +48,81 @@

You can use the following command to finetune:

```shell
python code/run_classifier.py \
--model_type roberta \
--do_train \
--do_eval \
--eval_all_checkpoints \
--train_file train_codesearchnet_7.json \
--dev_file dev_codesearchnet.json \
--max_seq_length 200 \
--per_gpu_train_batch_size 16 \
--per_gpu_eval_batch_size 16 \
--learning_rate 1e-5 \
--num_train_epochs 3 \
--gradient_accumulation_steps 1 \
--warmup_steps 5000 \
--evaluate_during_training \
--data_dir ./data/ \
--output_dir ./model \
--encoder_name_or_path microsoft/codebert-base
```

## Evaluation

To test on WebQueryTest, run the following command; it will also automatically write predictions to `--prediction_file`:

```shell
python code/run_classifier.py \
--model_type roberta \
--do_predict \
--test_file test_webquery.json \
--max_seq_length 200 \
--per_gpu_eval_batch_size 2 \
--data_dir ./data \
--output_dir ./model/checkpoint-best-aver/ \
--encoder_name_or_path microsoft/codebert-base \
--pred_model_dir ./model/checkpoint-last/ \
--prediction_file ./evaluator/webquery_predictions.txt
```

After generating predictions for WebQueryTest, you can use our provided script to evaluate:

```shell
python evaluator/evaluator.py \
--answers_webquery ./evaluator/webquery_answers.txt \
--predictions_webquery ./evaluator/webquery_predictions.txt
```
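Roughly, the evaluator compares the two label files example by example; a hedged sketch of the accuracy computation, assuming one `idx<TAB>label` line per example (the actual file format may differ), is:

```python
def read_labels(path):
    """Parse 'idx<TAB>label' lines into a dict; this format is an assumption."""
    labels = {}
    with open(path) as f:
        for line in f:
            idx, label = line.strip().split("\t")
            labels[idx] = int(label)
    return labels

answers = read_labels("./evaluator/webquery_answers.txt")
preds = read_labels("./evaluator/webquery_predictions.txt")

correct = sum(1 for idx in answers if preds.get(idx) == answers[idx])
print(f"Accuracy: {correct / len(answers):.2%}")
```
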
## Results

The results on WebQueryTest are shown below:

|   dataset    |  model   |  F1   | Accuracy |
| :----------: | :------: | :---: | :------: |
| WebQueryTest | RoBERTa  | 57.49 |  40.92   |
| WebQueryTest | CodeBERT | 58.95 |  53.37   |

## Cite

If you use this code or our WebQueryTest dataset, please consider citing CodeXGLUE and CodeSearchNet:

```
@article{CodeXGLUE,
  title={CodeXGLUE: An Open Challenge for Code Intelligence},
  journal={arXiv},
  year={2020},
}
```

```
@article{husain2019codesearchnet,
  title={CodeSearchNet challenge: Evaluating the state of semantic code search},
  author={Husain, Hamel and Wu, Ho-Hsiang and Gazit, Tiferet and Allamanis, Miltiadis and Brockschmidt, Marc},
  journal={arXiv preprint arXiv:1909.09436},
  year={2019}
}
```
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
import torch
import torch.nn as nn
from transformers.modeling_utils import PreTrainedModel


class Model(PreTrainedModel):
    """Binary classifier: does a code snippet answer a natural language query?"""

    def __init__(self, encoder, config, tokenizer, args):
        super(Model, self).__init__(config)
        self.encoder = encoder  # e.g. a RoBERTa/CodeBERT encoder
        self.config = config
        self.tokenizer = tokenizer
        # Classification head over the concatenated pair features
        # (768 is the encoder hidden size).
        self.mlp = nn.Sequential(nn.Linear(768 * 4, 768),
                                 nn.Tanh(),
                                 nn.Linear(768, 1),
                                 nn.Sigmoid())
        self.loss_func = nn.BCELoss()
        self.args = args

    def forward(self, code_inputs, nl_inputs, labels, return_vec=False):
        bs = code_inputs.shape[0]
        # Encode code and NL in a single batch; token id 1 is RoBERTa's padding token.
        inputs = torch.cat((code_inputs, nl_inputs), 0)
        outputs = self.encoder(inputs, attention_mask=inputs.ne(1))[1]  # pooled output
        code_vec = outputs[:bs]
        nl_vec = outputs[bs:]
        if return_vec:
            return code_vec, nl_vec

        # Pair features: [nl; code; nl - code; nl * code]
        logits = self.mlp(torch.cat((nl_vec, code_vec, nl_vec - code_vec, nl_vec * code_vec), 1))
        logits = logits.squeeze(-1)  # (batch,), matching the shape of `labels`
        loss = self.loss_func(logits, labels.float())
        predictions = (logits > 0.5).int()  # (batch,)
        return loss, predictions
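
A minimal smoke test of this head with dummy inputs (a sketch; the real batching and training loop live in `code/run_classifier.py`, and `args` is unused here):

```python
import torch
from transformers import RobertaConfig, RobertaModel, RobertaTokenizer

config = RobertaConfig.from_pretrained("microsoft/codebert-base")
tokenizer = RobertaTokenizer.from_pretrained("microsoft/codebert-base")
encoder = RobertaModel.from_pretrained("microsoft/codebert-base")
model = Model(encoder, config, tokenizer, args=None)

query = "how to read a file line by line in python"
code = "with open(path) as f:\n    for line in f:\n        print(line)"
nl_inputs = tokenizer(query, return_tensors="pt", padding="max_length",
                      truncation=True, max_length=200)["input_ids"]
code_inputs = tokenizer(code, return_tensors="pt", padding="max_length",
                        truncation=True, max_length=200)["input_ids"]
labels = torch.tensor([1])

loss, predictions = model(code_inputs, nl_inputs, labels)
print(loss.item(), predictions)  # prediction: 1 if the code answers the query
```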
