feat: use promptsource templates #62
tianjianjiang wants to merge 1 commit into bigscience-workshop:main
Conversation
Force-pushed from 4207819 to 5cd09aa
torch==1.9.0
tqdm==4.62.0
transformers==4.9.1
promptsource @ git+https://git@github.com/bigscience-workshop/promptsource.git@main
A side note: the ssh form of this URL will fail.
def test_promptsource_template():
    ds_key, sub_key = "tydiqa", "secondary_task"
    tydiqa_sec_vld_ds = load_dataset(ds_key, sub_key, split="validation", streaming=True)
promptsource also has a helper for dataset loading, but I really want to use streaming=True if at all possible (depending on each dataset's compression format).
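For context, a minimal sketch of the streaming path with the Hugging Face datasets library (dataset and split names mirror the test above); with streaming=True the split is exposed as an IterableDataset, so examples are pulled lazily instead of downloading the whole archive first:

```python
from datasets import load_dataset

# streaming=True returns an IterableDataset: nothing is downloaded up front,
# and examples are yielded as plain dicts while iterating.
ds = load_dataset("tydiqa", "secondary_task", split="validation", streaming=True)
first_example = next(iter(ds))
```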
    tydiqa_sec_vld_ds_en = filter(lambda x: x["id"].split("-")[0] == "english", tydiqa_sec_vld_ds)
    template_collection = TemplateCollection()
    tydiqa_sec_tmpls = template_collection.get_dataset(ds_key, sub_key)
    tmpl = tydiqa_sec_tmpls["simple_question_reading_comp_2"]
The same prompt template as in evaluation.tasks.tydiqa_secondary.TyDiQADataset.
    prompt, _ = tmpl.apply(removeHyphen(next(tydiqa_sec_vld_ds_en)))
The return value is actually a list, but if the template didn't apply, there will be no second element (the expected answer/target).
Although we only do removeHyphen() here, promptsource has some more preprocessing for classification tasks; see https://github.com/bigscience-workshop/promptsource/blob/main/promptsource/seqio_tasks/tasks.py
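A minimal sketch of guarding against that, assuming the tmpl and tydiqa_sec_vld_ds_en objects from the test above; the length check is the only addition:

```python
# tmpl.apply() returns a list: [prompt] or [prompt, target].
applied = tmpl.apply(removeHyphen(next(tydiqa_sec_vld_ds_en)))
prompt = applied[0]
# Guard against templates that did not fully apply (no target rendered).
target = applied[1] if len(applied) > 1 else None
```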
Force-pushed from 5cd09aa to b9ae559
| tqdm = "4.62.0" | ||
| transformers = "4.9.1" | ||
| promptsource = {git = "https://[email protected]/bigscience-workshop/promptsource.git", rev = "main"} | ||
| aiohttp = "^3.7.4" |
Same as with dataset[streaming], but we may want to control the version of aiohttp separately, just in case.
tqdm==4.62.0
transformers==4.9.1
promptsource @ git+https://git@github.com/bigscience-workshop/promptsource.git@main
aiohttp==3.7.4
| torch = "1.9.0" | ||
| tqdm = "4.62.0" | ||
| transformers = "4.9.1" | ||
| promptsource = {git = "https://[email protected]/bigscience-workshop/promptsource.git", rev = "main"} |
A simple proposal to use promptsource directly, so that we don't have to implement the templates from scratch.
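To make the proposal concrete, a rough end-to-end sketch under the same assumptions as the test (TyDiQA secondary task, the simple_question_reading_comp_2 template, streaming enabled); the names here are illustrative, not a final API:

```python
from datasets import load_dataset
from promptsource.templates import TemplateCollection

ds_key, sub_key = "tydiqa", "secondary_task"

# Stream the validation split so nothing is downloaded up front.
ds = load_dataset(ds_key, sub_key, split="validation", streaming=True)

# Look up the prompt template shipped with promptsource.
tmpls = TemplateCollection().get_dataset(ds_key, sub_key)
tmpl = tmpls["simple_question_reading_comp_2"]

# Apply the template to one example; the target may be absent if the
# template did not fully apply.
applied = tmpl.apply(next(iter(ds)))
prompt = applied[0]
target = applied[1] if len(applied) > 1 else None
print(prompt, target)
```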