en/lessons/analyzing-documents-with-tfidf.md
Lines changed: 1 addition & 1 deletion
@@ -346,7 +346,7 @@ The Scikit-Learn `TfidfVectorizer` has several internal settings that can be cha
#### 1. stopwords
-In my code, I used `python stopwords=None` but `python stopwords='english'` is available. This setting will filter out words using a [preselected list](https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_extraction/_stop_words.py) of high frequency function words such as 'the', 'to', and 'of'. Depending on your settings, many of these terms will have low __tf-idf__ scores regardless because they tend to be found in all documents. For a discussion of some publicly available stop word lists (including Scikit-Learn's), see ["Stop Word Lists in Free Open-source Software Packages"](https://doi.org/10.18653/v1/W18-2502).
+In my code, I used `python stopwords=None` but `python stopwords='english'` is available. This setting will filter out words using a [preselected list](https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/feature_extraction/_stop_words.py) of high frequency function words such as 'the', 'to', and 'of'. Depending on your settings, many of these terms will have low __tf-idf__ scores regardless because they tend to be found in all documents. For a discussion of some publicly available stop word lists (including Scikit-Learn's), see ["Stop Word Lists in Free Open-source Software Packages"](https://perma.cc/V4J7-HMWH).
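The behaviour described in the paragraph above can be illustrated with a minimal sketch in plain Python. It assumes a hypothetical three-sentence corpus and a tiny hand-written stop word list. It does not call `TfidfVectorizer` itself, but uses the smoothed inverse document frequency formula that Scikit-Learn documents as its default, `ln((1 + n) / (1 + df)) + 1`. In Scikit-Learn the corresponding keyword argument is spelled `stop_words`.

```python
import math
from collections import Counter

# Hypothetical three-document corpus, for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang to the dog",
]

# Tiny illustrative stop word list; Scikit-Learn's built-in English list is much longer.
STOP_WORDS = {"the", "to", "of", "on"}


def tokenize(text, stop_words=None):
    """Lowercase, split on whitespace, and optionally drop stop words."""
    tokens = text.lower().split()
    if stop_words:
        tokens = [t for t in tokens if t not in stop_words]
    return tokens


def smoothed_idf(docs, stop_words=None):
    """Return {term: idf} with idf = ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        # Count each term once per document (document frequency).
        df.update(set(tokenize(doc, stop_words)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


idf_all = smoothed_idf(docs)                   # comparable to stop_words=None
idf_filtered = smoothed_idf(docs, STOP_WORDS)  # comparable to filtering stop words

for term in sorted(idf_all, key=idf_all.get):
    print(f"{term:8s} {idf_all[term]:.3f}")
```

In this toy corpus 'the' occurs in every document, so it receives the lowest weight the formula can give even without a stop word list. Filtering removes it from the vocabulary altogether and leaves the weights of the remaining terms unchanged.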
en/lessons/detecting-text-reuse-with-passim.md
Lines changed: 1 addition & 1 deletion
@@ -905,5 +905,5 @@ MR gratefully acknowledges the financial support of the Swiss National Science F
8. Aleksi Vesanto, Asko Nivala, Heli Rantala, Tapio Salakoski, Hannu Salmi, Filip Ginter. Applying BLAST to Text Reuse Detection in Finnish Newspapers and Journals, 1771-1910. 54–58 In *Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language*. Linköping University Electronic Press, 2017. [Link](https://aclanthology.org/W17-0510.pdf)
9. Hannu Salmi, Heli Rantala, Aleksi Vesanto, Filip Ginter. The long-term reuse of text in the Finnish press, 1771–1920. **2364**, 394–544 In *CEUR Workshop Proceedings*. (2019).
10. Axel J Soto, Abidalrahman Mohammad, Andrew Albert, Aminul Islam, Evangelos Milios, Michael Doyle, Rosane Minghim, Maria Cristina de Oliveira. Similarity-Based Support for Text Reuse in Technical Writing. 97–106 In *Proceedings of the 2015 ACM Symposium on Document Engineering*. ACM, 2015. [Link](http://dx.doi.org/10.1145/2682571.2797068)
-11. Alexandra Schofield, Laure Thompson, David Mimno. Quantifying the Effects of Text Duplication on Semantic Models. 2737–2747 In *Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing*. Association for Computational Linguistics, 2017. [Link](https://doi.org/10.18653/v1/D17-1290)
+11. Alexandra Schofield, Laure Thompson, David Mimno. Quantifying the Effects of Text Duplication on Semantic Models. 2737–2747 In *Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing*. Association for Computational Linguistics, 2017. [https://doi.org/10.18653/v1/D17-1290](https://perma.cc/KSK6-5TXP)
12. Matteo Romanello, Aurélien Berra, Alexandra Trachsel. Rethinking Text Reuse as Digital Classicists. *Digital Humanities conference*, 2014. [Link](https://wiki.digitalclassicist.org/Text_Reuse)
en/lessons/geoparsing-text-with-edinburgh.md
Lines changed: 1 addition & 1 deletion
@@ -384,7 +384,7 @@ Rosa Filgueira, Claire Grover, Vasilios Karaiskos, Beatrice Alex, Sarah Van Eynd
Rosa Filgueira, Claire Grover, Melissa Terras, and Beatrice Alex (2020). Geoparsing the historical Gazetteers of Scotland: accurately computing location in mass digitised texts. In Proceedings of the 8th Workshop on Challenges in the Management of Large Corpora, pages 24–30, Marseille, France. European Language Resources Association.
-Claire Grover and Richard Tobin (2014). A Gazetteer and Georeferencing for Historical English Documents. In Proceedings of LaTeCH 2014 at EACL 2014. Gothenburg, Sweden. [[pdf](https://doi.org/10.3115/v1/W14-0617)]
+Claire Grover and Richard Tobin (2014). A Gazetteer and Georeferencing for Historical English Documents. In Proceedings of LaTeCH 2014 at EACL 2014. Gothenburg, Sweden. [https://doi.org/10.3115/v1/W14-0617](https://perma.cc/S8XG-8TH3)
Claire Grover, Richard Tobin, Kate Byrne, Matthew Woollard, James Reid, Stuart Dunn, and Julian Ball (2010). Use of the Edinburgh Geoparser for georeferencing digitised historical collections. Philosophical Transactions of the Royal Society A. [[pdf](http://homepages.inf.ed.ac.uk/grover/papers/PTRS-A-2010-Grover-3875-89.pdf)]
en/lessons/interrogating-national-narrative-gpt.md
Lines changed: 1 addition & 1 deletion
@@ -386,7 +386,7 @@ The use of generated text as an analytical tool is relatively novel, as is the a
[^1]: Jeffrey Wu et al., "Language Models Are Unsupervised Multitask Learners," *OpenAI*, (February 2019): 7, [https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf](https://perma.cc/7HCZ-DX87).
[^2]: David Tarditi, Sidd Puri, and Jose Oglesby, "Accelerator: Using data parallelism to program GPUs for general-purpose uses," *Operating Systems Review* 40, (2006): 325-326. [https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-2005-184.pdf](https://perma.cc/QDX9-33R6).
[^3]: Shawn Graham, *An Enchantment of Digital Archaeology: Raising the Dead with Agent-Based Models, Archaeogaming, and Artificial Intelligence* (New York: Berghahn Books, 2020), 118.
-[^4]: Emily M. Bender and Alexander Koller, "Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data," (paper presented at Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, July 5 2020): 5187. [https://doi.org/10.18653/v1/2020.acl-main.463](https://doi.org/10.18653/v1/2020.acl-main.463).
+[^4]: Emily M. Bender and Alexander Koller, "Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data," (paper presented at Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, July 5 2020): 5187. [https://doi.org/10.18653/v1/2020.acl-main.463](https://perma.cc/XH59-96ML).
[^5]: Kari Kraus, "Conjectural Criticism: Computing Past and Future Texts," *Digital Humanities Quarterly* 3, no. 4 (2009). [http://www.digitalhumanities.org/dhq/vol/3/4/000069/000069.html](https://perma.cc/C7D7-H7WY).
[^6]: Alexandra Borchardt, Felix M. Simon, and Diego Bironzo, *Interested but not Engaged: How Europe’s Media Cover Brexit,* (Oxford: Reuters Institute for the Study of Journalism, 2018), 23, [https://reutersinstitute.politics.ox.ac.uk/sites/default/files/2018-06/How%20Europe%27s%20Media%20Cover%20Brexit.pdf](https://perma.cc/8S2H-9ZDV).
fr/lecons/detecter-la-reutilisation-de-texte-avec-passim.md
Lines changed: 1 addition & 1 deletion
@@ -913,5 +913,5 @@ Matteo Romanello remercie le Fonds national suisse de la recherche scientifique
8. Vesanto, Aleksi, Asko Nivala, Heli Rantala, Tapio Salakoski, Hannu Salmi et Filip Ginter. « Applying BLAST to Text Reuse Detection in Finnish Newspapers and Journals, 1771-1910 ». *Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language* (2017): 54–58. [Lien](https://aclanthology.org/W17-0510.pdf)
9. Salmi, Hannu, Heli Rantala, Aleksi Vesanto et Filip Ginter. « The long-term reuse of text in the Finnish press, 1771–1920 ». *CEUR Workshop Proceedings* 2364 (2019): 394–544.
10. Soto, Axel J, Abidalrahman Mohammad, Andrew Albert, Aminul Islam, Evangelos Milios, Michael Doyle, Rosane Minghim et Maria Cristina de Oliveira. « Similarity-Based Support for Text Reuse in Technical Writing ». *Proceedings of the 2015 ACM Symposium on Document Engineering* (2015): 97–106. [Lien](http://dx.doi.org/10.1145/2682571.2797068)
-11. Schofield, Alexandra, Laure Thompson et David Mimno. « Quantifying the Effects of Text Duplication on Semantic Models ». *Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing* (2017): 2737–2747. [Lien](https://doi.org/10.18653/v1/D17-1290)
+11. Schofield, Alexandra, Laure Thompson et David Mimno. « Quantifying the Effects of Text Duplication on Semantic Models ». *Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing* (2017): 2737–2747. [https://doi.org/10.18653/v1/D17-1290](https://perma.cc/KSK6-5TXP)
12. Romanello, Matteo, Aurélien Berra et Alexandra Trachsel. « Rethinking Text Reuse as Digital Classicists ». *Digital Humanities conference* (2014). [Lien](https://wiki.digitalclassicist.org/Text_Reuse)