
Commit 17f40a3

julien-c and lhoestq authored
fix some broken links (#7859)
* fix some broken links
* some more

Co-authored-by: Quentin Lhoest <[email protected]>
1 parent cf647ab commit 17f40a3

17 files changed: +42 / -42 lines changed

docs/source/dataset_card.mdx

Lines changed: 1 addition & 1 deletion
@@ -24,4 +24,4 @@ Creating a dataset card is easy and can be done in just a few steps:
 YAML also allows you to customize the way your dataset is loaded by [defining splits and/or configurations](./repository_structure#define-your-splits-and-subsets-in-yaml) without the need to write any code.

-Feel free to take a look at the [SNLI](https://huggingface.co/datasets/snli), [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail), and [Allociné](https://huggingface.co/datasets/allocine) dataset cards as examples to help you get started.
+Feel free to take a look at the [SNLI](https://huggingface.co/datasets/stanfordnlp/snli), [CNN/DailyMail](https://huggingface.co/datasets/abisee/cnn_dailymail), and [Allociné](https://huggingface.co/datasets/tblard/allocine) dataset cards as examples to help you get started.

docs/source/faiss_es.mdx

Lines changed: 2 additions & 2 deletions
@@ -22,7 +22,7 @@ FAISS retrieves documents based on the similarity of their vector representation
 ```py
 >>> from datasets import load_dataset
->>> ds = load_dataset('crime_and_punish', split='train[:100]')
+>>> ds = load_dataset('community-datasets/crime_and_punish', split='train[:100]')
 >>> ds_with_embeddings = ds.map(lambda example: {'embeddings': ctx_encoder(**ctx_tokenizer(example["line"], return_tensors="pt"))[0][0].numpy()})
 ```

@@ -62,7 +62,7 @@ FAISS retrieves documents based on the similarity of their vector representation
 7. Reload it at a later time with [`Dataset.load_faiss_index`]:

 ```py
->>> ds = load_dataset('crime_and_punish', split='train[:100]')
+>>> ds = load_dataset('community-datasets/crime_and_punish', split='train[:100]')
 >>> ds.load_faiss_index('embeddings', 'my_index.faiss')
 ```
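Note: taken together with the `add_faiss_index` docstring changes further down, the full workflow with the renamed dataset ID looks roughly like this sketch, where `embed` stands in for whatever encoder produces the vectors (the guide itself uses a DPR context encoder and tokenizer):

```py
>>> from datasets import load_dataset
>>> ds = load_dataset('community-datasets/crime_and_punish', split='train[:100]')
>>> # `embed` is assumed to return a 1-D numpy array for a piece of text
>>> ds_with_embeddings = ds.map(lambda example: {'embeddings': embed(example['line'])})
>>> ds_with_embeddings.add_faiss_index(column='embeddings')
>>> scores, retrieved_examples = ds_with_embeddings.get_nearest_examples('embeddings', embed('my new query'), k=10)
>>> ds_with_embeddings.save_faiss_index('embeddings', 'my_index.faiss')
>>> # later: reload the dataset and the saved index
>>> ds = load_dataset('community-datasets/crime_and_punish', split='train[:100]')
>>> ds.load_faiss_index('embeddings', 'my_index.faiss')
```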

docs/source/image_load.mdx

Lines changed: 2 additions & 2 deletions
@@ -10,7 +10,7 @@ When you load an image dataset and call the image column, the images are decoded
 ```py
 >>> from datasets import load_dataset, Image

->>> dataset = load_dataset("beans", split="train")
+>>> dataset = load_dataset("AI-Lab-Makerere/beans", split="train")
 >>> dataset[0]["image"]
 ```

@@ -33,7 +33,7 @@ You can load a dataset from the image path. Use the [`~Dataset.cast_column`] fun
 If you only want to load the underlying path to the image dataset without decoding the image object, set `decode=False` in the [`Image`] feature:

 ```py
->>> dataset = load_dataset("beans", split="train").cast_column("image", Image(decode=False))
+>>> dataset = load_dataset("AI-Lab-Makerere/beans", split="train").cast_column("image", Image(decode=False))
 >>> dataset[0]["image"]
 {'bytes': None,
  'path': '/root/.cache/huggingface/datasets/downloads/extracted/b0a21163f78769a2cf11f58dfc767fb458fc7cea5c05dccc0144a2c0f0bc1292/train/bean_rust/bean_rust_train.29.jpg'}
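A short usage note on the snippet above: with the namespaced `AI-Lab-Makerere/beans` ID you can toggle decoding off and back on with `cast_column` (a sketch, not part of the diff):

```py
>>> from datasets import load_dataset, Image

>>> # decode=False: each example carries only the raw bytes and/or the file path
>>> dataset = load_dataset("AI-Lab-Makerere/beans", split="train").cast_column("image", Image(decode=False))
>>> dataset[0]["image"]["path"]
>>> # cast back with decode=True to get PIL images again
>>> dataset = dataset.cast_column("image", Image(decode=True))
>>> dataset[0]["image"]  # a PIL.Image.Image
```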

docs/source/loading.mdx

Lines changed: 1 addition & 1 deletion
@@ -327,7 +327,7 @@ Select specific rows of the `train` split:
 ```py
 >>> train_10_20_ds = datasets.load_dataset("ajibawa-2023/General-Stories-Collection", split="train[10:20]")
 ===STRINGAPI-READINSTRUCTION-SPLIT===
->>> train_10_20_ds = datasets.load_dataset("bookcorpu", split=datasets.ReadInstruction("train", from_=10, to=20, unit="abs"))
+>>> train_10_20_ds = datasets.load_dataset("rojagtap/bookcorpus", split=datasets.ReadInstruction("train", from_=10, to=20, unit="abs"))
 ```

 Or select a percentage of a split with:
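The hunk stops at that sentence; for reference, percentage-based selection with the renamed `rojagtap/bookcorpus` ID would look roughly like this (a sketch, not the exact snippet from the full guide):

```py
>>> train_50_pct_ds = datasets.load_dataset("rojagtap/bookcorpus", split="train[:50%]")
>>> # equivalent ReadInstruction form
>>> train_50_pct_ds = datasets.load_dataset("rojagtap/bookcorpus", split=datasets.ReadInstruction("train", to=50, unit="%"))
```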

docs/source/object_detection.mdx

Lines changed: 2 additions & 2 deletions
@@ -8,14 +8,14 @@ To run these examples, make sure you have up-to-date versions of [albumentations
 pip install -U albumentations opencv-python
 ```

-In this example, you'll use the [`cppe-5`](https://huggingface.co/datasets/cppe-5) dataset for identifying medical personal protective equipment (PPE) in the context of the COVID-19 pandemic.
+In this example, you'll use the [`cppe-5`](https://huggingface.co/datasets/rishitdagli/cppe-5) dataset for identifying medical personal protective equipment (PPE) in the context of the COVID-19 pandemic.

 Load the dataset and take a look at an example:

 ```py
 >>> from datasets import load_dataset

->>> ds = load_dataset("cppe-5")
+>>> ds = load_dataset("rishitdagli/cppe-5")
 >>> example = ds['train'][0]
 >>> example
 {'height': 663,
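Since the surrounding guide goes on to build an `albumentations` pipeline over this dataset, here is a minimal sketch of such a pipeline; the particular transforms are illustrative and may differ from the full guide (cppe-5 boxes are COCO-style `[x, y, width, height]`):

```py
>>> import albumentations as A

>>> transform = A.Compose(
...     [A.Resize(480, 480), A.HorizontalFlip(p=0.5), A.RandomBrightnessContrast(p=0.5)],
...     bbox_params=A.BboxParams(format="coco", label_fields=["category"]),
... )
```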

docs/source/quickstart.mdx

Lines changed: 1 addition & 1 deletion
@@ -288,7 +288,7 @@ pip install -U albumentations opencv-python
 ## NLP

-Text needs to be tokenized into individual tokens by a [tokenizer](https://huggingface.co/docs/transformers/main_classes/tokenizer). For the quickstart, you'll load the [Microsoft Research Paraphrase Corpus (MRPC)](https://huggingface.co/datasets/glue/viewer/mrpc) training dataset to train a model to determine whether a pair of sentences mean the same thing.
+Text needs to be tokenized into individual tokens by a [tokenizer](https://huggingface.co/docs/transformers/main_classes/tokenizer). For the quickstart, you'll load the [Microsoft Research Paraphrase Corpus (MRPC)](https://huggingface.co/datasets/nyu-mll/glue/viewer/mrpc) training dataset to train a model to determine whether a pair of sentences mean the same thing.

 **1**. Load the MRPC dataset by providing the [`load_dataset`] function with the dataset name, dataset configuration (not all datasets will have a configuration), and dataset split:
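To illustrate step 1 with the updated `nyu-mll/glue` ID, a small sketch; the `bert-base-uncased` tokenizer is an arbitrary choice for illustration:

```py
>>> from datasets import load_dataset
>>> from transformers import AutoTokenizer

>>> dataset = load_dataset("nyu-mll/glue", "mrpc", split="train")
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> dataset = dataset.map(lambda examples: tokenizer(examples["sentence1"], examples["sentence2"], truncation=True), batched=True)
```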

docs/source/stream.mdx

Lines changed: 2 additions & 2 deletions
@@ -160,11 +160,11 @@ You can split your dataset one of two ways:
 🤗 Datasets supports sharding to divide a very large dataset into a predefined number of chunks. Specify the `num_shards` parameter in [`~IterableDataset.shard`] to determine the number of shards to split the dataset into. You'll also need to provide the shard you want to return with the `index` parameter.

-For example, the [amazon_polarity](https://huggingface.co/datasets/amazon_polarity) dataset has 4 shards (in this case they are 4 Parquet files):
+For example, the [amazon_polarity](https://huggingface.co/datasets/fancyzhx/amazon_polarity) dataset has 4 shards (in this case they are 4 Parquet files):

 ```py
 >>> from datasets import load_dataset
->>> dataset = load_dataset("amazon_polarity", split="train", streaming=True)
+>>> dataset = load_dataset("fancyzhx/amazon_polarity", split="train", streaming=True)
 >>> print(dataset)
 IterableDataset({
     features: ['label', 'title', 'content'],
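Building on the snippet above, actually taking one of the four shards uses the `num_shards` and `index` parameters mentioned in the prose (a sketch):

```py
>>> from datasets import load_dataset

>>> dataset = load_dataset("fancyzhx/amazon_polarity", split="train", streaming=True)
>>> # keep only the first of the 4 shards (Parquet files)
>>> shard_0 = dataset.shard(num_shards=4, index=0)
```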

docs/source/use_with_jax.mdx

Lines changed: 2 additions & 2 deletions
@@ -195,11 +195,11 @@ part.
 The easiest way to get JAX arrays out of a dataset is to use the `with_format('jax')` method. Lets assume
 that we want to train a neural network on the [MNIST dataset](http://yann.lecun.com/exdb/mnist/) available
-at the HuggingFace Hub at https://huggingface.co/datasets/mnist.
+at the HuggingFace Hub at https://huggingface.co/datasets/ylecun/mnist.

 ```py
 >>> from datasets import load_dataset
->>> ds = load_dataset("mnist")
+>>> ds = load_dataset("ylecun/mnist")
 >>> ds = ds.with_format("jax")
 >>> ds["train"][0]
 {'image': DeviceArray([[ 0, 0, 0, ...],
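A brief usage sketch of the same flow with the namespaced MNIST ID, slicing a small batch of JAX arrays:

```py
>>> from datasets import load_dataset

>>> ds = load_dataset("ylecun/mnist")
>>> ds = ds.with_format("jax")
>>> batch = ds["train"][:8]
>>> batch["image"].shape  # (8, 28, 28) as a jax.numpy array
```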

docs/source/use_with_numpy.mdx

Lines changed: 1 addition & 1 deletion
@@ -160,7 +160,7 @@ at the HuggingFace Hub at https://huggingface.co/datasets/mnist.
 ```py
 >>> from datasets import load_dataset
->>> ds = load_dataset("mnist")
+>>> ds = load_dataset("ylecun/mnist")
 >>> ds = ds.with_format("numpy")
 >>> ds["train"][0]
 {'image': array([[ 0, 0, 0, ...],
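And the NumPy counterpart, here also restricting formatting to specific columns via the optional `columns` argument (a sketch):

```py
>>> from datasets import load_dataset

>>> ds = load_dataset("ylecun/mnist")
>>> ds = ds.with_format("numpy", columns=["image", "label"])
>>> ds["train"][0]["image"].shape  # (28, 28) as a numpy array
```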

src/datasets/arrow_dataset.py

Lines changed: 5 additions & 5 deletions
@@ -1970,7 +1970,7 @@ def class_encode_column(self, column: str, include_nulls: bool = False) -> "Data
 ```py
 >>> from datasets import load_dataset
->>> ds = load_dataset("boolq", split="validation")
+>>> ds = load_dataset("google/boolq", split="validation")
 >>> ds.features
 {'answer': Value('bool'),
  'passage': Value('string'),
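For the docstring above, a compact sketch of what `class_encode_column` does with the renamed `google/boolq` ID:

```py
>>> from datasets import load_dataset

>>> ds = load_dataset("google/boolq", split="validation")
>>> ds = ds.class_encode_column("answer")
>>> ds.features["answer"]  # now a ClassLabel feature instead of a plain bool Value
```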
@@ -4725,7 +4725,7 @@ def train_test_split(
 >>> ds = ds.train_test_split(test_size=0.2, seed=42)

 # stratified split
->>> ds = load_dataset("imdb",split="train")
+>>> ds = load_dataset("stanfordnlp/imdb",split="train")
 Dataset({
     features: ['text', 'label'],
     num_rows: 25000
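The stratified example in the docstring continues past this hunk; condensed, with the renamed `stanfordnlp/imdb` ID, it amounts to something like:

```py
>>> from datasets import load_dataset

>>> ds = load_dataset("stanfordnlp/imdb", split="train")
>>> # stratify_by_column keeps the label proportions the same in both splits
>>> splits = ds.train_test_split(test_size=0.2, stratify_by_column="label", seed=42)
```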
@@ -6175,15 +6175,15 @@ def add_faiss_index(
 Example:

 ```python
->>> ds = datasets.load_dataset('crime_and_punish', split='train')
+>>> ds = datasets.load_dataset('community-datasets/crime_and_punish', split='train')
 >>> ds_with_embeddings = ds.map(lambda example: {'embeddings': embed(example['line']}))
 >>> ds_with_embeddings.add_faiss_index(column='embeddings')
 >>> # query
 >>> scores, retrieved_examples = ds_with_embeddings.get_nearest_examples('embeddings', embed('my new query'), k=10)
 >>> # save index
 >>> ds_with_embeddings.save_faiss_index('embeddings', 'my_index.faiss')

->>> ds = datasets.load_dataset('crime_and_punish', split='train')
+>>> ds = datasets.load_dataset('community-datasets/crime_and_punish', split='train')
 >>> # load index
 >>> ds.load_faiss_index('embeddings', 'my_index.faiss')
 >>> # query
@@ -6314,7 +6314,7 @@ def add_elasticsearch_index(
 ```python
 >>> es_client = elasticsearch.Elasticsearch()
->>> ds = datasets.load_dataset('crime_and_punish', split='train')
+>>> ds = datasets.load_dataset('community-datasets/crime_and_punish', split='train')
 >>> ds.add_elasticsearch_index(column='line', es_client=es_client, es_index_name="my_es_index")
 >>> scores, retrieved_examples = ds.get_nearest_examples('line', 'my new query', k=10)
 ```
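One way to sanity-check renames like these without downloading any data is to resolve each new repo ID against the Hub; a rough sketch (the list below is just the IDs touched in this commit):

```py
>>> from huggingface_hub import dataset_info

>>> renamed = [
...     "stanfordnlp/snli", "abisee/cnn_dailymail", "tblard/allocine",
...     "community-datasets/crime_and_punish", "AI-Lab-Makerere/beans",
...     "rojagtap/bookcorpus", "rishitdagli/cppe-5", "nyu-mll/glue",
...     "fancyzhx/amazon_polarity", "ylecun/mnist", "google/boolq", "stanfordnlp/imdb",
... ]
>>> for repo_id in renamed:
...     dataset_info(repo_id)  # raises if the repo does not exist on the Hub
```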
