The following example code executes in 1.3 s on my MacBook:
use Scraper\Scrape\Crawler\Types\GeneralCrawler;
use Scraper\Scrape\Extractor\Types\MultipleRowExtractor;

require_once(__DIR__ . '/../vendor/autoload.php');
date_default_timezone_set('UTC');

// Create crawler
$crawler = new GeneralCrawler('https://coinmarketcap.com/');

// Setup configuration
$configuration = new \Scraper\Structure\Configuration();
$configuration->setTargetXPath('//table[@id="currencies"]');
$configuration->setRowXPath('.//tbody/tr');
$configuration->setFields(
    [
        new \Scraper\Structure\TextField(
            [
                'name' => 'Rank',
                'xpath' => '//td[1]',
            ]
        ),
        new \Scraper\Structure\TextField(
            [
                'name' => 'Name',
                'xpath' => '//td[2]',
            ]
        ),
        new \Scraper\Structure\TextField(
            [
                'name' => 'Market Cap',
                'xpath' => '//td[3]',
            ]
        ),
        new \Scraper\Structure\TextField(
            [
                'name' => 'Price',
                'xpath' => '//td[4]',
            ]
        ),
        new \Scraper\Structure\RegexField(
            [
                'name' => '% Change',
                'xpath' => '//td[7]',
                'regex' => '/(.*)%/'
            ]
        ),
    ]
);

// Extract data
$extractor = new MultipleRowExtractor($crawler, $configuration);
$data = $extractor->extract();
print_r($data);
However, this slightly tweaked version takes 4.5 minutes!
use Scraper\Scrape\Crawler\Types\GeneralCrawler;
use Scraper\Scrape\Extractor\Types\MultipleRowExtractor;

require_once(__DIR__ . '/../vendor/autoload.php');
date_default_timezone_set('UTC');

// Create crawler
$crawler = new GeneralCrawler('https://coinmarketcap.com/currencies/volume/monthly/');

// Setup configuration
$configuration = new \Scraper\Structure\Configuration();
$configuration->setTargetXPath('//table[@id="currencies-volume"]');
$configuration->setRowXPath('.//tbody/tr');
$configuration->setFields(
    [
        new \Scraper\Structure\TextField(
            [
                'name' => 'Rank',
                'xpath' => './/td[1]',
            ]
        ),
        new \Scraper\Structure\TextField(
            [
                'name' => 'Name',
                'xpath' => './/td[2]',
            ]
        ),
        new \Scraper\Structure\TextField(
            [
                'name' => 'Symbol',
                'xpath' => './/td[3]',
            ]
        ),
        new \Scraper\Structure\TextField(
            [
                'name' => 'Volume_1D',
                'xpath' => './/td[4]',
            ]
        ),
        new \Scraper\Structure\TextField(
            [
                'name' => 'Volume_7D',
                'xpath' => './/td[5]',
            ]
        ),
        new \Scraper\Structure\TextField(
            [
                'name' => 'Volume_30D',
                'xpath' => './/td[6]',
            ]
        ),
    ]
);

// Extract data
$extractor = new MultipleRowExtractor($crawler, $configuration);
$data = $extractor->extract();
print_r(array_slice($data, 0, 10));
Can you confirm this performance problem on your system? If so, what causes such a large slowdown?
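To help narrow this down, here is a minimal sketch that times per-row field queries with PHP's built-in `DOMDocument` and `DOMXPath`, with no network access and without this library. Besides the URL and table ID, the two scripts above also differ in their field expressions (`//td[N]` versus `.//td[N]`); in XPath 1.0 an expression starting with `//` is evaluated from the document root even when a context node is supplied. The helpers `buildTable` and `timeFieldQueries`, the table ID `t`, and the 200x6 table size are hypothetical and only for illustration; this does not show how the library evaluates fields internally.

```php
<?php
// Hypothetical benchmark: a synthetic table built in memory, queried row by row.

/** Build an HTML table with $rows rows and $cols columns (illustrative helper). */
function buildTable(int $rows, int $cols): string
{
    $html = '<html><body><table id="t"><tbody>';
    for ($r = 1; $r <= $rows; $r++) {
        $html .= '<tr>';
        for ($c = 1; $c <= $cols; $c++) {
            $html .= "<td>r{$r}c{$c}</td>";
        }
        $html .= '</tr>';
    }
    return $html . '</tbody></table></body></html>';
}

/**
 * Run one query per column against every row, using each row as the context node.
 * $fieldTemplate is a sprintf template such as './/td[%d]'.
 * Returns the elapsed seconds and the total number of nodes matched.
 */
function timeFieldQueries(DOMXPath $xpath, string $fieldTemplate, int $cols): array
{
    $rows = $xpath->query('//table[@id="t"]/tbody/tr');
    $matched = 0;
    $start = microtime(true);
    foreach ($rows as $row) {
        for ($c = 1; $c <= $cols; $c++) {
            $matched += $xpath->query(sprintf($fieldTemplate, $c), $row)->length;
        }
    }
    return ['seconds' => microtime(true) - $start, 'matched' => $matched];
}

libxml_use_internal_errors(true); // keep HTML parser warnings out of the output

$doc = new DOMDocument();
$doc->loadHTML(buildTable(200, 6));
$xpath = new DOMXPath($doc);

// Relative expressions stay inside the current row.
$relative = timeFieldQueries($xpath, './/td[%d]', 6);
// Absolute expressions ignore the row and scan the whole document each time.
$absolute = timeFieldQueries($xpath, '//td[%d]', 6);

printf("relative: %d nodes in %.4fs\n", $relative['matched'], $relative['seconds']);
printf("absolute: %d nodes in %.4fs\n", $absolute['matched'], $absolute['seconds']);
```

If the numbers on the real page look nothing like the 1.3 s versus 4.5 minutes above, the time is probably being spent elsewhere (for example in fetching the page or in the number of rows the second table contains), which would be worth measuring separately.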