I’ve ignored generators for some time in PHP, but recently realised why they can be quite handy.
As an example, imagine you are querying a web service, which returns data in chunks of up to 100 results.
To get the next chunk (e.g. results 100–200), it’s necessary to pass a paging cursor back with the query – the $after value from the previous response.
The JSON response from the service looks a little like :
{
    "paging": { "after": "foo" },
    "data": [
        "stuff.we.care.about",
        "stuff.we.care.about",
        "stuff.we.care.about"
    ]
}
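To make the paging explicit : the ‘after’ value from one response becomes a query parameter on the next request. A rough sketch (the parameter names here just mirror the code below, and the exact names depend on the service) :
<?php
// First request: no cursor yet – array_filter() drops the null 'after' entry.
echo http_build_query(array_filter(['limit' => 100, 'after' => null]));
// limit=100

// Next request: pass back the "after" cursor ("foo") from the previous response's paging block.
echo http_build_query(array_filter(['limit' => 100, 'after' => 'foo']));
// limit=100&after=foo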
Naive (without error handling etc) PHP code for this could look a bit like the below :
function get_things_in_chunks($after = null) {
    $x = new \GuzzleHttp\Client(/* config */);
    $response = $x->get(
        'https://graph.facebook.com/vX.Y/something/blah',
        [
            // array_filter() drops the 'after' key when it's null (i.e. on the first request).
            'query' => array_filter(
                ['limit' => 100, 'q' => 'foo', 'after' => $after]
            ),
        ]
    );
    // decode as an associative array, so the caller reads $data['paging'] etc.
    return json_decode($response->getBody()->getContents(), true);
}
So :
- My consumer (whatever is calling ‘get_things_in_chunks’) needs to know how paging is specified in the response it gets back.
- My consumer needs to be able to work out whether more results could be available from the web service, in order to know when to make another request.
So, if I want to keep iterating through the data returned until I meet some condition, my calling code could look a bit like the below :
<?php
$dt = new DateTime("1 year ago");
$after = null;
while (true) {
    $data = get_things_in_chunks($after);
    // find the paging key/data for the next request.
    $after = $data['paging']['after'] ?? null;
    // was there data returned? if not, break out of while(..)
    if (empty($data['data'])) {
        break;
    }
    foreach ($data['data'] as $post) {
        if (!check_created_at_before_dt($post['created_at'], $dt)) {
            break 2; // stop both the foreach and the while
        }
        // do something
    }
}
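(check_created_at_before_dt isn’t defined anywhere above – it’s just a date-comparison helper. Taking the name literally, a hypothetical version might be :)
<?php
// Hypothetical helper (its body isn't shown in this post) – taking the name literally:
// true when the post was created before the cut-off DateTime.
// Assumes created_at is a date string that DateTime's constructor can parse.
function check_created_at_before_dt($created_at, DateTime $dt): bool
{
    return new DateTime($created_at) < $dt;
}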
We can simplify things for the consumer by using generators – like the example below.
The nice thing about this is that the caller no longer needs to know or care about the underlying response format from the web service – so it no longer needs to read $data['paging']['after'] or check whether there is anything in $data['data'].
<?php
function get_things_in_chunks()
{
    $after = null;
    while (true) {
        $x = new \GuzzleHttp\Client(/* config */);
        $response = $x->get(
            'https://graph.facebook.com/vX.Y/something/blah',
            [
                'query' => array_filter(
                    ['limit' => 100, 'q' => 'foo', 'after' => $after]
                ),
            ]
        );
        $data = json_decode($response->getBody()->getContents(), true);
        if (empty($data['data'])) {
            return;
        }
        $after = $data['paging']['after'] ?? null;
        foreach ($data['data'] as $post) {
            yield $post; // magic here!
        }
        // no cursor for a further page? stop making requests.
        if ($after === null) {
            return;
        }
    }
}
Now the caller can look more like :
<?php
$dt = new DateTime("1 year ago");
foreach (get_things_in_chunks() as $post) {
    if (!check_created_at_before_dt($post['created_at'], $dt)) {
        break;
    }
    // do something.
}
So – now the caller only has to care about things it should be caring about ($post) and does not need to know or care about the paging mechanism that’s in place.
If multiple consumers use the same ‘get_things_in_chunks()’ function, there’s also less repetition, and usage becomes far simpler – no $after to track, no need to check whether there are more posts, and so on.
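For example, a second consumer could reuse it to collect a batch of posts, again without any paging code – a rough sketch :
<?php
// Another consumer of the same generator: gather up to 500 posts,
// with no knowledge of limits, cursors or the response format.
$posts = [];
foreach (get_things_in_chunks() as $post) {
    $posts[] = $post;
    if (count($posts) >= 500) {
        break;
    }
}
// Because it's a generator, each chunk of 100 is only fetched as the loop asks for it.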