Mastodon::Client Pagination

shawnhcorey was gonna look at my code to figure out how to grab (hash)tags.

Mastodon::Client does implement the search part of the Mastodon API. Kinda. I haven't dug too deeply into that. I don't think the tags API is directly implemented, but the search method can stand in.

This post is not about that; it's about how to do pagination.

I'll do this in chunks.

This is just loading modules I'll be using. I am assuming you are familiar with Perl and what is going on here. Some are needed to make Mastodon::Client happy.

#!/usr/bin/env perl

use v5.38;

use Mastodon::Client;
use YAML qw(LoadFile);
use JSON::MaybeXS;
use Type::Params qw( compile validate );
use HTTP::Response;

Now the subroutine to get bookmarks.

Why it's needed is covered in the previous blog post.

sub Mastodon::Client::get_bookmarks {
  my $self = shift;
  my ($params) = @_;

  my $endpoint = q(/bookmarks);

  return $self->get($endpoint, $params);
}

Now for the big chunk, most of which doesn't relate to pagination. I've removed a bunch of code relating to parsing bookmark info. See the Mastodon API docs on bookmarks for the data that comes back in the response.

sub parse_bookmarks ($bookmarks, $client) {
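  # Read the Link header through the HTTP::Response accessor rather than its
  # internals; fall back to an empty string when the server sent no Link header.
  my $links = $client->latest_response->header('Link') // q();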

  my $pagination;

  RAW : for my $raw ( split(/,/, $links ) ) {
    my ($url_raw, $rel) = split(/;/, $raw);
    $url_raw =~ s/[<>]//g;
    my ($direction) = $rel =~ m/rel="(\w+)"/;
    my ($url, $params_returned) = split(/\?/, $url_raw);

    say qq(  direction : $direction);
    PARAM : for my $param ( split/&/, $params_returned) {
      my ($name, $value) = split(/=/, $param);
      $pagination->{$direction}{$name} = $value;
      say qq(    name      : $name);
      say qq(    value     : $value);
    } # PARAM
  } # RAW

  return($pagination);
}
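
To poke at the header parsing without hitting a server, here is a minimal sketch that runs the same kind of split on a canned string. The instance URL and the ids in $link_header are made up, and parse_link_header is a hypothetical helper name, not something Mastodon::Client provides.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical sample of a Link header; the instance and the ids are made up.
my $link_header = join q(, ),
  q(<https://example.social/api/v1/bookmarks?limit=40&max_id=1234>; rel="next"),
  q(<https://example.social/api/v1/bookmarks?limit=40&min_id=5678>; rel="prev");

# Turn the header into { next => { limit => 40, max_id => 1234 }, prev => {...} }.
# An undefined or empty header gives back an empty hashref.
sub parse_link_header {
  my ($header) = @_;
  my %pagination;
  for my $entry ( split /\s*,\s*/, $header // q() ) {
    # Skip any entry that lacks a <url> part or a rel="..." part
    my ($url)       = $entry =~ m/<([^>]+)>/   or next;
    my ($direction) = $entry =~ m/rel="(\w+)"/ or next;
    my ( undef, $query ) = split /\?/, $url, 2;
    for my $pair ( split /&/, $query // q() ) {
      my ( $name, $value ) = split /=/, $pair, 2;
      $pagination{$direction}{$name} = $value;
    }
  }
  return \%pagination;
}

my $pagination = parse_link_header($link_header);
say qq(next max_id : $pagination->{next}{max_id});
say qq(prev min_id : $pagination->{prev}{min_id});
```

The hash under next can be handed straight back as the request parameters for the following page, which is what the loop further down does.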

Credentials and such are stored in a config file; YAML is my current format of choice.

You will need to read the API docs to create an application.

My YAML file looks like:

instance: INSTANCE_URL
name: REGISTERED_APP_NAME
user_id_number: #####
key: KEY
secret: SECRET
token: TOKEN

Now just let the program know where to find the file and then load it.

my $config_file = q(./gmb-masto-reader.yml);
my $config = LoadFile($config_file);

Create the client and give it the parameters it needs.

my $client = Mastodon::Client->new(
  instance        => $config->{instance},
  name            => $config->{name},
  client_id       => $config->{key},
  client_secret   => $config->{secret},
  access_token    => $config->{token},
);

Get some bookmarks.

my $bookmarks = $client->get_bookmarks(
  {
    limit => 40,
  }
);

Call the bookmark parser (or whatever you like) to kick off stepping through the pagination.

With max_id you get older things and with min_id you get newer things (I think; max_id gives me what I want).

my ($pagination) = parse_bookmarks($bookmarks, $client);

say qq(  pagination_next_max_id : $pagination->{next}{max_id});

Now you can loop until the well runs dry or do a for loop and just do a couple of pages.

# Stop when the last response had no "next" link (no max_id to ask for)
while ( $pagination && $pagination->{next} && $pagination->{next}{max_id} ) {
  $bookmarks = $client->get_bookmarks( $pagination->{next} );
  ($pagination) = parse_bookmarks($bookmarks, $client);

  say q();
  sleep(1);
}
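
The "just do a couple of pages" option could look roughly like this. It is a sketch only: fetch_page is a hypothetical stand-in for the get_bookmarks plus parse_bookmarks pair so that it runs without a Mastodon instance, and the page data is made up.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical stand-in for get_bookmarks + parse_bookmarks: three fake pages,
# keyed by the max_id that requests them. The last page has no "next".
my %fake_pages = (
  start => { items => [ 9, 8, 7 ], next => { max_id => 7 } },
  7     => { items => [ 6, 5, 4 ], next => { max_id => 4 } },
  4     => { items => [ 3, 2, 1 ], next => undef },
);

sub fetch_page {
  my ($params) = @_;
  my $key = defined $params->{max_id} ? $params->{max_id} : 'start';
  return $fake_pages{$key};
}

# Walk at most $max_pages pages, stopping early when there is no next link.
sub collect_pages {
  my ($max_pages) = @_;
  my @items;
  my $params = {};
  for ( 1 .. $max_pages ) {
    my $page = fetch_page($params);
    push @items, @{ $page->{items} };
    last unless $page->{next} && $page->{next}{max_id};
    $params = $page->{next};
    # In real use, sleep(1) here to be polite to the server
  }
  return @items;
}

say join q(,), collect_pages(2);     # first two pages only
say join q(,), collect_pages(10);    # stops early when the well runs dry
```

Capping the page count keeps a test run short, and the early exit means a generous cap behaves the same as the while loop.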

And that's basically it.

Here's the full thing in its ugly glory, including stuff I took out.

#!/usr/bin/env perl

use v5.38;

use Mastodon::Client;
use YAML qw(LoadFile);
use Data::Printer;
use JSON::MaybeXS;
use Type::Params qw( compile validate );
use HTTP::Response;

sub Mastodon::Client::get_bookmarks {
  my $self = shift;
  my ($params) = @_;

  my $endpoint = q(/bookmarks);

  return $self->get($endpoint, $params);
}
###
sub parse_bookmarks ($bookmarks, $client) {
  say qq(bookmarked :  $bookmarks->[0]->{bookmarked});
  my $links = $client->latest_response->header('Link') // q();

  my $data = decode_json($client->latest_response->content);

  BOOKMARK_DATA : for my $bookmark_data ( $data->@* ) {
    my $id         = $bookmark_data->{id};
    my $created_at = $bookmark_data->{'created_at'};
    my $url        = $bookmark_data->{url};
    my $content    = $bookmark_data->{content};

    say qq( id : $id);
    say qq( url : $url);
    say qq( created_at : $created_at);
    say qq( content : $content);
  } # BOOKMARK_DATA

  say qq(\n\nlinks : $links);

  my $pagination;

  RAW : for my $raw ( split(/,/, $links ) ) {
    my ($url_raw, $rel) = split(/;/, $raw);
    $url_raw =~ s/[<>]//g;
    my ($direction) = $rel =~ m/rel="(\w+)"/;
    my ($url, $params_returned) = split(/\?/, $url_raw);

    say qq(  direction : $direction);
    PARAM : for my $param ( split/&/, $params_returned) {
      my ($name, $value) = split(/=/, $param);
      $pagination->{$direction}{$name} = $value;
      say qq(    name      : $name);
      say qq(    value     : $value);
    } # PARAM
  } # RAW

  return($pagination);
}

my $config_file = q(./gmb-masto-reader.yml);
my $config = LoadFile($config_file);

my $client = Mastodon::Client->new(
  instance        => $config->{instance},
  name            => $config->{name},
  client_id       => $config->{key},
  client_secret   => $config->{secret},
  access_token    => $config->{token},
);
my $bookmarks = $client->get_bookmarks(
  {
    limit => 40,
  }
);

my ($pagination) = parse_bookmarks($bookmarks, $client);

say qq(  pagination_next_max_id : $pagination->{next}{max_id});

while ( $pagination && $pagination->{next} && $pagination->{next}{max_id} ) {
  $bookmarks = $client->get_bookmarks( $pagination->{next} );
  ($pagination) = parse_bookmarks($bookmarks, $client);

  say q();
  sleep(1);
}

Getting Mastodon Bookmarks

Some time ago I created an app to grab my toots using the Perl module Mastodon::Client.

Lately I wanted to grab bookmarks, with the idea of being able to search through them more easily.

There's just one problem. Mastodon::Client doesn't have a method to get bookmarks.

So I created a very crude one:

sub Mastodon::Client::get_bookmarks {
  my $self = shift;
  my ($params) = @_;

  my $endpoint = q(/bookmarks);

  return $self->get($endpoint, $params);
}

It needs more to really fit in with the rest of the module.

A very crude usage example for a very crude method:

my $bookmarks = $client->get_bookmarks(
  {
    limit => 40,
  }
);

I really should find some time to figure out how to implement it properly, write tests, etc., and then put in a PR.

Some day.

EDIT: Shout out to hisdeedsaredust for catching that I messed up the Markdown syntax for the href and text. That's what I get for blogging while tired.

Trans Rights Are Human Rights

Why am I writing this now? Honestly, because a recent episode of Binging with Babish on YouTube opened with him noting that the sponsor of the episode was the Warner Bros. video game Hogwarts Legacy. I stopped the video the second I saw it. Then I unsubscribed from the channel. I later left a comment.

Will this do anything to his money-making machine? Nope, not in the slightest. He has over 10 million subscribers.

Trans people are under attack in the USA and around the world.

I don't need them to be friends for me to support them and worry about them. They are one of the most marginalized and persecuted groups.

I don't support the GOP laws being proposed and passed to discriminate against them.

Trans people are the first group that fascists target. Their existence is labeled deviant. Their participation in sports is labeled as "harming" cisgender girls and women by having an unfair advantage. That is a ludicrous charge.

Trans people just want to live their lives. They aren't seeking an imaginary advantage. That supposed advantage doesn't exist. It's used to persecute trans people and oppress girls and women.

GOP politicians have called for the death of trans people and anyone that gives them the healthcare they need.

This post started because a youtuber took money from a Hogwarts intellectual property. JK Rowling is a transphobe. She uses her money to spread hate. Even her pen name Robert Galbraith is most likely in homage to Robert Galbraith Heath, the creator of gay conversion therapy. The coincidence is rather striking.

I refuse to participate in any activity that supports transphobia or transphobes.

Trans rights are human rights.

Trans people have the right to live their life free of fear and persecution.

the day the world changed

The world can change on you without you noticing.

So, some time after December 10, 2022 9:19pm Eastern, the "default" CDLI (Cuneiform Digital Library Initiative) website changed from cdli.ucla.edu to cdli.mpiwg-berlin.mpg.de.

That's the Max Planck Institute for the History of Science in Berlin, Germany.

I emailed the support link to confirm my suspicion that this was a permanent redirect.

A Professor of Assyriology from University of Oxford / Wolfson College replied:

I understand that you and many other users had grown used to the old Framework. However, it could not be sustained any longer and we have migrated all data this Summer to the new Framework. If you explore search settings, advanced settings, and create a profile on the new framework you will be able to mimic very closely the experience of the old CDLI. Please do write if you have trouble finding certain texts or words etc.

So, my little art project world changed just like that.

I was used to the search interface and layout of the old site. I didn't relish updating my web-scraping code to compensate for the changes.

If I wanted my silly little project to continue, I had maybe a week to sort it out. That's how many days I had prepared.

Dear reader, if you know me you know that I'm lazy and a procrastinator. If you didn't before, you do now. I did poke around the site some. Honestly, I was annoyed. You need JavaScript to do some of the fancier stuff. Thankfully I really don't need to do fancy stuff.

The top level/landing page has a "simple" search text entry. I poked around, kept notes on the url used in the search, and the results pages. I also did lots of "view source" and "inspect element" stuff as one does.

I haven't done much "complex" searching in the past but I can adjust if I have to in the future. That's a "future gizmo" problem, and good luck to them.

I spent some time looking at the toggle-able elements of an object page and figuring out how to scrape them. I didn't really do much fiddling because I noticed an "export artifact" feature.

I don't recall the old framework having an "export" feature for an object. You can export into a variety of formats based on categories:

  • Metadata/catalogue
    • Flat catalogue
      • CSV
      • TSV
    • Expanded catalogue
      • JSON
    • Linked catalogue
      • TTL
      • JSON-LD
      • RDF/JSON
      • RDF/XML
  • Chemical Data
    • Seal Chemistry
      • ATF
      • JTF
  • Text/annotations
    • Text data
    • Linked annotations
      • TTL
      • JSON-LD
      • RDF/JSON
      • RDF/XML
  • Related publications
    • CSV
    • BibTex

Some of those are familiar to me and some most definitely aren't.

I looked at the JSON option. It also has a nicely consistent URL:
https://cdli.mpiwg-berlin.mpg.de/artifacts/ARTIFACT_NUMBER/json

ARTIFACT_NUMBER is the CDLI number with the leading P removed along with any leading zeroes.
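
As a rough sketch of that rule, assuming a CDLI number always looks like a P followed by digits, this builds the export URL. The sample number P001234 and the function name cdli_json_url are made up for illustration.

```perl
use strict;
use warnings;
use feature 'say';

# Build the JSON export URL from a CDLI number such as "P001234".
sub cdli_json_url {
  my ($cdli_number) = @_;

  # Drop the leading P and any leading zeroes, leaving the bare artifact number.
  my ($artifact_number) = $cdli_number =~ m/\A P 0* (\d+) \z/x
    or die qq(Not a CDLI number: $cdli_number\n);

  return qq(https://cdli.mpiwg-berlin.mpg.de/artifacts/$artifact_number/json);
}

say cdli_json_url(q(P001234));
```

For the made-up P001234 this prints https://cdli.mpiwg-berlin.mpg.de/artifacts/1234/json.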

The data returned contained a lot of the things I generally care about:

  • material
  • genre
  • subgenre, which is now "comments" under the genre key
  • object type, which is now artifact_type

The best part is that any text that has been recorded for the artifact is in that data, making it easy to look for English translations.

So, the change of framework was a decently sized disruption to my workflow for finding tablets to post. It even shortened the workflow a bit, since I now create a file that has most of the information I review before adding a tablet to my list of tablets to post.

A net win even if it did take me some time to figure things out. I guess it keeps my brain occupied and malleable to change.

Honest Conversations at Work

Some dark thoughts regarding being a worker in the USA.

Because of the labor laws in a majority of states...

Because of how healthcare is structured in the USA...

Because you might be over 30 years old in IT...

Because you might be a woman...

Because you might be a minority...

Because you might be LGBTQ...

Because you might have a mortgage...

Because you might have a student loan debt...

Because you might have debt...

I don't feel like you can have an honest conversation with any boss.

You have too much to lose.

"Right to work states", better to call it employment-at-will

https://www.nrtw.org/right-to-work-states/

https://worldpopulationreview.com/state-rankings/right-to-work-states

Employment by state, December 2022

https://www.bls.gov/charts/state-employment-and-unemployment/employment-by-state-bar.htm

Total employment, January 2023

Household Data, Summary Table A

https://www.bls.gov/news.release/pdf/empsit.pdf

41.97% of employees are in an employment-at-will state as of December 2022/January 2023

10.1% of workers are in a union

https://www.bls.gov/news.release/union2.nr0.htm