User:Claus chr/DPL/Test

Revision as of 16:10, 20 September 2011 by Neverendingo (talk | contribs) (Testing of LinksTo template)

Testing of LinksTo template

<DPL>
  namespace=Main |User
  nottitleregexp = .*(/..(-..)?|_[(].*[)])$
  include = *
  includematch = #\[\[[Ss]pecial\:[mM]y[lL]anguage/Getting[_ ]Help|\[\[Getting[_ ]Help#
  includemaxlength = 0
  resultsheader = The pages in the translation system linking to Getting Help are:\n
  format = ,\n* [[%PAGE%|%TITLE%]],,
</DPL>

Extension:DynamicPageList (DPL), version 2.3.0 : ⧼dpl_log_1⧽

Somehow this is broken. Without the namespace clause we get

The pages in the translation system linking to Getting Help are:

which excludes some pages in the main namespace(!) and includes one in the User namespace!? Explicitly specifying namespace Main gives this.

Extension:DynamicPageList (DPL), version 2.3.0 : ⧼dpl_log_1⧽

That seems reasonable, but wasn't Main supposed to be used by default? Finally, specifying namespace User gives us all three user pages:

The pages in the translation system linking to Getting Help are:

All English pages linking to a given page (template version)

The lesson here seems to be that, at least when include is involved, we can't rely on DPL handling more than one namespace at a time. This calls for a template: {{LinksTo|Getting[_ ]Help}} gives

There are 265 pages beginning with A-J

There are 324 pages beginning with KA-KZ

There are 287 pages beginning with Ka-Kc or Ke-Kz

There are 307 pages beginning with Kd

There are 404 pages beginning with L-Z


So, to recap, the problem is this: we would like to find all pages that link to a given page (the target). The What links here wiki page does not work well with links adapted to the translation extension, i.e. links of the form [[Special:myLanguage/target page]], which means that almost none of our links would be found.

Instead we have to use DPL to find those pages, but even here we have to be careful. The obvious search using the linksto clause doesn't work either, probably for the same reason that What Links Here failed. The solution is to search the content of every page for a link to the target page. The problem here is that DPL has to write the entire content of each page into the page where the search occurs, then search the text, then filter the text out again (that's the includemaxlength=0 part), leaving only links to the pages that contain a link to the target page.

For some reason, this kind of query confuses DPL. Specifying no namespace should result in the main namespace being searched, but as we saw above, not every matching page in Main was found, and oddly one User: page was found. Specifying both namespaces in one search seems to work better, but still misses a User: page. The solution seems to be one search per namespace. That seems to find everything, with one exception: the start page Welcome to KDE UserBase seems to be outside of namespaces, and is never found. Let's hope it is one of a kind.

This method of searching has been implemented as a template {{LinksTo|target page}} (which can easily be modified to include more namespaces in the search). Just to be clear: we are searching for the actual occurrence of a link in the texts of pages. The search finds both Special:myLanguage links and old-style links. It tries to take every known variation into account (Special: or special:, and mylanguage, Mylanguage, myLanguage, or MyLanguage). One variation we have to deal with by hand is that spaces can be written either as a normal space character or as an underscore. Therefore we should either make two searches if the target name contains a space, or write [_ ] wherever a space occurs in the name, as in {{LinksTo|Getting[_ ]Help}}. Also note that case is significant: {{LinksTo|getting[_ ]help}} yields

There are 265 pages beginning with A-J

There are 324 pages beginning with KA-KZ

There are 287 pages beginning with Ka-Kc or Ke-Kz

There are 307 pages beginning with Kd

There are 404 pages beginning with L-Z



We just get two error messages (one for each search), since DPL gives an error message whenever a text search finds no match on any page.
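The space and case behaviour described above can be tried out away from the wiki. The following is only a sketch in Python, assuming that Python's re module treats this simple pattern the same way as the regex engine DPL uses. The helper names space_tolerant and count_hits and the two sample texts are made up for illustration; they are not part of the template.

```python
import re
from typing import Iterable

# Simplified link prefix: "[[" plus an optional Special:myLanguage/ part.
PREFIX = r"\[\[(?:[Ss]pecial:[mM]y[lL]anguage/)?"

def space_tolerant(name: str) -> str:
    """Rewrite each space or underscore in a page name as the class [_ ]."""
    return re.sub(r"[ _]", "[_ ]", name)

def count_hits(target_regex: str, texts: Iterable[str]) -> int:
    """Count the texts that contain a link starting with the target."""
    pattern = re.compile(PREFIX + target_regex)
    return sum(1 for text in texts if pattern.search(text))

# Hypothetical page contents: one new-style link, one old-style link.
texts = [
    "[[Special:myLanguage/Getting Help|help]]",
    "[[Getting_Help]]",
]

print(space_tolerant("Getting Help"))        # Getting[_ ]Help
print(count_hits("Getting Help", texts))     # 1: the underscore form is missed
print(count_hits("Getting[_ ]Help", texts))  # 2: both spellings are found
print(count_hits("getting[_ ]help", texts))  # 0: case is significant
```

The last call corresponds to the lowercase query above: nothing matches, which is the situation in which DPL reports an error instead of an empty list.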

The template

The code of the first half of the template is here:

{{#dpl:
| namespace = Main
| nottitleregexp = .*(/..(-..)?{{!}}_[(].*[)])$
| include = *
| includematch = @\[\[([Ss]pecial\:[mM]y[lL]anguage/)?{{{1}}}(\]{{!}}\{{!}}{{!}}#)@
| includemaxlength = 0
| format = ,\n* [[%PAGE%|%TITLE%]],,
}}

The second half is the same, except that | namespace = Main is replaced by | namespace = User.

I couldn't make normal DPL tags work in the template, but fortunately the {{#dpl: parser function does work. To add more namespaces, just add a copy of the first half of the template to the end, and in the copy replace | namespace = Main by | namespace = Whatever. From the DPL documentation you might think that to search the Main namespace you could leave out the namespace clause altogether. That is not a good idea: in this context it confuses DPL so that it misses pages!

The nottitleregexp clause filters out any page whose path ends in /xx, /xx-xx, or _(x..), i.e. it filters out all translated pages, both old and new. Since the pipe character has special meaning in a template, it has to be entered as {{!}}.
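To see which titles this pattern removes, here is a small sketch in Python. It assumes that Python's re module agrees with DPL's regex engine on this pattern, and that titles are compared with underscores in place of spaces (which is what the _[(] part suggests). The function name is_translated and the sample titles are invented for the example.

```python
import re

# The pattern from the template, with {{!}} written as the plain pipe it expands to.
NOT_TITLE = re.compile(r".*(/..(-..)?|_[(].*[)])$")

def is_translated(title: str) -> bool:
    """Return True if the title would be filtered out by nottitleregexp."""
    return NOT_TITLE.match(title) is not None

# Hypothetical titles, written with underscores instead of spaces.
samples = [
    "Amarok",         # kept
    "Amarok/Manual",  # kept: the last path element is longer than two characters
    "Amarok/da",      # excluded: ends in /xx
    "Amarok/pt-br",   # excluded: ends in /xx-xx
    "Amarok_(da)",    # excluded: old-style translation ending in _(..)
    "Getting_Help",   # kept: the underscore is not followed by "("
]
for title in samples:
    print(title, "excluded" if is_translated(title) else "kept")
```

Note that the dots match any character, so an untranslated subpage whose name happens to be exactly two characters long would be excluded as well.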

The includematch clause is a Perl regular expression that matches text of the form [[Special:myLanguage/page path or plain [[page path, followed by either a ']', a '|', or a '#' character. We take into account that the S, the M, and the L of Special:MyLanguage are sometimes capitalized and sometimes not.
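The effect of the trailing ']', '|', or '#' requirement is easiest to see on a few sample links. This is a sketch in Python under the assumption that Python's re module behaves like the Perl-style engine for this pattern. The @ delimiters are dropped, {{!}} is written as a plain pipe, and the names links_to_pattern and has_link as well as the sample texts are made up.

```python
import re

def links_to_pattern(target: str) -> "re.Pattern[str]":
    """Build the includematch regex for a target, the way {{{1}}} is substituted.

    The target is inserted unescaped, so regex syntax such as [_ ] keeps working.
    """
    return re.compile(
        r"\[\[([Ss]pecial\:[mM]y[lL]anguage/)?" + target + r"(\]|\||#)"
    )

def has_link(wikitext: str, target: str) -> bool:
    """Return True if the text contains a link to exactly the target page."""
    return links_to_pattern(target).search(wikitext) is not None

print(has_link("See [[Special:myLanguage/Amarok|Amarok]].", "Amarok"))   # True
print(has_link("See [[special:MyLanguage/Amarok]].", "Amarok"))          # True
print(has_link("See [[Amarok#Features]].", "Amarok"))                    # True
print(has_link("See [[Amarok/Manual]].", "Amarok"))                      # False
print(has_link("See [[Special:myLanguage/Amarok/Manual]].", "Amarok"))   # False
print(has_link("See [[Amarok/Manual]].", "Amarok/Manual"))               # True
```

Because the character right after the target must close the link, start the label, or start an anchor, links to subpages of the target are not counted as links to the target itself.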

The includemaxlength = 0 is to prevent DPL from entering (parts of) the content of the matched pages into the page containing the query.

Testing the template

{{LinksTo|User:Claus[_ ]chr}}

There are 265 pages beginning with A-J

There are 324 pages beginning with KA-KZ

There are 287 pages beginning with Ka-Kc or Ke-Kz

There are 307 pages beginning with Kd

There are 404 pages beginning with L-Z

Problem nr. 1. We also find links to subpages of the target (doh!), but that should be simple to fix. (The error message just means that there were no linking pages in the Main namespace.)

{{LinksTo|Amarok}}

There are 265 pages beginning with A-J

There are 324 pages beginning with KA-KZ

There are 287 pages beginning with Ka-Kc or Ke-Kz

There are 307 pages beginning with Kd

There are 404 pages beginning with L-Z


Yes! There must be thousands of pages linking to some subpage of Amarok. They are obviously not found. Now can I find this link to Talk:Translation Workflow?

{{TestLinksTo|Talk:Translation[_ ]Workflow}}

Extension:DynamicPageList (DPL), version 2.3.0 : ⧼dpl_log_16⧽


This page is not found, but it is found if the same query is performed in another page! I guess I should have expected that, given the way these queries are performed. Otherwise the page would have to include itself, which could lead to problems.

{{LinksTo|Amarok/Manual}}

There are 265 pages beginning with A-J

There are 324 pages beginning with KA-KZ

There are 287 pages beginning with Ka-Kc or Ke-Kz

There are 307 pages beginning with Kd

There are 404 pages beginning with L-Z


So we can find subpages — good thing too!

Addendum

  • 30. June 2011: Added search of the Talk namespace plus a line of text to make the template document its use.
  • 18. July 2011: It seems this doesn't work! We do not find all the pages containing a given link. It seems that some resource is exceeded, and that only part of the pages are actually searched:

Consider the following two searches:

<DPL>
  nottitlematch = %/__
  namespace = Main
  include = *
  includematch = /[Ss]eamless/
  includemaxlength = 0
  resultsheader = Nontranslated pages containing the string "seamless"
  format = ,\n* [[%PAGE%|%TITLE%]]\n,,
</DPL>

Extension:DynamicPageList (DPL), version 2.3.0 : ⧼dpl_log_1⧽

<DPL>
  nottitlematch = %/__|%/__-__|% (%|%/___
  namespace = Main
  include = *
  includematch = /[Ss]eamless/
  includemaxlength = 0
  resultsheader = Nontranslated pages containing the string "seamless"
  format = ,\n* [[%PAGE%|%TITLE%]]\n,,
</DPL>

Extension:DynamicPageList (DPL), version 2.3.0 : ⧼dpl_log_1⧽

The first search excludes most translated pages, namely those ending in /la, where la is a two-letter language code. The second search is identical except that it excludes all translated pages, including the remaining old-style translations. You would expect the second search to find fewer matches, not more, since it searches among fewer pages. On the other hand, if the search exceeds some capacity, then the first search, which has more pages to go through, would run out of space sooner and so find fewer hits.
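The expectation that the second search looks at fewer pages can be illustrated with a sketch in Python. It assumes that nottitlematch uses SQL LIKE wildcards (% for any run of characters, _ for exactly one character) with | separating alternatives, and it compares titles in their display form with spaces. The helper names and the sample titles are hypothetical.

```python
import re

def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate one LIKE pattern: % is any run of characters, _ is exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # \Z anchors the end, so together with match() the whole title must fit.
    return re.compile("".join(parts) + r"\Z", re.DOTALL)

def excluded(title: str, nottitlematch: str) -> bool:
    """True if the title matches any of the |-separated alternatives."""
    return any(like_to_regex(p).match(title) for p in nottitlematch.split("|"))

FIRST = "%/__"
SECOND = "%/__|%/__-__|% (%|%/___"

# Hypothetical titles covering each kind of translated page.
titles = ["Amarok", "Amarok/da", "Amarok/pt-br", "Amarok (da)", "Amarok/Manual"]
kept_first = [t for t in titles if not excluded(t, FIRST)]
kept_second = [t for t in titles if not excluded(t, SECOND)]

print(kept_first)   # ['Amarok', 'Amarok/pt-br', 'Amarok (da)', 'Amarok/Manual']
print(kept_second)  # ['Amarok', 'Amarok/Manual']
print(set(kept_second) <= set(kept_first))  # True
```

Since every alternative of the first clause is also in the second, the pages left after the second clause are always a subset of those left after the first. A correct text search over them can therefore never return more hits, which is why the observed result points to a resource limit rather than to the patterns.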

