Revision as of 07:42, 28 June 2011

Reference: DPL Manual
See also Pipesmoker's notes and this page of examples
Example UI on this Template:Catlist page


Searching for pages containing a certain text string

Matching content in pages: You need to include the contents of the pages in this page (include = * does that) and then apply a Perl-style regular expression to their contents to filter out the interesting pages (includematch = ...). If you are searching in translated pages (e.g. all Danish pages), it is often advantageous to set namespace = Translations; otherwise you get both the full pages and all translation units containing matching text, which can make for very long output.

<DPL>
  titlematch = %/da
  namespace = Translations
  include = *
  includematch = /[Aa]pplikation/
  includemaxlength = 0
  resultsheader = Danish translation units containing the string "applikation"
  format = ,\n* [[%PAGE%|%TITLE%]]\n,,
</DPL>
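
The filtering step can be tried out offline. What follows is a minimal Python sketch, not DPL itself: the unit titles and texts are made up for illustration, and Python's re module stands in for the Perl-style regular expressions used by includematch, which should behave the same for a pattern as simple as this one.

```python
import re

# Hypothetical translation units (title -> text); made up for illustration.
units = {
    "Translations:Amarok/12/da": "Start applikationen fra menuen.",
    "Translations:Kopete/3/da": "Applikation til chat.",
    "Translations:Dolphin/7/da": "Filhåndtering i KDE.",
}

# Same pattern as in the includematch clause, without the / delimiters.
pattern = re.compile(r"[Aa]pplikation")


def matching_titles(pages):
    """Return the titles whose text contains a match, sorted by title."""
    return sorted(title for title, text in pages.items() if pattern.search(text))


print(matching_titles(units))
```

Only the titles are kept, which is roughly what includemaxlength = 0 achieves in the query: the text is needed for the match but not shown in the result.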


All English pages linking to a given page

Note
The template is under development. If this page behaves strangely, it is probably one of my experiments gone haywire. Normally, these fits should only last for a couple of minutes, but they may recur. So be warned!


<DPL>
  namespace=Main |User
  nottitleregexp = .*(/..(-..)?|_[(].*[)])$
  include = *
  includematch = #\[\[[Ss]pecial\:[mM]y[lL]anguage/Getting[_ ]Help|\[\[Getting[_ ]Help#
  includemaxlength = 0
  resultsheader = The pages in the translation system linking to Getting Help are:\n
  format = ,\n* [[%PAGE%|%TITLE%]],,
</DPL>

Extension:DynamicPageList (DPL), version 2.3.0 : ⧼dpl_log_1⧽

Somehow this is broken. Without the namespace clause we get

The pages in the translation system linking to Getting Help are:

which excludes some pages in the Main namespace(!) and includes one in the User namespace!? Explicitly specifying namespace Main gives this:

Extension:DynamicPageList (DPL), version 2.3.0 : ⧼dpl_log_1⧽

That seems reasonable, but wasn't Main supposed to be used by default? Finally, specifying namespace User gives us all three user pages:

The pages in the translation system linking to Getting Help are:

All English pages linking to a given page (template version)

The lesson here seems to be that, at least when include is involved, we can't rely on DPL handling more than one namespace at a time. This calls for a template: {{LinksTo|Getting[_ ]Help}} gives

There are 266 pages beginning with A-J

There are 325 pages beginning with KA-KZ

There are 288 pages beginning with Ka-Kc or Ke-Kz

There are 309 pages beginning with Kd

There are 409 pages beginning with L-Z


So, to recap, the problem is this: We would like to find all pages that link to a given page (the target). The What links here page does not work well with links adapted to the translation extension, i.e. links of the form [[Special:myLanguage/target page]], which means that almost none of our links would be found.

Instead we have to use DPL to find those pages, but even here we have to be careful. The obvious search using the linksto clause doesn't work either, probably for the same reason that What links here fails. The solution is to search the content of every page for a link to the target page. The drawback is that DPL has to write the entire content of each page into the page where the search occurs, then search the text, then strip the text out again (that's the includemaxlength = 0 part), leaving only links to the pages that contain a link to the target page.

For some reason, this kind of query confuses DPL. Specifying no namespace should result in the Main namespace being searched, but as we saw above, not every matching page in Main was found, and oddly one User: page was found. Specifying both namespaces in one search seems to work better, but still misses a User: page. The solution seems to be one search for each namespace. That seems to find everything, with one exception: the start page Welcome to KDE UserBase seems to be outside of namespaces and is never found. Let's hope it is one of a kind.

This method of searching has been implemented as a template {{LinksTo|target page}} (which can easily be modified to include more namespaces in the search). Just to be clear: We are searching for the actual occurrence of a link in the text of pages. The search finds both Special:myLanguage links and old-style links. It tries to take every known variation into account (Special: or special:, and mylanguage, Mylanguage, myLanguage, or MyLanguage). One variation we have to deal with by hand is that a space can be written either as a normal space character or as an underscore. Therefore, if the target name contains a space, we should either make two searches or write [_ ] wherever a space occurs in the name, as in {{LinksTo|Getting[_ ]Help}}. Also note that case is significant: {{LinksTo|getting[_ ]help}} yields




We just get two error messages (one for each search), since DPL gives an error message whenever a text search finds no match on any page.
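
The effect of the [_ ] trick and of case sensitivity on the template argument can be checked outside the wiki. This is only a Python sketch with made-up wikitext lines; it tests the target fragment on its own, not the full template pattern, and Python's re module is assumed to treat this fragment the same way DPL's regular expressions do.

```python
import re

# Hypothetical wikitext with both spellings of the same link target.
lines = [
    "[[Special:myLanguage/Getting_Help|help page]]",
    "[[Getting Help]]",
]


def count_matches(target):
    """Count the lines in which the target fragment occurs."""
    return sum(1 for line in lines if re.search(target, line))


print(count_matches("Getting Help"))     # literal space: misses the underscore form
print(count_matches("Getting[_ ]Help"))  # character class: finds both forms
print(count_matches("getting[_ ]help"))  # wrong case: finds nothing
```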

The template

The code of the first half of the template is here:

{{#dpl:
| namespace = Main
| nottitleregexp = .*(/..(-..)?{{!}}_[(].*[)])$
| include = *
| includematch = @\[\[([Ss]pecial\:[mM]y[lL]anguage/)?{{{1}}}(\]{{!}}\{{!}}{{!}}#)@
| includemaxlength = 0
| format = ,\n* [[%PAGE%|%TITLE%]],,
}}

The second half is the same, except that | namespace = Main is replaced by | namespace = User.

I couldn't make normal DPL tags work in the template, but fortunately the {{#dpl parser function does work. To add more namespaces, just add a copy of the first half of the template to the end, and in the copy replace | namespace = Main by | namespace = Whatever. From the DPL documentation you might think that to search the Main namespace you could leave out the namespace clause altogether. That is not a good idea: in this context it confuses DPL so that it misses pages!

The nottitleregexp clause filters out any page whose path ends in /xx, /xx-xx, or _(x..), i.e. it filters out all translated pages, both old-style and new-style. Since the pipe character has a special meaning in a template, it has to be entered as {{!}}.
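
To see which titles the title filter removes, the pattern can be tried in Python. This is a sketch under assumptions: the titles are examples written with underscores instead of spaces, {{!}} has been written out as a plain pipe, and Python's re module is assumed to agree with the regular expression engine DPL uses for this pattern.

```python
import re

# The nottitleregexp pattern with {{!}} written out as a plain pipe.
translated = re.compile(r".*(/..(-..)?|_[(].*[)])$")


def is_translation(title):
    """True if the title looks like a translated page (new or old style)."""
    return translated.search(title) is not None


for title in ["Amarok", "Amarok/Manual", "Amarok/da", "Amarok/pt-br", "Amarok_(da)"]:
    print(title, is_translation(title))
```

Note that the pattern cannot tell a language code from any other two-character subpage name, so an ordinary page whose last path element happens to be two characters long would be filtered out as well.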

The includematch clause is a Perl-style regular expression that matches text of the form [[Special:myLanguage/page path followed by either a ']', a '|', or a '#' character; the Special:myLanguage/ prefix is optional, so old-style links match too. We take into account that the S, the M, and the L of Special:MyLanguage are sometimes capitalized and sometimes not.
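
The expanded pattern can be inspected with a small Python sketch. The expand helper is hypothetical and only mimics what template expansion does to {{!}} and {{{1}}}; the @ delimiters are left out, the sample links are made up, and Python's re module is assumed to be close enough to DPL's regular expressions for this pattern.

```python
import re

# The includematch pattern from the template, without the @ delimiters.
raw = r"\[\[([Ss]pecial\:[mM]y[lL]anguage/)?{{{1}}}(\]{{!}}\{{!}}{{!}}#)"


def expand(template_pattern, target):
    """Mimic template expansion: {{{1}}} becomes the argument, {{!}} becomes |."""
    return template_pattern.replace("{{{1}}}", target).replace("{{!}}", "|")


link = re.compile(expand(raw, "Amarok"))

# Made-up link texts and whether we expect them to match.
samples = {
    "[[Amarok]]": True,
    "[[Special:myLanguage/Amarok|Amarok]]": True,
    "[[special:MyLanguage/Amarok#Hints]]": True,
    "[[Amarok/Manual]]": False,  # subpage: the target is followed by /
    "[[Amarokk]]": False,        # longer name: no terminator after the target
}
for text, expected in samples.items():
    print(text, bool(link.search(text)), expected)
```

The closing group is what keeps links to subpages and to longer page names out of the result: the target must be followed directly by ], | or #.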

The includemaxlength = 0 is to prevent DPL from entering (parts of) the content of the matched pages into the page containing the query.

Testing the template

{{LinksTo|User:Claus[_ ]chr}}


Problem nr. 1. We also find links to subpages of the target (doh!), but that should be simple to fix. (The error message just means that there were no linking pages in the Main namespace.)

{{LinksTo|Amarok}}


Yes! There must be thousands of pages linking to some subpage of Amarok. They are obviously not found. Now can I find this link to Talk:Translation Workflow?

{{TestLinksTo|Talk:Translation[_ ]Workflow}}


This page is not found, but it is found if the same query is performed in another page! I guess I should have expected that, given the way these queries are performed: otherwise the page would have to include itself, which could lead to problems.

{{LinksTo|Amarok/Manual}}


So we can find subpages — good thing too!

Kopete Subpages in 3 columns

<DPL>
  titlematch = Kopete/%
  notnamespace = Translations
  columns = 3
  format = ,\n* [[%PAGE%|%TITLE%]],,
</DPL>

Akonadi Subpages in Danish

<DPL>
  titlematch = Akonadi%/da
  notnamespace = Translations
  format = ,\n* [[%PAGE%|%TITLE%]],,
</DPL>

Archived pages

<DPL>
  titlematch = %
  namespace = Archive
  columns = 2
  format = ,\n* [[%PAGE%|%TITLE%]],,
  resultsheader = There are %TOTALPAGES% pages in the Archive namespace. These are:\n
</DPL>

NoIndexed pages

<DPL>
  titlematch = %
  category = Noindexed_pages
  columns = 2
  format = ,\n* [[%PAGE%|%TITLE%]],,
  resultsheader = There are %TOTALPAGES% pages in the Noindexed pages category. These are:\n
</DPL>

Ignoring Deleted Pages

"As for DPL. If you hit a page with ?action=purge attached to the URL (i.e. http://en.wikinews.org/wiki/Template:Latest_news?action=purge ), it will dump all the removed pages."

Remaining old-style translations

<DPL>
  titlematch = %_(%)
  notcategory = Template
  notnamespace = Thread
  notnamespace = Summary
  columns = 2
  format = ,\n* [[%PAGE%|%TITLE%]],,
  resultsheader = There are %TOTALPAGES% pages (partly) remaining in old-style translations. These are:\n
</DPL>

Pages with old i18n bar

<DPL>
  titlematch = %
  namespace = Main
  uses = Template:I18n/Language Navigation Bar
  columns = 3
  format = ,\n* [[%PAGE%|%TITLE%]],,
  resultsheader = There are %TOTALPAGES% pages that still display the old i18n language bar\n
</DPL>

Pages with old i18n bar but w/o old-way-translated ones

<DPL>
  nottitlematch = %_(%)
  namespace = Main
  uses = Template:I18n/Language Navigation Bar
  columns = 3
  format = ,\n* [[%PAGE%|%TITLE%]],,
  resultsheader = There are %TOTALPAGES% relevant pages that still display the old i18n language bar\n
</DPL>

Pages not updated since 1st July 2010

<DPL>
  namespace = Main
  lastrevisionbefore = 201007010000
  columns = 2
  ordermethod=lastedit
  format = ,\n* (%DATE%) [[%PAGE%|%TITLE%]],,
  resultsheader = There are %TOTALPAGES% pages without recent updates\n
</DPL>
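
The value given to lastrevisionbefore in the query above reads as year, month, day, hour and minute run together. Assuming that layout, a cutoff for another date can be produced with a few lines of Python; the helper name dpl_timestamp is made up for this sketch.

```python
from datetime import datetime


def dpl_timestamp(moment):
    """Format a datetime as YYYYMMDDhhmm, the digit layout used in the query above."""
    return moment.strftime("%Y%m%d%H%M")


# Midnight on 1 July 2010, the cutoff used in the query above.
print(dpl_timestamp(datetime(2010, 7, 1)))
```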

Listing Non-Translation Pages

<DPL>
  nottitlematch = %/__|%/zh-%|%(%)
  titlematch = Amarok%
  namespace = Main
  columns = 1
  format = ,\n* [[%PAGE%|%TITLE%]],,
  resultsheader = There are %TOTALPAGES% Amarok pages, not counting translations\n
</DPL>
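
The nottitlematch value is a list of alternatives separated by |. As far as I understand the DPL manual, each alternative is an SQL LIKE pattern, where % stands for any run of characters and _ for exactly one character. The Python sketch below is built on that assumption; the helper names and the sample titles are made up, and it only illustrates which titles the patterns would exclude.

```python
import re


def like_to_regex(pattern):
    """Translate one SQL LIKE pattern: % is any run of characters, _ is exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def excluded(title, spec="%/__|%/zh-%|%(%)"):
    """True if the whole title matches any of the |-separated alternatives."""
    return any(like_to_regex(p).fullmatch(title) for p in spec.split("|"))


for title in ["Amarok", "Amarok/Manual", "Amarok/da", "Amarok/zh-cn", "Amarok_(da)", "Amarok/pt-br"]:
    print(title, excluded(title))
```

Under this reading, a title such as Amarok/pt-br is not caught by these three alternatives, which is presumably why the MediaWiki namespace query further down adds %pt-% to the list.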

List all pages in a specific namespace

<DPL>
  nottitlematch = %/__|%/zh-%|%pt-%|%(%)
  namespace = MediaWiki
  columns = 3
  format = ,\n* [[%PAGE%|%TITLE%]],,
  resultsheader = These %TOTALPAGES% pages are in the Mediawiki namespace\n
</DPL>

To count translated pages in a specific language:

<DPL>
  titlematch = %/en
  notnamespace = Translations
  columns = 3
  format = ,\n* [[%PAGE%|%TITLE%]],,
  resultsheader = There are %TOTALPAGES% pages (partly) translated to English. These are:\n
</DPL>

There are 795 pages (partly) translated to English. These are:

