Nepomuk/nl: Difference between revisions

From KDE UserBase Wiki

Revision as of 05:23, 10 January 2013

Nepomuk

The aim of this page is not to explain every detail of the technology behind Nepomuk, but to give a short overview with examples and to share the vision behind it. It also provides links to relevant information on the web.

Short explanation

As the Glossary mentions, Nepomuk is about the classification, organisation and presentation of data. It is not an application, but a component that developers can use within applications.

Trying it out

Dolphin is one of the applications that uses Nepomuk. For it to work, Nepomuk and Strigi must be enabled in System Settings -> Desktop Search. The Information sidebar in Dolphin lets you add tags, ratings and comments to files. This information is stored in Nepomuk and indexed by Strigi. You can then search for metadata using the navigation bar in Dolphin: type "nepomuksearch:/" followed by your search terms.


Functionality

Nepomuk offers several 'layers' of functionality to applications. The first and simplest of these is manual tagging, rating and commenting, as used in Dolphin. This helps you find your files faster, but it is also a lot of work.

To make it easier to find files that contain text, Nepomuk offers a second layer of functionality: indexing the text inside files. For this it uses a technology called Strigi. You can now also find files by entering a few words that you know they contain, or simply (part of) their title.

The third layer is highly complex, and is the reason Nepomuk was conceived as a research project involving several companies and universities in the European Union. This is where you run into difficult terms such as 'semantic desktop' and 'ontologies'. In short, it is about context and relationships.

Indexing files

Strigi does not index every file on the hard drive. Its default configuration in most Linux distributions excludes some common patterns for backup files and configuration directories, and it only indexes certain directories in your home folder. You can change this in System Settings -> Desktop Search -> Desktop Query -> Customize index folders… -> Folders to index.

Note that Strigi as of KDE 4.7 does not follow symbolic links (bug #208602). Up to KDE 4.9, content was not indexed even if the user selected folders under a symbolic link for indexing (bug #287593). That specific bug has been fixed: the user can now see symbolic links, but cannot mark any of them for indexing. (A symbolic link is a file that "points" to another file or directory; Dolphin displays symbolic links in italic.) You must find the path to the actual directory (in Dolphin, select the file, choose Properties -> General -> Points to) and tell Strigi to index that.
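
The actual directory behind a symbolic link can also be looked up from a terminal. The following is a minimal sketch, assuming the readlink command from GNU coreutils is available; the helper name resolve_link and the example path are illustrative only and are not part of KDE.

```shell
#!/bin/sh
# resolve_link PATH
# Prints the real path that PATH ultimately points to.
# `readlink -f` follows every symbolic link in the path.
resolve_link() {
    readlink -f -- "$1"
}

# Hypothetical usage: adjust the path to one of your own symbolic links.
# resolve_link "$HOME/Documents/projects"
```

The printed path is the one to add under Folders to index.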

In System Settings you can also control whether Strigi indexes files on removable media such as USB drives and CD-ROMs.

Examples

Let me try to explain what Nepomuk offers through two examples.

Relationships

Suppose you received a photo from a friend two weeks ago. You saved it somewhere on your computer. But how do you find that file now? If you can no longer remember the location, you have a problem.

This is where Nepomuk can help. You know that the file came from that friend, but your computer does not. Nepomuk, however, can remember this relationship. Searching for your friend's name will therefore turn up the photo!

Another possible relationship is between a web page you copied text from and the document you pasted that text into, or between two images that show the same car. Such relationships can sometimes be extracted from the files themselves (photos could be analysed to see who or what is in them) or be supplied by the applications involved (as in the e-mail example above). This part of Nepomuk is still under development and needs to be integrated into applications, so expect it to take a few more years before it really comes to fruition.

All in all, this part of Nepomuk is about making search smart. Think about how Google tries to be smart with your searches: when you search for a hotel and a city name, it shows a Google map of hotels in that city above the website results. It might even suggest a better spelling in case you made a mistake. Google also tries to put the most relevant information at the top of the list of results, using complex calculations on relationships (links) between websites. Nepomuk will be able to offer such smart results and order them by relevance using relationship information.

Context

These relationships can not only help you while searching for files, but also have an influence on applications and what information they present. Note that this way of using Nepomuk is still more a vision than reality! Many of the components are in place, but it is not yet integrated in applications and the desktop as a whole.

So here is an example of how bringing context awareness to your desktop could help you work more efficiently.

Suppose you are finishing up some notes you took during a meeting. The phone rings and someone asks you to find that spreadsheet with prices and adjust it for a customer. A few interruptions later, your desktop is full of files and windows...

Wouldn't it be nice if everything could be organised a bit better?

Enter 'activities'. These have been introduced in Plasma, and currently offer different 'desktops'. They are a bit like virtual desktops, except that the desktop itself changes, not the set of applications: different widgets, a different background, things like that. Of course, since KDE 4.3, each virtual desktop can have its own activity, bringing the two in sync.

If applications and the desktop were aware of activities, you could create an activity for each of the tasks you regularly work on. So if you often have to change a spreadsheet with prices, you create an activity for that: put a Folder View widget (or several) on the desktop, and add a calculator and a todo widget to keep track of what you still have to change. Maybe even an email folder widget showing the mails with questions regarding these price spreadsheets!

As soon as somebody asks a question about prices, you switch to this activity and fire up your spreadsheet application. It is aware of your activity, so it shows recent price spreadsheets, not the inventory lists you were working on in another activity! Kopete, the chat application, shows your colleague who knows all about prices, as she is the person you always chat with when working on this activity.

When you are done, switch back to another activity, and once again all applications adapt their behaviour to what you are doing.

The benefits of such an activity-based work flow go further than you might at first expect. It not only helps you find files and contact persons, but also helps in switching tasks itself. The human brain isn't very good at multi-tasking - it takes most people several minutes to get up to speed after switching tasks. Changing the 'environment' helps a lot in speeding this up, even if it's just on the screen. Compare it with getting in the mood for your holiday by packing your bag!

Of course, the above is mostly relevant to people working behind their computer in the office or at home. A gamer or a casual user would probably not use these activities much.

Note that the scenario described above is still years away from reality. Much of the basic infrastructure for this is in place in KDE, but much remains to be done.

Frequently Asked Questions

The following is taken from a KDE forums post. Please feel free to add/remove/modify details if you have the time!

Q. What is the Nepomuk Semantic Desktop, and the Strigi Desktop File Indexer?

A. The Nepomuk Semantic Desktop is the foundation of all the other modules of the Nepomuk infrastructure. It provides a way to organize, annotate and build relationships among the data (not only file name and content, but also, for example, which applications used a certain file, or how it is tagged). A number of KDE applications and workspaces use this basic infrastructure to deliver features such as email tagging (KMail) or activity setup (Plasma).

The Strigi Desktop File Indexer, on the other hand, is a system that indexes files so that they can be added to the main Nepomuk repository. This is a convenient way to use them within Nepomuk without adding any file manually. Applications such as Dolphin can then search for files based on content, name, or other metadata (e.g. tags) associated with indexed files. Such an indexer can also index non-text files, such as PDFs, by accessing the metadata contained in these files (author, publication information, etc.). Some KDE components ship additional "analyzers" for more file types. Nepomuk can be fully functional without the File Indexer, which is an additional (and optional) component.

Q. How can I tell if Strigi has indexed a file?

A. In Dolphin, select the file. If the Information panel displays "Created at" and "Has hash", then the file was indexed by Strigi.

Q. How can I disable the semantic desktop?

A. Most of the time, the easiest way is to disable file indexing, which is usually the most resource-intensive of the Nepomuk components (although the 4.7 release includes many optimizations that reduce resource usage). This is done by unchecking Enable Strigi Desktop File Indexer in the Desktop Search section of System Settings. If you want to turn off all semantic features, uncheck Enable Nepomuk semantic desktop. Note that this will turn off search in Dolphin as well.

Note that with the latter option, some programs that use Nepomuk for metadata will offer reduced functionality: for example, KMail will not be able to tag mail, and Plasma activities will not offer additional features such as icons or program data information.

Q. Why do I have nepomukservicestub processes even though I've disabled Nepomuk?

A. It may be a bug. Please file a bug report with a complete description of your problem and the steps to trigger it.

Q. File indexing of PDF/some other file types doesn't work.

A. PDF indexing is a known issue and it's being tracked in bug #231936. If you have issues with other files, open a bug, preferably adding a sample file that shows the problem.

Q. The program nepomukservicestub crashes at startup.

A. A large number of crashes were fixed in the 4.7.2 release of the KDE Workspaces and Applications. If you encounter more, please file bug reports with detailed instructions on how to reproduce the problem, as the developers are sometimes unable to trigger them in their test setups.

Q. The virtuoso-t process hangs at 100% CPU.

A. Virtuoso-t is a key component of the Nepomuk infrastructure, and on some occasions the commands sent by the other components take too long to process (which shows up as 100% CPU usage). Sebastian Trüg (the lead developer of Nepomuk) has fixed most of these problems in 4.7.1 and newer.

Q. Sometimes Nepomuk consumes too much RAM.

A. Many of these problems have been fixed. In other cases, however, the developers are unable to reproduce the issue. Providing examples and test cases in bug reports increases the chances of getting these bugs fixed.

Q. Nepomuk re-indexes files at startup.

A. This bug was fixed in version 4.7.0. Nepomuk now just "scans" for changes, without indexing anything.

Q. Nepomuk accesses the disk too much on startup.

A. In 4.7 and newer this problem has been lessened thanks to a throttling mechanism implemented in the file indexer.

Q. My Nepomuk database has been corrupted. How do I clean it?

A. In the extreme case that your database really is corrupted and all other attempts have failed, you can delete the $KDEHOME/share/apps/nepomuk directory (where $KDEHOME is usually .kde or .kde4) while Nepomuk is not running. The database will be cleared, but you will also lose existing information such as tags, ratings and comments.
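
A more cautious variant is to move the directory aside instead of deleting it, so that it can be restored if needed. The sketch below assumes Nepomuk has already been stopped (for example by disabling it in System Settings). The function name reset_nepomuk_db and the backup naming scheme are illustrative only and are not part of any KDE tool.

```shell
#!/bin/sh
# reset_nepomuk_db KDEHOME_DIR
# Moves KDEHOME_DIR/share/apps/nepomuk to a timestamped backup beside it
# and prints the backup location.
# Assumption: Nepomuk is NOT running when this is called.
reset_nepomuk_db() {
    kdehome=$1
    db="$kdehome/share/apps/nepomuk"
    if [ ! -d "$db" ]; then
        echo "No Nepomuk database found at $db" >&2
        return 1
    fi
    backup="$db.backup-$(date +%Y%m%d%H%M%S)"
    mv -- "$db" "$backup" && echo "$backup"
}

# Hypothetical usage: falls back to ~/.kde when $KDEHOME is unset;
# pass ~/.kde4 instead if that is where your settings live.
# reset_nepomuk_db "${KDEHOME:-$HOME/.kde}"
```

Moving the directory back while Nepomuk is stopped restores the previous state. Once you are sure the old data is no longer needed, the backup can be deleted.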

Advanced troubleshooting

If Nepomuk has trouble indexing a file, in a terminal try running

nepomukindexer /path/to/file

and see if there's any useful output. You can compare the output to a similar file that is successfully indexed.

xmlindexer /path/to/file > /tmp/test.xml

generates an XML representation of some of the information extracted from a file. You can view this in an XML viewer, such as your browser, and again compare it to the output for a similar file. (xmlindexer may be in a different and optional package in your Linux distribution; for example in Ubuntu it is in the strigi-utils package.)
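
The comparison between a file that indexes correctly and one that does not can be scripted. This is a rough sketch, assuming xmlindexer is installed and diff is available; the function name compare_index and its optional third argument are hypothetical conveniences, not part of Strigi.

```shell
#!/bin/sh
# compare_index GOOD_FILE BAD_FILE [INDEXER]
# Runs the indexer on both files and prints a unified diff of the output.
# INDEXER defaults to xmlindexer; it is a parameter only so that the
# sketch can be tried with another command.
# Returns diff's exit status: 0 if the outputs match, 1 if they differ.
compare_index() {
    good=$1
    bad=$2
    indexer=${3:-xmlindexer}
    workdir=$(mktemp -d) || return 2
    "$indexer" "$good" > "$workdir/good.xml"
    "$indexer" "$bad" > "$workdir/bad.xml"
    diff -u "$workdir/good.xml" "$workdir/bad.xml"
    status=$?
    rm -rf "$workdir"
    return $status
}

# Hypothetical usage: both paths are placeholders.
# compare_index /path/to/indexed-file /path/to/problem-file
```

Lines that appear only for the working file hint at the information that could not be extracted from the problematic one.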

This blog post explains how to turn on debugging output from the Nepomuk service. It also shows how to debug the CPU usage of Nepomuk and its storage backend.

This blog post explains how to extract useful information for bug reports about the CPU usage of the Virtuoso backend.

Sharing and privacy

There is one thing I need to touch on before pointing to other sources of information: sharing Nepomuk data. It would be great if your tags, ratings and comments were shared with others when you send them files. However, if you tagged a contact with a slightly embarrassing tag ('boring in bed') and send that person's contact information to a mutual friend, you probably don't want that tag to be sent as well...

This issue is of course being considered and an important subject of research by the Nepomuk researchers. For the time being, these privacy concerns, combined with technical challenges, are the reason Nepomuk context is private. Rest assured the Nepomuk team does all it can to make sure your privacy is respected.

External links