Amarok/Manual/Organization/CollectionScanning

From KDE UserBase Wiki

Latest revision as of 15:39, 1 July 2013

Collection Scanning

Whenever Amarok displays a collection, the information about tracks and albums has to come from a source. The source can be a portable device, an Internet service or a database. For tracks in Local Collection folders, Amarok uses a database to provide fast access to the required metadata. This information first has to be loaded into the database, which is usually done by scanning the Local Collection directories for audio files. This process is called collection scanning.

It is useful to understand the scanning process in order to work better with Amarok.

Incremental Scan / Update Collection

The so-called incremental scan checks the collection directories for updates. It runs every minute if Watch folders for changes is enabled (on by default), and it can also be triggered manually by selecting Update Collection from the menu.

The incremental scan only compares the modification time of every folder in the collection with the last known modification time. This has a few implications:

  • You can trigger a rescan of a single directory by updating its modification time (for example with touch /path/to/directory in a console).
  • If files inside a directory are changed, the scanner will not notice, because changing a file updates its own modification time but not that of the parent folder. On the other hand, most programs that modify files save them atomically by writing a temporary file which is then renamed. This does update the directory's modification time and thus triggers a rescan of that directory.
  • If the collection folders are on a very slow partition, checking all the modification times can take a while. Usually the operating system caches this information, but with large collections that might not be possible, and the scanner may then appear to scan continuously. For collections with several thousand directories, or collections stored on a network drive or an NTFS partition, it is recommended to switch off the Watch folders for changes option.
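The folder-level rule above can be illustrated with a minimal Python sketch. It is not Amarok's scanner code; `changed_directories`, `save_in_place` and `save_atomically` are hypothetical helpers that only model what the text describes: each folder's modification time is compared with the value recorded by the previous scan, an in-place edit leaves the folder's time unchanged, and a save through a temporary file plus rename updates it.

```python
import os
import tempfile


def changed_directories(root, last_seen):
    """Return folders under root whose mtime differs from the last scan.

    last_seen maps folder path -> mtime recorded by the previous scan and
    is updated in place. Only folder mtimes are compared; file mtimes are
    never looked at.
    """
    changed = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        mtime = os.stat(dirpath).st_mtime
        if last_seen.get(dirpath) != mtime:
            changed.append(dirpath)
            last_seen[dirpath] = mtime
    return changed


def save_in_place(path, data):
    # Overwrites the file in place. For an existing file no folder entry
    # is added or removed, so the parent folder's mtime does not change.
    with open(path, "wb") as f:
        f.write(data)


def save_atomically(path, data):
    # Writes a temporary file in the same folder and renames it over the
    # target; creating and renaming entries updates the folder's mtime.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        album = os.path.join(root, "album")
        os.mkdir(album)
        track = os.path.join(album, "01.ogg")
        save_in_place(track, b"v1")
        seen = {}
        changed_directories(root, seen)         # first scan records all folders
        os.utime(album, (0, 0))                 # like "touch", but to a fixed time
        print(changed_directories(root, seen))  # the album folder is rescanned
        save_in_place(track, b"v2")
        print(changed_directories(root, seen))  # []: in-place edit goes unnoticed
        save_atomically(track, b"v3")
        print(changed_directories(root, seen))  # the album folder again
```

Here `os.utime` plays the role of touch; pinning the time to the epoch simply makes the change visible regardless of timestamp resolution.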

If deleted tracks still appear in the collection, or you want to update album covers (which a mere Update Collection does not refresh), use the Full rescan option in the settings dialog. Full rescan ignores modification dates. It does not delete the statistics of existing files; it does, however, delete the statistics of tracks that have disappeared from the currently mounted collection folders. Therefore, if you move tracks between mounts, it is advisable to run Full rescan only while all Local Collection folders are mounted. Full rescan also updates the play count if the one stored in the file tags is greater, the rating if the track is unrated and the file tags contain a rating, and the score under the same circumstances.
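The statistics rules of Full rescan can be summarized in a small Python sketch. Everything in it is hypothetical: `Stats`, `merge` and `full_rescan` are illustrative names, an unrated track or a missing score is modelled as `None`, and the score is treated like the rating (only filled in when none is stored), which is how the sketch reads "under the same circumstances".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stats:
    playcount: int = 0
    rating: Optional[int] = None   # None models "unrated"
    score: Optional[float] = None  # None models "no score yet"


def merge(stored, tags):
    """Combine database statistics with statistics read from file tags."""
    return Stats(
        # play count: the tag value wins only if it is greater
        playcount=max(stored.playcount, tags.playcount),
        # rating and score: the tag value is used only if none is stored
        rating=stored.rating if stored.rating is not None else tags.rating,
        score=stored.score if stored.score is not None else tags.score,
    )


def full_rescan(db, found, mounted):
    """Simulate the statistics side of a Full rescan.

    db:      uid -> (mount point, Stats) currently in the database
    found:   uid -> (mount point, Stats from tags) for every track found
             in the currently mounted collection folders
    mounted: set of mount points that are available right now
    """
    result = {}
    for uid, (mount, stats) in db.items():
        if uid in found:
            new_mount, tag_stats = found[uid]
            result[uid] = (new_mount, merge(stats, tag_stats))
        elif mount not in mounted:
            # folder not mounted: the scanner cannot see it, entry is kept
            result[uid] = (mount, stats)
        # otherwise the track vanished from a mounted folder: stats dropped
    for uid, (mount, tag_stats) in found.items():
        if uid not in result:
            # newly found track: start from empty statistics
            result[uid] = (mount, merge(Stats(), tag_stats))
    return result
```

A track recorded on an unmounted drive keeps its entry, while a track that vanished from a mounted folder loses its statistics; this is why all Local Collection folders should be mounted before a Full rescan.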

Progress bar / scanning time

The progress bar shows the progress of the scan. Up to 50%, the scanner reads the file system and only buffers the results. Above 50%, the scanner is committing the results to the database. The second step is usually much faster than the first, so don't be surprised if the progress bar seems to jump. The scan can be aborted up to 50%; after that, committing the files cannot be stopped.
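The two phases of the progress bar can be expressed as a simple mapping. This Python sketch only models the behavior described above; `overall_progress` and `can_abort` are hypothetical names, and the phase labels "scan" and "commit" are assumptions.

```python
def overall_progress(phase, done, total):
    """Map per-phase progress onto the 0-100 % bar.

    phase is "scan" (reading the file system, 0-50 %) or "commit"
    (writing the buffered results to the database, 50-100 %).
    """
    fraction = 1.0 if total <= 0 else min(done / total, 1.0)
    if phase == "scan":
        return 50.0 * fraction
    if phase == "commit":
        return 50.0 + 50.0 * fraction
    raise ValueError(f"unknown phase: {phase!r}")


def can_abort(phase):
    # Aborting is only possible while the file system is still being scanned.
    return phase == "scan"
```

Because committing is usually much faster than scanning, a bar driven this way spends most of its time below 50% and then climbs quickly to 100%.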

The scanning time depends on your disk speed and other factors. The first scan is usually much slower than subsequent scans, when the files are cached by the operating system. A scan of 10,000 files should take around three minutes on a modern computer, and a scan of 50,000 files around 13 minutes. With an SSD (solid state drive) it will of course be much faster.
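The two figures above imply about 16 to 18 milliseconds per file (180 s for 10,000 files, 780 s for 50,000 files). Assuming, purely for illustration, that the time grows linearly between them, a rough estimate for other collection sizes can be computed as below; `estimate_minutes` is a hypothetical helper, and real times depend on the disk, the file system cache and the files themselves.

```python
# Rough figures quoted above: (number of files, minutes).
SMALL = (10_000, 3.0)
LARGE = (50_000, 13.0)


def estimate_minutes(files):
    """Estimate a scan time by straight-line interpolation between the figures."""
    (n1, t1), (n2, t2) = SMALL, LARGE
    per_file = (t2 - t1) / (n2 - n1)  # 10 min / 40,000 files = 0.00025 min
    return t1 + (files - n1) * per_file


for n in (10_000, 20_000, 50_000):
    print(f"{n:>6} files: about {estimate_minutes(n):.1f} min")
```

Under this linear assumption, 20,000 files would take about 5.5 minutes.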

Backup of collection

With the default settings, Amarok stores all the collection information in the directory ~/.kde/share/apps/amarok/mysqle/. It is a good idea to back up this directory from time to time, especially if you have not enabled writing statistics back to the files.
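A backup of that directory can be scripted in a few lines. The following Python sketch packs it into a timestamped archive. The function name `backup_collection`, the archive naming scheme and the advice to run it while Amarok is closed (so the embedded database files are not changing during the copy) are assumptions of this example, not part of Amarok.

```python
import os
import tarfile
import time

# Default location mentioned above; adjust it if your setup differs.
MYSQLE_DIR = os.path.expanduser("~/.kde/share/apps/amarok/mysqle")


def backup_collection(source=MYSQLE_DIR, dest_dir="."):
    """Pack the collection database directory into a timestamped .tar.gz.

    Meant to be run while Amarok is closed, so the embedded database
    files are not being written during the copy.
    """
    if not os.path.isdir(source):
        raise FileNotFoundError(f"no collection database at {source}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = os.path.join(dest_dir, f"amarok-mysqle-{stamp}.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname="mysqle")  # adds the directory recursively
    return archive
```

For example, `backup_collection(dest_dir=os.path.expanduser("~"))` would write the archive into your home directory.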

About unique ids

Amarok tracks files by an id that is either stored in the audio file or computed from file metadata, tag metadata and the first few kilobytes of the file. This id helps Amarok identify tracks that have been moved to other locations, so that their statistics (rating, score, play count, first and last played) are not lost. Currently Amarok does not import tracks with duplicate unique ids. This leads to the surprising behavior that copied tracks still appear only once in Amarok.
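The following Python sketch shows why copies collapse into one entry when the id is derived from the file and its tags. The hashing scheme is hypothetical (file size, sorted tag values and the first 8 KiB hashed with SHA-256) and is not Amarok's actual formula; `compute_uid` and `import_tracks` are illustrative names.

```python
import hashlib
import os

HEAD_BYTES = 8 * 1024  # hypothetical: how much of the file is hashed


def compute_uid(path, tags, embedded=None):
    """Return a unique id for a track (hypothetical scheme).

    An id stored in the file (embedded) wins; otherwise the id is derived
    from the file size, the tag values and the first HEAD_BYTES bytes.
    """
    if embedded:
        return embedded
    h = hashlib.sha256()
    h.update(str(os.path.getsize(path)).encode())
    for key in sorted(tags):
        h.update(f"{key}={tags[key]}\n".encode("utf-8"))
    with open(path, "rb") as f:
        h.update(f.read(HEAD_BYTES))
    return h.hexdigest()


def import_tracks(tracks):
    """Import (path, uid) pairs in order, skipping already-seen uids."""
    imported = {}
    for path, uid in tracks:
        if uid in imported:
            print(f"skipping {path}: same uid as {imported[uid]}")
            continue
        imported[uid] = path
    return imported
```

Two byte-identical copies with the same tags produce the same id, so the second one is skipped on import, just as copied tracks appear only once in Amarok.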

In some circumstances even different tracks can end up with the same unique id. Such a problem shows up in the debug output while scanning (start Amarok from a console with the --debug option).

About Albums

The scanner can only read individual tracks, but Amarok displays them sorted by album and compilation (an album without one specific artist). Amarok cannot rely on the directory in which the files are located, since directory layouts vary widely.

The scanner therefore does the following:

  • Tracks without an album artist or an artist (or a composer, in the case of a classical track) are placed in a compilation.
  • Tracks that have the compilation flag set or an album artist other than "various artists" are placed in an album.
  • Tracks that have the compilation flag set to 0 are placed in a compilation.
  • Albums called "Best Of", "Anthology", "Hit collection", "Greatest Hits", "All Time Greatest Hits" and "Live" are always regarded as an album.
  • If tracks with several different artists are left over, they are placed in a compilation; otherwise they are made into one album.
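The rules above can be sketched in Python for the tracks that share one album title. This is an illustration, not Amarok's code: `Track`, `sort_album`, the `classical` field and the order in which the rules are checked are assumptions. For the compilation flag the sketch uses the usual meaning of the ID3 TCMP frame (1 marks a compilation, an explicit 0 marks a regular album), which is how it interprets the second and third rules.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Album titles that are always treated as regular (per-artist) albums.
GENERIC_TITLES = {"best of", "anthology", "hit collection",
                  "greatest hits", "all time greatest hits", "live"}


@dataclass
class Track:
    title: str
    album: str
    artist: Optional[str] = None
    album_artist: Optional[str] = None
    composer: Optional[str] = None
    classical: bool = False            # hypothetical genre check
    compilation: Optional[int] = None  # assumed TCMP convention: 1, 0 or None


def sort_album(tracks):
    """Group tracks that share one album title into albums/compilations.

    Returns {(kind, album_artist): [titles]} where kind is "album" or
    "compilation" and album_artist is None for compilations.
    """
    groups = defaultdict(list)
    leftovers = []
    for t in tracks:
        who = t.artist or (t.composer if t.classical else None)
        if not t.album_artist and not who:
            key = ("compilation", None)       # nobody to file it under
        elif t.compilation == 1:
            key = ("compilation", None)       # explicitly a compilation
        elif t.compilation == 0:
            key = ("album", t.album_artist or who)
        elif t.album_artist and t.album_artist.lower() != "various artists":
            key = ("album", t.album_artist)
        elif t.album.lower() in GENERIC_TITLES:
            key = ("album", who)              # e.g. one "Live" album per artist
        else:
            leftovers.append((t, who))
            continue
        groups[key].append(t.title)

    # Leftovers: several different artists -> compilation, else one album.
    artists = {who for _, who in leftovers}
    for t, who in leftovers:
        key = ("compilation", None) if len(artists) > 1 else ("album", who)
        groups[key].append(t.title)
    return dict(groups)
```

With this sketch, two tracks on "Greatest Hits" by different artists end up as two separate albums, while two tracks by different artists on an album with an ordinary title end up in one compilation.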

This process is quite complicated. However, the scanner's output can usually help you figure out why tracks are sorted the way they are.

In such a case, try running the following on a command line:

amarokcollectionscanner -r ~/your/music/directory

Look for "compilation" tags and tracks with different "artist" and "albumartist" tags.

You can remove the compilation tag from mp3 files with the following command:

id3v2 -r TCMP your filename here