An accessor to an inverted file.
#include <CAcIFFileSystem.h>
Public Member Functions | |
bool | operator() () const |
for testing if the inverted file is correctly constructed | |
CAcIFFileSystem (const CXMLElement &inCollectionElement) | |
This opens an existing inverted file and then initializes this structure. | |
bool | init (bool) |
called by constructors | |
~CAcIFFileSystem () | |
Destructor. | |
string | IDToURL (TID inID) const |
Translate a DocumentID to a URL (for output) | |
virtual pair< bool, TID > | URLToID (const string &inURL) const |
Translate a URL to its document ID. | |
void | getAllIDs (list< TID > &) const |
List of the IDs of all documents present in the inverted file. | |
void | getAllAccessorElements (list< CAccessorElement > &) const |
List of triplets (ID,imageURL,thumbnailURL) of all the documents present in the inverted file. | |
void | getRandomIDs (list< TID > &, list< TID >::size_type) const |
get a given number of random C-AccessorElement-s | |
void | getRandomAccessorElements (list< CAccessorElement > &outResult, list< CAccessorElement >::size_type inSize) const |
For drawing random sets. | |
int | size () const |
The number of images in this accessor. | |
TID | getMaximumFeatureID () const |
This is interesting for browsing. | |
list< TID > * | getAllFeatureIDs () const |
Getting a list of all features contained in this. | |
virtual pair< bool, CAccessorElement > | IDToAccessorElement (TID inID) const |
Translate a DocumentID to an accessor Element. | |
operator bool () const | |
is this well constructed? | |
The proper inverted file access | |
CDocumentFrequencyList * | FeatureToList (TFeatureID) const |
List of documents containing the feature. | |
CDocumentFrequencyList * | URLToFeatureList (string inURL) const |
List of features contained by a document. | |
CDocumentFrequencyList * | DIDToFeatureList (TID inDID) const |
List of features contained by a document with ID inDID. | |
Accessing information about features | |
double | FeatureToCollectionFrequency (TFeatureID) const |
Collection frequency for a given feature. | |
unsigned int | getFeatureDescription (TID inFeatureID) const |
What kind of feature is the feature with ID inFeatureID? | |
Accessing additional document information | |
double | DIDToMaxDocumentFrequency (TID) const |
returns the maximum document frequency for one document ID | |
double | DIDToDFSquareSum (TID) const |
Returns the document-frequency square sum for a given document ID. | |
double | DIDToSquareDFLogICFSum (TID) const |
Returns this function for a given document ID. | |
bool | generateInvertedFile () |
Generating an inverted File, if there is none. | |
bool | newGenerateInvertedFile () |
Generating an inverted File, if there is none. | |
bool | checkConsistency () |
Check the consistency of the inverted file system accessed by this accessor. | |
bool | findWithinStream (TID inFeatureID, TID inDocumentID, double inDocumentFrequency) const |
Is the Document with inDocumentID contained in the document frequency list of the feature inFeatureID and is the associated document frequency the same? | |
CAcInvertedFile (const CXMLElement &inCollectionElement) | |
This opens an existing inverted file and then initializes this structure. | |
~CAcInvertedFile () | |
Destructor. | |
const string & | getURLToFeatureFileName () const |
gives back the content of mURLToFeatureFileName | |
CAcURL2FTS (const CXMLElement &inContentElement) | |
Constructor: slurp in an url2fts file and fill the maps. | |
pair< bool, string > | URLToFFN (const string &inURL) const |
Gives the feature file name which corresponds to a given URL. Return value: a pair of bool (does the feature file exist?) and string (the feature file name). | |
pair< bool, string > | IDToFFN (TID inID) const |
Gives the feature file name which corresponds to a given ID. Return value: a pair of bool (does the feature file exist?) and string (the feature file name). | |
virtual | ~CAccessor () |
virtual destructor for clean destruction | |
virtual CXMLElement * | prepareDatabase () |
If a new collection is created during runtime, this function prepares the indexing structures such that they are able to accept new objects. | |
virtual bool | isPreparedDatabase () const |
Is the database accessed by this accessor prepared? In other words: is there an index structure to access? |
Protected Types | |
typedef HASH_MAP< TID, streampos > | CIDToOffset |
map from feature id to the offset for this feature | |
typedef hash_map< TID, unsigned int > | CIDToOffset |
map from feature id to the offset for this feature |
Protected Member Functions | |
void | writeOffsetFileElement (TID inFeatureID, streampos inPosition, ostream &inOpenOffsetFile) |
add a pair of FeatureID,Offset to the open offset file (helper function for inverted file construction) | |
CDocumentFrequencyList * | getFeatureFile (string inFileName) const |
loads a *.fts file. | |
void | writeOffsetFileElement (TID inFeatureID, int inPosition, ostream &inOpenOffsetFile) |
add a pair of FeatureID,Offset to the open offset file (helper function for inverted file construction) | |
virtual void | dummy () const |
without this function things like upcasting etc. |
Protected Attributes | |
CMutex | mMutex |
the mutex for multi threading | |
CSelfDestroyPointer< CAcURL2FTS > | mURL2FTS |
To have just one parent, I have to restrict myself to single inheritance. | |
TID | mMaximumFeatureID |
the maximum feature ID arising in this file | |
string | mInvertedFileBuffer |
A buffer, if the inverted file is to be held in ram. | |
string | mTemporaryIndexingFileBase |
Some place for putting temporary indexing data. | |
CSelfDestroyPointer< istream > | mInvertedFile |
The inverted file. | |
ifstream | mOffsetFile |
Feature -> Offset in inverted file. | |
ifstream | mFeatureDescriptionFile |
File of feature descriptions. | |
string | mInvertedFileName |
Name of the inverted file. | |
string | mOffsetFileName |
Name of the Offset file. | |
string | mFeatureDescriptionFileName |
Name for the file with the feature description. | |
CIDToOffset | mIDToOffset |
map from feature id to the offset for this feature | |
HASH_MAP< TID, double > | mFeatureToCollectionFrequency |
map from feature to the collection frequency | |
for fast access... | |
HASH_MAP< TID, unsigned int > | mFeatureDescription |
map from the feature ID to the feature description | |
CADIHash | mDocumentInformation |
additional information about the document, e.g. the Euclidean length of the feature list | |
TID | mMaximumFeatureID |
the maximum feature ID arising in this file | |
CArraySelfDestroyPointer< char > | mInvertedFileBuffer |
A buffer, if the inverted file is to be held in ram. | |
CSelfDestroyPointer< istream > | mInvertedFile |
The inverted file. | |
ifstream | mOffsetFile |
Feature -> Offset in inverted file. | |
ifstream | mFeatureDescriptionFile |
File of feature descriptions. | |
string | mInvertedFileName |
Name of the inverted file. | |
string | mOffsetFileName |
Name of the Offset file. | |
string | mFeatureDescriptionFileName |
Name for the file with the feature description. | |
CIDToOffset | mIDToOffset |
map from feature id to the offset for this feature | |
hash_map< TID, double > | mFeatureToCollectionFrequency |
map from feature to the collection frequency | |
hash_map< TID, unsigned int > | mFeatureDescription |
map from the feature ID to the feature description | |
CADIHash | mDocumentInformation |
additional information about the document, e.g. the Euclidean length of the feature list | |
TID | mID |
the ID of the next element | |
string | mURLPrefix |
the url-prefix for the image list | |
string | mThumbnailURLPrefix |
the thumbnail-url-prefix for the image list | |
CMutex | mMutexURL2FTS |
The mutex for multithreading. The name is intended to be unique and immune against inheritance... | |
string_string_map | mURLToFFN |
map from the url of an image to the name of the feature file for this image | |
TID_string_map | mIDToFFN |
map from the id of an image to the name of the feature file for this image | |
ifstream | mURLToFeatureFile |
URL -> FeatureFileName. | |
string | mURLToFeatureFileName |
Name of the file that contains pairs of URL and the Feature file that belongs to the URL. | |
string_TID_map | mURLToID |
map the url of an image to the id of this image | |
TID_CAccessorElement_map | mIDToAccessorElement |
maps the ID of an image to the URL of this image |
An accessor to an inverted file.
This access is done "by hand".
For a long time we wanted to move to memory-mapped files (like SWISH++), but currently I think this is not the best idea.
CAcIFFileSystem::CAcIFFileSystem (const CXMLElement & inCollectionElement)
This opens an existing inverted file and then initializes this structure.
After that it is fully usable.
As a parameter it takes an XMLElement which contains a "collection" element and its content.
If the attribute cui-generate-inverted-file is true, then a new inverted file will be generated using the parameters given in inCollectionElement. You will NOT be able to use *this afterwards.
Like every accessor, this accessor takes a <collection> MRML element as input, with the following attributes:
cui-base-dir: the directory containing the following files
cui-inverted-file-location: the location of the inverted file
cui-offset-file-location: a file containing offsets into the inverted file
cui-feature-file-location: the location of the "url2fts" file which translates URLs to feature file names
virtual bool CAcIFFileSystem::checkConsistency ()
Check the consistency of the inverted file system accessed by this accessor.
Implements CAcInvertedFile.
bool CAcIFFileSystem::findWithinStream (TID inFeatureID, TID inDocumentID, double inDocumentFrequency) const
Is the Document with inDocumentID contained in the document frequency list of the feature inFeatureID and is the associated document frequency the same?
inFeature<id | the |
Reimplemented from CAcInvertedFile.
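The check described above can be sketched as a seek followed by a linear scan. This is only an illustration: the record layout (a count followed by document ID / frequency pairs), the helper names `writeRaw`, `readRaw` and `findInStream`, and the `TID` alias are assumptions made for the example and do not describe the actual GIFT file format.

```cpp
#include <cassert>
#include <cstdint>
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>
#include <unordered_map>

using TID = std::uint32_t;  // hypothetical stand-in for the library's TID

// Helpers for raw binary I/O of trivially copyable values (illustrative only).
template <typename T>
void writeRaw(std::ostream& out, const T& value) {
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
}

template <typename T>
bool readRaw(std::istream& in, T& value) {
    return static_cast<bool>(in.read(reinterpret_cast<char*>(&value), sizeof value));
}

// Assumed record layout per feature (NOT the real inverted file format):
//   count (uint32), then count x [document ID (TID), frequency (double)]
// Returns true only if the document is listed for the feature AND the stored
// frequency is exactly equal to inFrequency.
bool findInStream(std::istream& inFile,
                  const std::unordered_map<TID, std::streamoff>& inOffsets,
                  TID inFeatureID, TID inDocumentID, double inFrequency) {
    auto found = inOffsets.find(inFeatureID);
    if (found == inOffsets.end()) return false;  // feature is not in the file

    inFile.clear();                              // reset any earlier EOF state
    inFile.seekg(found->second, std::ios::beg);  // jump to the feature's list

    std::uint32_t count = 0;
    if (!readRaw(inFile, count)) return false;
    for (std::uint32_t i = 0; i < count; ++i) {
        TID documentID = 0;
        double frequency = 0.0;
        if (!readRaw(inFile, documentID) || !readRaw(inFile, frequency)) return false;
        if (documentID == inDocumentID) return frequency == inFrequency;
    }
    return false;  // document not listed for this feature
}
```

The exact floating-point comparison mirrors the wording "is the associated document frequency the same?"; a real consistency check might prefer a tolerance.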
virtual bool CAcIFFileSystem::generateInvertedFile ()
Generating an inverted File, if there is none.
Fast but stupid in-memory method. This method is very fast if the whole inverted file (and a bit more) can be kept in memory at runtime. If this is not the case, extensive swapping results, virtually halting the inverted file creation.
Implements CAcInvertedFile.
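A minimal sketch of the in-memory idea, assuming C++17: every posting is collected in one map keyed by feature ID, which is why memory use grows with the size of the whole index. The types `Posting`, `PostingList` and `DocumentFeatures` and the function `invertInMemory` are hypothetical and only illustrate the principle; they are not the library's own structures.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using TID = std::uint32_t;  // hypothetical stand-in for the library's TID

// One entry of a feature's document list (illustrative only).
struct Posting {
    TID documentID;
    double documentFrequency;
};
using PostingList = std::vector<Posting>;

// Input: for each document, the features it contains and their frequencies.
using DocumentFeatures = std::map<TID, std::vector<std::pair<TID, double>>>;

// Build the complete inverted index in RAM: feature ID -> postings.
// All postings live in one map at the same time, so this is only fast
// while the whole index (plus container overhead) fits in memory.
std::map<TID, PostingList> invertInMemory(const DocumentFeatures& inDocuments) {
    std::map<TID, PostingList> index;
    for (const auto& [documentID, features] : inDocuments) {
        for (const auto& [featureID, frequency] : features) {
            index[featureID].push_back(Posting{documentID, frequency});
        }
    }
    return index;
}
```

Because the outer loop visits documents in ascending ID order, each posting list comes out sorted by document ID without an extra sort.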
virtual list< TID > * CAcIFFileSystem::getAllFeatureIDs () const
Getting a list of all features contained in this.
This function is necessary, because in the present system only about 50 percent of the features are really used.
A feature is considered used if it arises in mIDToOffset.
Implements CAcInvertedFile.
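The rule "a feature is used if it arises in mIDToOffset" amounts to collecting the keys of the offset map. The sketch below assumes a `std::unordered_map` as a stand-in for CIDToOffset; `OffsetMap` and `usedFeatureIDs` are hypothetical names, and the sorting step is added only to make the result reproducible.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <unordered_map>

using TID = std::uint32_t;                        // hypothetical stand-in
using OffsetMap = std::unordered_map<TID, long>;  // stand-in for CIDToOffset

// Return the IDs of all features that have an entry in the offset map,
// i.e. all features that actually occur in the inverted file.
std::list<TID> usedFeatureIDs(const OffsetMap& inIDToOffset) {
    std::list<TID> result;
    for (const auto& entry : inIDToOffset) {
        result.push_back(entry.first);
    }
    result.sort();  // hash maps have no defined order; sort for stable output
    return result;
}
```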
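virtual void CAcIFFileSystem::getRandomAccessorElements (list< CAccessorElement > &outResult, list< CAccessorElement >::size_type inSize) const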
For drawing random sets.
Why is this part of a CAccessorImplementation? The way the accessor is organised might influence the way random sets can be drawn. At present everything happens in RAM, but we do not want to be fixed on that.
inoutResultList | the list which will contain the result |
inSize | the desired size of the inoutResultList |
Implements CAccessor.
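When everything is held in RAM, drawing a random set can be as simple as sampling without replacement from the list of known elements. The following sketch uses `std::sample` from C++17; the function name `drawRandomSubset` and the explicit generator parameter are assumptions for illustration and say nothing about how the accessor itself draws its sets.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <random>

// Draw up to inSize distinct elements from inAll without replacement.
// If inSize exceeds the number of available elements, all of them are returned.
template <typename T>
std::list<T> drawRandomSubset(const std::list<T>& inAll,
                              typename std::list<T>::size_type inSize,
                              std::mt19937& inoutGenerator) {
    std::list<T> result;
    std::sample(inAll.begin(), inAll.end(), std::back_inserter(result),
                inSize, inoutGenerator);
    return result;
}
```

An accessor backed by disk rather than RAM would need a different strategy, for example sampling IDs first and loading only the selected elements.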
virtual void CAcIFFileSystem::getRandomIDs (list< TID > &, list< TID >::size_type) const
get a given number of random C-AccessorElement-s
inoutResultList | the list which will contain the result |
inSize | the desired size of the inoutResultList |
Implements CAccessor.
virtual pair< bool, CAccessorElement > CAcIFFileSystem::IDToAccessorElement (TID inID) const
Translate a DocumentID to an accessor Element.
Implements CAccessor.
bool CAcIFFileSystem::newGenerateInvertedFile ()
Generating an inverted File, if there is none.
Employs the two-way merge method described in "Managing Gigabytes", chapter 5.2, Sort-based inversion (page 181).
Reimplemented from CAcInvertedFile.
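The general shape of sort-based inversion with a two-way merge can be sketched as follows: cut the (feature, document, frequency) triples into bounded runs, sort each run, then merge runs pairwise until one sorted run remains. In this illustration in-memory vectors stand in for the temporary files a disk-based implementation would use, and the names `Triple`, `Run`, `makeSortedRuns` and `mergeRuns` are hypothetical; it is not a transcription of this method's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <tuple>
#include <utility>
#include <vector>

using TID = std::uint32_t;  // hypothetical stand-in for the library's TID

struct Triple {
    TID featureID;
    TID documentID;
    double documentFrequency;
};
using Run = std::vector<Triple>;

// Order by feature first, then by document: the order of an inverted file.
inline bool tripleLess(const Triple& a, const Triple& b) {
    return std::tie(a.featureID, a.documentID) <
           std::tie(b.featureID, b.documentID);
}

// Step 1: cut the input into runs of at most inRunSize triples and sort each
// run on its own (a run is what would fit into memory at once).
std::vector<Run> makeSortedRuns(const std::vector<Triple>& inTriples,
                                std::size_t inRunSize) {
    assert(inRunSize > 0);
    std::vector<Run> runs;
    for (std::size_t begin = 0; begin < inTriples.size(); begin += inRunSize) {
        const std::size_t end = std::min(begin + inRunSize, inTriples.size());
        Run run(inTriples.begin() + begin, inTriples.begin() + end);
        std::sort(run.begin(), run.end(), tripleLess);
        runs.push_back(std::move(run));
    }
    return runs;
}

// Step 2: merge neighbouring runs two at a time until a single run is left.
Run mergeRuns(std::vector<Run> inRuns) {
    while (inRuns.size() > 1) {
        std::vector<Run> next;
        for (std::size_t i = 0; i < inRuns.size(); i += 2) {
            if (i + 1 == inRuns.size()) {  // odd run out: carry it over
                next.push_back(std::move(inRuns[i]));
                continue;
            }
            Run merged;
            std::merge(inRuns[i].begin(), inRuns[i].end(),
                       inRuns[i + 1].begin(), inRuns[i + 1].end(),
                       std::back_inserter(merged), tripleLess);
            next.push_back(std::move(merged));
        }
        inRuns = std::move(next);
    }
    return inRuns.empty() ? Run{} : std::move(inRuns.front());
}
```

Each merge pass only reads its two inputs sequentially, which is what makes the approach suitable for data that does not fit in memory.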
virtual pair< bool, TID > CAcIFFileSystem::URLToID (const string &inURL) const
Translate a URL to its document ID.
Implements CAcInvertedFile.
CADIHash CAcIFFileSystem::mDocumentInformation [protected]
Additional information about the document, e.g. the Euclidean length of the feature list.
CSelfDestroyPointer< CAcURL2FTS > CAcIFFileSystem::mURL2FTS [protected]
To have just one parent, I have to restrict myself to single inheritance.
I cannot use virtual base classes, because then I cannot downcast.