I wrote out a quick nifty STL-like helper class to accomplish the same behavior, but much faster. The class definition looks like this (note that I used
unsigned
instead of defining my own size_type
for brevity):template < typename T >
class unique_vector
{
typedef std::vector< const T* > Tvector;
typedef std::map< T, unsigned > Tmap;
Tmap m_map;
mutable Tvector m_vector;
mutable bool m_dirty;
void m_build() const;
public:
unique_vector();
unsigned size() const;
unsigned add( const T& );
const T& operator [] ( unsigned ) const;
};
This stitches together a map (to achieve uniqueness and provide faster lookup time) and a vector to provide index-based lookups. This is mainly achieved through two functions. Let's take a look at the
add()
function:template < typename T >
inline unsigned unique_vector<T>::add( const T& t )
{
unsigned index = m_map.size();
std::pair< Tmap::iterator, bool > result =
m_map.insert( std::make_pair( t, index ) );
m_dirty = true;
return (*result.first).second;
}
This function relies on the fact that
std::map::insert()
fails if the key (t
in this case) is already present in the map, but returns an iterator to the existing item. This function basically tries to add the item to the map with the next index (using m_map.size()
as the next index so the first item will have index 0, next will have index 1 and so on). If it's not there, insert()
will succeed and return an iterator to the new record (using the provided index). If it's already there, insert()
fails but returns the existing item. The function returns the index associated with the record, whether it's the new index from a freshly inserted item or the old index from a pre-existing item.Notice that the
m_dirty
flag is set to true. This indicates that the vector needs to be rebuilt. The operator[]()
calls m_build()
if m_dirty
is set to true:template < typename T >
inline void unique_vector<T>::m_build() const
{
m_vector.resize( m_map.size() );
for ( Tmap::const_iterator iter( m_map.begin() );
iter != m_map.end();
++iter )
{
m_vector[ (*iter).second ] = &(*iter).first;
}
m_dirty = false;
}
This function quickly builds the vector using the unique indexes stored in the map as array indexes and pointers to the objects used as map keys. The function is const (and data members mutable) to allow it to be called from the const
operator[]()
function.So, how much of an improvement was it? The tool uses complex structures as
T
for this template, but I ran a simple test using 10,000 pre-allocated std::string
objects and 100,000 attempted adds. On my AMD 64 4000+ with 2GB RAM, the linear version took 5,077.160 ms (5 seconds). The unique_vector
ran the same test in 49.972 ms. Algorithm changes almost always trump when optimizing.P.S. In case you were wondering, here are the remaining very simple functions:
template < typename T >
inline unique_vector<T>::unique_vector() :
m_dirty( false )
{}
template < typename T >
inline unsigned unique_vector<T>::size() const
{
return m_map.size();
}
template < typename T >
inline const T& unique_vector<T>::operator [] ( unsigned u ) const
{
if ( m_dirty ) m_build();
return *m_vector[ u ];
}