Thanks for the numbers.
Let's assume that each user takes up 1 KB, or at least that much once all the overhead of their record is included.
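use strict;
use warnings;

my $never_logged_in    = 3_833;    # accounts that never logged in
my $logged_in_no_nodes = 9_668;    # accounts that logged in but have no nodes
my $aka_bad_users      = $never_logged_in + $logged_in_no_nodes;
my $disk_bytes         = 1_024 * $aka_bad_users;    # assumes ~1 KB per user record
printf "Extra records to wade through = %d\n", $aka_bad_users;
printf "Wasted disk space = %d bytes (%.1f MB)\n", $disk_bytes, $disk_bytes / (1_024 * 1_024);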
That works out to roughly 13 MB (13,501 records at 1 KB each, about 13.2 MB) taken up by "bad users". I don't find that large enough to cause any headaches at this time, but I think it deserves some thought about how to deal with these records in the future. Even if they were removed purely to reclaim disk space, I don't think the savings would be significant. There is also the option of using a compressed table format, if we don't already, which would make every record more space-efficient.
One future plan might be to add an indexed column such as 'is_active' to the user table, so that queries where only "active" users matter could filter on it in the WHERE clause. A user would be considered inactive until their first login and then switched to active; a user who has not logged in for more than X months would be switched back to inactive.
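The active/inactive rule could be sketched in Perl roughly as follows. This is only an illustration under assumptions that are not in the post: last-login times are stored as epoch seconds (undef meaning the user never logged in), a "month" is approximated as 30 days, X is set to 6, and is_active() is a hypothetical helper, not part of the Everything engine.

```perl
use strict;
use warnings;

# Hypothetical helper implementing the proposed rule: a user with no
# recorded login (undef) is inactive; otherwise the user is active only if
# the last login was within $max_idle_months months.
# Assumption: a "month" is approximated as 30 days.
use constant SECONDS_PER_MONTH => 30 * 24 * 60 * 60;

sub is_active {
    my ($last_login, $now, $max_idle_months) = @_;
    return 0 unless defined $last_login;    # never logged in
    my $idle = $now - $last_login;
    return $idle <= $max_idle_months * SECONDS_PER_MONTH ? 1 : 0;
}

# Made-up sample data, for illustration only.
my $now = time;
my %last_login = (
    alice => $now - 10 * 24 * 60 * 60,     # logged in 10 days ago
    bob   => undef,                        # signed up, never logged in
    carol => $now - 400 * 24 * 60 * 60,    # idle for over a year
);

for my $user (sort keys %last_login) {
    printf "%-6s %s\n", $user,
        is_active($last_login{$user}, $now, 6) ? 'active' : 'inactive';
}
```

A periodic job could apply a check like this to keep the proposed is_active column up to date, so that ordinary queries only need to test the flag rather than recompute the rule.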
I have only dug around a little in the Everything engine, and I think removing node associations may be a risky, if not deadly, exercise.
A properly indexed database table can sidestep most of the performance impact of numerous "extra" entries.
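The idea behind that last point can be illustrated with a toy in-memory model in Perl: a full scan touches every record, inactive ones included, while an index maps the flag value straight to the matching records. This is only an analogy, not how MySQL or the Everything engine implements indexes; the is_active field is the hypothetical column proposed above and the sample rows are made up.

```perl
use strict;
use warnings;

# Toy "table": each record is a hash ref. is_active is the hypothetical
# column proposed above, not an existing Everything field.
my @users = (
    { id => 1, name => 'alice', is_active => 1 },
    { id => 2, name => 'bob',   is_active => 0 },
    { id => 3, name => 'carol', is_active => 0 },
    { id => 4, name => 'dave',  is_active => 1 },
);

# Without an index: every query examines all rows.
sub active_by_scan {
    my ($rows) = @_;
    return grep { $_->{is_active} } @$rows;
}

# With an index: built once here (a database maintains it on every write),
# after which a query reads only the rows filed under the matching value.
my %by_active;
push @{ $by_active{ $_->{is_active} } }, $_ for @users;

sub active_by_index {
    my ($index) = @_;
    return @{ $index->{1} || [] };
}

my @scan  = map { $_->{name} } active_by_scan(\@users);
my @index = map { $_->{name} } active_by_index(\%by_active);
print "scan:  @scan\n";
print "index: @index\n";
```

Both approaches return the same users; the difference is how many records each one has to look at. How much a real index on such a flag helps depends on how selective it is, since a database may still choose a full scan when the flag matches a large share of the rows.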