Foist off, a little intro. I played with DOS batch files back in the early 80s, so I'm not only out of practice but I'm also switching scripting languages from batch to bash. Yeah, this is my first foray into the world of bash after using Linux for just shy of 15 years.
I'm downloading, merging and organizing 11 hosts files from online sources into one. The new /etc/hosts is built by sudo cat'ing three pieces together: a file with the current date and time goes first so I know when I last updated, then a copy of my original hosts header with several of my own additions for blocking, then the 11 source files from online, cat'ed together and cleaned up. That assemblage is my new hosts file. I've also added a test that checks whether a backup copy of the original hosts file exists and makes one if it doesn't. Can't be too careful! Just call me Sir Blockalot.
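For readers who want to picture the assembly described above, here is a minimal sketch, not the actual script. The function name build_hosts, the header file and the source directory are all hypothetical, and it assumes the 11 lists have already been downloaded into one directory.

```shell
#!/usr/bin/env bash
# Sketch only: build_hosts and every path passed to it are hypothetical names.

# build_hosts HEADER SRC_DIR OUT
# Writes a timestamp comment, then the personal header, then the merged block list.
build_hosts() {
    local header=$1 src_dir=$2 out=$3
    {
        # Line 1: when this hosts file was generated.
        printf '# Updated: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        # The original header plus personal additions.
        cat "$header"
        # Merge every downloaded list, drop comment-only and blank lines,
        # then sort byte-wise and keep one copy of each line.
        cat "$src_dir"/* |
            grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' |
            LC_ALL=C sort -u
    } > "$out"
}

# Example (hypothetical paths); the result is installed in a separate step:
#   build_hosts "$HOME/hosts.header" "$HOME/hosts.d" /tmp/hosts.new
#   sudo cp /tmp/hosts.new /etc/hosts
```

Building into a temporary file first and copying it over /etc/hosts afterwards means a failed download or a typo never leaves a half-written hosts file in place.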
I’ve run into a problem I just can’t figure out. After ‘sort --unique’ I still end up with a tiny handful of duplicate lines, always in pairs, never more than two of each. Naturally, with all those files downloaded, there are many-several duplicates to begin with, and many of those appear more than twice, yet sort removes all but these few pairs. This is not only the first time I’ve tackled bash, it's also the first time I’ve handled such a large mix of files from the GSW - Great Spider Web.
I'm assuming, since they're downloaded, that the problem is caused by either MS (Windows-style) or HTML formatting. If I'm right, how can I track it down and then fix it? It's either that or the size of the finished file, 565,000 lines give or take. I don't think it's the size, but I'll toss that out as a possibility.
Here's what one pair looks like.
0.0.0.0 zhirok.com #[Spamdexing]
0.0.0.0 zhirok.com #[Spamdexing]
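Two lines that look the same on screen but survive 'sort --unique' usually differ in bytes that don't display, such as a trailing carriage return from a Windows-style file, a trailing space, or a tab where the other line has a space. The sketch below is one way to check for that and clean it up, assuming that really is the cause here; the function names show_hidden and normalize_hosts are made up for illustration, and cat -A is the GNU coreutils option.

```shell
#!/usr/bin/env bash
# Sketch only: show_hidden and normalize_hosts are hypothetical helper names.

# show_hidden PATTERN FILE
# Print matching lines with invisible characters made visible (GNU cat -A):
# a carriage return shows as ^M, a tab as ^I, and each line end as $.
show_hidden() {
    grep -F -- "$1" "$2" | cat -A
}

# normalize_hosts: read host lines on stdin, write cleaned unique lines on stdout.
normalize_hosts() {
    tr -d '\r' |                      # drop DOS carriage returns
    sed -e 's/[[:space:]]\{1,\}/ /g' \
        -e 's/^ //' -e 's/ $//' |     # tabs and space runs -> one space, trim both ends
    LC_ALL=C sort -u                  # byte-wise sort, keep one copy of each line
}

# Example:
#   show_hidden 'zhirok.com' /etc/hosts
#   normalize_hosts < merged.txt > cleaned.txt
```

If show_hidden prints one line ending in ^M$ and its twin ending in plain $, the carriage return is the difference. This sketch does not handle other invisible characters such as non-breaking spaces.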
Every month I update the hosts files on my old laptop, my wife's laptop and her 2in1. I decided to automate the process, and just for spits and gurgles I decided to add all those other online hosts files. I've made the script mostly generic, using "${USER}" in the file paths, so I can toss it onto all of our PCs without editing it.
BTW, did you ever notice that there are two large industries where the customer is referred to as a user?
Drugs and software.
For as few duplicates as there are, it's not mission critical. I could easily just ignore them, but it bugs me, ya know. I'm a prefectionist - a poorfuctionist - I'd just like it tidy.
This isn't critical to my question, but I'm using LMDE4.