Haven't been here for awhile, Thought I would release a boyermore implementation search in Delphi, super fast.. just change the Chars to byte and your set or modify it for wild cards by changing
Orginal(Find this)
WildCard(Replace With)Code:While pTarget[i] = pPattern[j] Do
^ Ofcause "*" can be any char or byte you want...Code:While (pTarget[i] = pPattern[j]) or (pPattern[j] = '*') Do
BoyerMore Search Pattern:
Code:(* ByerMore Search Pattern Coded By Departure *) Function SearchPattern(Const pTarget, pPattern: PChar): DWORD; Var i :Integer; j :Integer; k :Integer; iTargetLen :Integer; iPatternLen :Integer; baStep: Array[0..255] Of Byte; Begin Result := 0; iTargetLen := Length(pTarget); iPatternLen := Length(pPattern); If iTargetLen * iPatternLen = 0 Then Exit; FillChar(baStep,SizeOf(baStep), iPatternLen); For K := 0 to iPatternLen -1 do baStep[Ord(pPattern[k])]:= iPatternLen - K; While K <= iTargetLen Do Begin i := k - 1; j := iPatternLen - 1; While pTarget[i] = pPattern[j] Do Begin Dec(i); Dec(j); End; If j = -1 Then Begin Result := i + 2; Exit; End; Inc(K, baStep[Ord(pTarget[k])]); End; End;
God job Departure
[IMG]https://especiais.lancene*****m.br/santos-campeao-libertadores-2011/images/escudo-santos.png[/IMG]
I decided to port this over to C++. Not sure if it is 100% correct, due to some of the Delphi functions you used (Ord, etc.)
How does it look?Code:#include <cstring> unsigned long SearchPattern(const char* pTarget,const char* pPattern) { int i,j,k,iTargetLen,iPatternLen; unsigned char baStep[255]; unsigned long result = 0; iTargetLen = strlen(pTarget); iPatternLen = strlen(pPattern); if( iTargetLen*iPatternLen == 0) return result; for(int i = 0 ; i < iPatternLen; i++) baStep[i] = sizeof(baStep); for(k=0; k<iPatternLen; k++) baStep[(int)pPattern[k]] = iPatternLen - k; while(k<=iTargetLen){ i=k-1; j=iPatternLen-1; while((pTarget[i] == pPattern[j]) || (pPattern[j] == '*')){ i--; j--; } if(j=(-1)) { result = i+2; return result; } k+=baStep[((int)pTarget[k])]; } }
@Departure
@Saltine
Nice work mate, does it work have you tested it?
Ord function is the "Ordinal" value of a char, byte ect... yes I am pretty sure typecasting it as "Int" in C++ should do the trick also, for example if we had a byte $31 which is the character "1" in ascii, its Ordinal value would be 49 in decimal.
an example using Wildcard as you have done. in delphi it would be
iPos:= SearchPattern('hello this is a simple test of how to search in a string', '*ow');
Writeln('Index: ', iPos); //returns 32 with 'how'
I wrote this a few months ago...
I never got around to testing it. If you test it, let me know if it works.Code:#define realAddy( cast, base, offset ) (cast)((DWORD)(base) + (DWORD)(offset)) bool bExist(BYTE part, BYTE *whole, char *mask) { BYTE *start = (BYTE*)(whole + sizeof(whole) - 1); for(DWORD loc = 0; loc < sizeof(whole); loc++) if((*(BYTE*)(start - loc) == part) || (*(mask + (strlen(mask) - 1) - loc) == '?')) return true; return false; } DWORD lastLocationOfByte(BYTE part, BYTE *whole, char *mask) { BYTE *start = (BYTE*)(whole + sizeof(whole) - 1); for(DWORD loc = 0; loc < sizeof(whole); loc++) if((*(BYTE*)(start - loc) == part) || (*(mask + (strlen(mask) - 1) - loc) == '?')) return (sizeof(whole) - loc); } DWORD FindPattern(DWORD first, DWORD total, BYTE *val, char *mask) { size_t next = sizeof(*val); BYTE *addy = (BYTE*)((val + next) - 1); bool exist = false; BYTE *loc; for(DWORD n = (next - 1); n < total;) { DWORD spot; for(spot = 0; (*(unsigned char*)(first + n - spot) == *(unsigned char*)(addy - spot)) || (*(mask + (strlen(mask) - 1) - spot) == '?'); spot++); if(spot == next) return realAddy( DWORD, first, n ); else if(bExist(*(BYTE*)(first + n), val, mask)) n += lastLocationOfByte(*(BYTE*)(first + n), val, mask); else n += next; } return false; }
@yodaliketaco
Looks like Naive search algorithm, this type of search routine is good for small patterns but not as fast as a boyer-more algorithm. The idea in boyer-more algo is the stepping code which calculate how many places to step so not every single byte needs to be "checked". While the naive algo could scan 2,400 bytes per millisecond the boyer-more reached upto 8,600 bytes per millisecond based on some testing with a 36mb file, Also the longer the pattern the faster the boyer-more algo was. If I had C++ compiler I would test out your code but im not a c++ programmer so I don't have any compilers installed.
I coded this based on the boyer-moore algorithm. It's probably not the most concise version, but if it works it should still be much more efficient than the Naive search algorithm. I don't have a base working atm, but if I get back around to coding for this game I will probably test and revise this a bit.
Sorry to bump the thread but as I was reading through here I noticed that @Saltine said he messed up translating the procedure to C(++), I don't know if he's fixed it or not (I'd assume so) but I went ahead and took the liberty to translate it.
No idea why the forum ***s the _t-c-slen function but oh well, that works as expected, at least given Departure's test case that he had Saltine try it against.Code:DWORD SearchPattern(LPCWSTR szTarget, LPCWSTR szPattern) { int i, j, k; int iTargetLen, iPatternLen; char baStep[255]; iTargetLen = _***len(szTarget); // (The *** is t-c-s without the -s, no fucking idea why it's *'d out... iPatternLen = _***len(szPattern); // Same here. if(iTargetLen * iPatternLen == 0) return -1; memset(baStep, iPatternLen, sizeof(baStep)); for(k = 0; k < iPatternLen; k++) { baStep[(int)szPattern[k]] = iPatternLen - k; } while(k <= iTargetLen) { i = k - 1; j = iPatternLen - 1; while(szTarget[i] == szPattern[j] || szPattern[j] == '*') { i--; j--; } if(j == -1) return i + 2; k += baStep[(int)(szTarget[k])]; } }