Oulipo

Time Limit: 5000ms, Special Time Limit: 12500ms, Memory Limit: 65536KB

Problem description

The French author Georges Perec (1938-1982) once wrote a book, La disparition, without the letter ‘e’. He was a member of the Oulipo group. A quote from the book:

Tout avait l’air normal, mais tout s’affirmait faux. Tout avait l’air normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman: sur son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit: la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison: tout avait l’air normal mais …

Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given "word" as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive ‘T’s is not unusual. And they never use spaces.

So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {‘A’, ‘B’, ‘C’, …, ‘Z’} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.

Input

The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:

One line with the word W, a string over {‘A’, ‘B’, ‘C’, …, ‘Z’}, with 1 <= |W| <= 10,000 (here |W| denotes the length of the string W).

One line with the text T, a string over {‘A’, ‘B’, ‘C’, …, ‘Z’}, with |W| <= |T| <= 1,000,000.
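Output

For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.

Sample Input

3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN

Sample Output

1
3
0

///

Solution report

The problem statement is clear: count how many times a given substring occurs in a string. For example, AZAZAZA clearly contains three occurrences of AZA, starting at indices 0, 2 and 4. The input is fairly large and many people got Time Limit Exceeded. So did I at first, and I only found the algorithm below when I came back to the problem some time later.

I won't go over ordinary KMP matching here. The key point is how the text index and the word index change after a complete match. Most people assume that the text index goes back to the position that was matched against the second character of the word, and that the word index is reset to 0. For example, when matching AZA against AZAZAZA, we start with j (word index) = 0 and i (text index) = 0, and the first match is complete at j = 2, i = 2. The usual approach then resets j = 0, i = 1 and continues matching...

A better approach for this problem: append a virtual character to the end of the word (there is no need to actually store it; we only treat it as logically present) that matches no character at all. Then, once the whole word has been matched in the text, the next comparison is a mismatch, so exactly as in the usual matching algorithm the text index stays where it is and the word index is taken from the next table. Getting the next value of the virtual character simply means computing one more entry after the table for the word is finished. This way the text only has to be scanned once.

Even so, it took 1078ms. I don't know how the people with XXms solutions did it; I'll post their code once I get hold of it...

// My code is as follows:

#include <stdio.h>
#include <string.h>

char gStr[1000002], gDest[10002];
int gNext[10002];

// Build the KMP next table for str, including one extra entry,
// next[strlen(str)], for the virtual character.
void getNext(char str[], int next[]) {
    int i, j;
    // Note: strlen(str) is re-evaluated on every iteration of this loop.
    for (next[0] = j = -1, i = 0; i < (int)strlen(str);) {
        if (j == -1 || str[i] == str[j]) {
            ++i;
            ++j;
            next[i] = j;
        } else
            j = next[j];
    }
}

// Count the occurrences of gDest in gStr.
int getCount() {
    int i, j, count;
    for (count = i = j = 0; gStr[i];) {
        if (j == -1 || gStr[i] == gDest[j]) {
            ++i;
            ++j;
            if (gDest[j] == '\0') {
                count++;
                j = gNext[j];  // use the next value of the virtual character
            }
        } else
            j = gNext[j];
    }
    return count;
}

int main() {
    int n;
    scanf("%d", &n);
    while (n--) {
        scanf("%10001s %1000001s", gDest, gStr);
        getNext(gDest, gNext);
        printf("%d\n", getCount());
    }
    return 0;
}

//// Optimized as follows, and it turned out to be more than ten times faster!!!
// Presumably the main gain is that strlen is now called once per test case
// instead of once per loop iteration, which made building the table
// quadratic in |W|.

#include <stdio.h>
#include <string.h>

char gStr[1000002], gDest[10002];
int gNext[10002];

// Build the next table for gDest, including the entry for the virtual character.
void getNext() {
    int i, j, len = (int)strlen(gDest);
    for (gNext[0] = j = -1, i = 0; i < len;) {
        if (j == -1 || gDest[i] == gDest[j]) {
            ++i;
            ++j;
            gNext[i] = j;
        } else
            j = gNext[j];
    }
}

// Count the occurrences of gDest in gStr.
int getCount() {
    int i, j, count;
    for (count = i = j = 0; gStr[i];) {
        if (j == -1 || gStr[i] == gDest[j]) {
            ++i;
            ++j;
            if (gDest[j] == '\0') {
                count++;
                j = gNext[j];
            }
        } else
            j = gNext[j];
    }
    return count;
}

int main() {
    int n;
    scanf("%d", &n);
    while (n--) {
        // The original read both lines with gets(), which was removed in C11;
        // width-limited scanf is used here instead.
        scanf("%10001s", gDest);
        scanf("%1000001s", gStr);
        getNext();
        printf("%d\n", getCount());
    }
    return 0;
}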
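
To check the virtual-character idea on small inputs, here is a minimal sketch that puts the single-pass KMP count next to a brute-force count that restarts at every text position. The function names build_next, count_kmp and count_naive are made up for this illustration and are not part of the submitted code. The sketch also allocates the table dynamically instead of using the global arrays.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: fill next[0..m] for word, where m = strlen(word).
   next[m] is the extra entry for the virtual character after the word.
   The caller must provide room for m + 1 ints. */
void build_next(const char *word, int *next)
{
    int m = (int)strlen(word);
    int i = 0, j = -1;
    next[0] = -1;
    while (i < m) {
        if (j == -1 || word[i] == word[j]) {
            ++i;
            ++j;
            next[i] = j;
        } else {
            j = next[j];
        }
    }
}

/* Hypothetical helper: count (possibly overlapping) occurrences of word in
   text in a single pass over text. Returns -1 if the table cannot be
   allocated. */
long count_kmp(const char *word, const char *text)
{
    int m = (int)strlen(word);
    assert(m > 0); /* the problem guarantees a non-empty word */

    int *next = malloc((size_t)(m + 1) * sizeof *next);
    if (next == NULL)
        return -1;
    build_next(word, next);

    long count = 0;
    int j = 0;
    for (size_t i = 0; text[i] != '\0';) {
        if (j == -1 || text[i] == word[j]) {
            ++i;
            ++j;
            if (j == m) {      /* whole word matched */
                ++count;
                j = next[m];   /* fall back via the virtual character */
            }
        } else {
            j = next[j];
        }
    }
    free(next);
    return count;
}

/* Hypothetical reference: restart the comparison at every text position.
   This takes O(|W| * |T|) time and is only meant for checking small cases. */
long count_naive(const char *word, const char *text)
{
    size_t m = strlen(word), n = strlen(text);
    long count = 0;
    for (size_t i = 0; i + m <= n; ++i)
        if (memcmp(text + i, word, m) == 0)
            ++count;
    return count;
}
```

For the word AZA the table comes out as {-1, 0, 0, 1}. The last entry says that after a full match one character (the trailing A) is already matched again, which is why count_kmp("AZA", "AZAZAZA") finds all three overlapping occurrences without moving the text index back.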
